I wrote the script below to run what I call a "weighted" regression:

```
library(plyr)
set.seed(100)
temp.df <- data.frame(uid=1:200,
                      bp=sample(x=c(100:200),size=200,replace=TRUE),
                      age=sample(x=c(30:65),size=200,replace=TRUE),
                      weight=sample(c(1:10),size=200,replace=TRUE),
                      stringsAsFactors=FALSE)
temp.df.expand <- ddply(temp.df,
                        c("uid"),
                        function(df) {
                          data.frame(bp=rep(df[,"bp"],df[,"weight"]),
                                     age=rep(df[,"age"],df[,"weight"]),
                                     stringsAsFactors=FALSE)})
temp.df.lm <- lm(bp~age,data=temp.df,weights=weight)
temp.df.expand.lm <- lm(bp~age,data=temp.df.expand)
```

In `temp.df`, each row has its own weight. What I mean is that there are 1178 samples in total, but rows with the same `bp` and `age` are merged into one row, and the number of merged samples is recorded in the `weight` column.

I passed that column to the `weights` argument of `lm`, then cross-checked the result against a second data frame in which `temp.df` is "expanded" to one row per sample. However, `lm` gives different output for the two data frames.

Did I misinterpret the `weights` argument of `lm`? Can anyone tell me how to run the regression properly (i.e. without expanding the data frame manually) for a dataset laid out like `temp.df`? Thanks.

## Best Solution

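The problem here is that the degrees of freedom are not being properly added up to get the right Df and mean-sum-squares statistics. `lm` computes the residual degrees of freedom from the 200 rows of `temp.df`, not from the sum of the weights, so the coefficients agree with the expanded fit while the mean squares, F statistic and p-value do not. This will correct the problem: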
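A minimal sketch of one such correction, assuming the weights are whole-number frequency counts as in the question. It rebuilds the question's data in base R, then overwrites the residual Df in the `anova` table with `sum(weight)` minus the number of coefficients and recomputes the statistics that depend on it. The names `aov.tab` and `resid.row` are illustrative only.

```r
# Rebuild the question's data and weighted fit (base R only).
set.seed(100)
temp.df <- data.frame(uid = 1:200,
                      bp = sample(100:200, size = 200, replace = TRUE),
                      age = sample(30:65, size = 200, replace = TRUE),
                      weight = sample(1:10, size = 200, replace = TRUE))
temp.df.lm <- lm(bp ~ age, data = temp.df, weights = weight)

# ANOVA table of the weighted fit; the residual row is the last one.
aov.tab <- anova(temp.df.lm)
resid.row <- nrow(aov.tab)

# Assumption: weights are frequency counts, so the effective sample size
# is sum(weight) and the residual Df is that minus the coefficient count.
aov.tab$Df[resid.row] <- sum(temp.df$weight) - length(coef(temp.df.lm))

# Recompute everything that depends on the degrees of freedom.
aov.tab$`Mean Sq` <- aov.tab$`Sum Sq` / aov.tab$Df
aov.tab$`F value`[1] <- aov.tab$`Mean Sq`[1] / aov.tab$`Mean Sq`[resid.row]
aov.tab$`Pr(>F)`[1] <- pf(aov.tab$`F value`[1],
                          aov.tab$Df[1], aov.tab$Df[resid.row],
                          lower.tail = FALSE)
aov.tab
```

The sums of squares are left untouched because the weighted fit already produces the same ones as the expanded fit; only the divisor changes.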

Compare with:
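For the comparison, a sketch under the same assumption (integer frequency weights) that fits the expanded data and checks it against the weighted fit. The expansion here uses base R row indexing with `rep` in place of `ddply`, which is a substitution for brevity rather than the question's method, and `se.weighted` and `se.rescaled` are illustrative names.

```r
# Rebuild the question's data and both fits (base R only).
set.seed(100)
temp.df <- data.frame(uid = 1:200,
                      bp = sample(100:200, size = 200, replace = TRUE),
                      age = sample(30:65, size = 200, replace = TRUE),
                      weight = sample(1:10, size = 200, replace = TRUE))
temp.df.lm <- lm(bp ~ age, data = temp.df, weights = weight)

# Repeat each row `weight` times to get one row per sample.
temp.df.expand <- temp.df[rep(seq_len(nrow(temp.df)), temp.df$weight),
                          c("bp", "age")]
temp.df.expand.lm <- lm(bp ~ age, data = temp.df.expand)

# The point estimates agree between the two fits.
all.equal(coef(temp.df.lm), coef(temp.df.expand.lm))

# The residual degrees of freedom do not: rows versus samples.
c(weighted = df.residual(temp.df.lm),
  expanded = df.residual(temp.df.expand.lm))

# Rescaling the weighted standard errors by the Df ratio recovers
# the standard errors of the expanded fit.
se.weighted <- coef(summary(temp.df.lm))[, "Std. Error"]
se.rescaled <- se.weighted *
  sqrt(df.residual(temp.df.lm) / df.residual(temp.df.expand.lm))
all.equal(se.rescaled, coef(summary(temp.df.expand.lm))[, "Std. Error"])

# Reference ANOVA table from the expanded data.
anova(temp.df.expand.lm)
```

If the weights are not counts (for example inverse variances), this rescaling does not apply and the default `lm` output is the appropriate one.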

I am a bit surprised this has not come up more often on R-help. Either that or my search strategy development powers are weakening with old age.