Data.Table VS Plyr Regression Output

data.table vs plyr regression output

Try this:

> REG[, as.list(coef(lm(y ~ x + z))), by=ID];
        ID (Intercept)           x         z
[1,] Frank  -0.2928611  0.07215896  1.835106
[2,]  Tony   0.9120795 -1.11153056  2.041260
[3,]    Ed   1.0498359  5.77131778 -1.253741

I have the nagging feeling that this question was asked less than a week ago, but I don't think I arrived at this approach when I tried it, and I don't remember that any answer was this compact.

Oh, there it is... on r-help. Matthew can comment on the correctness of this if he wants. I guess the message is that functions returning lists will not have dimensions dropped. The interesting thing was that using list(coef(lm(...))) did not succeed in the manner we hoped.
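To make the difference concrete, here is a minimal sketch on simulated data. The REG table below is a made-up stand-in (the original data is not shown), so the numbers will not match the output above. It contrasts as.list(coef(...)), which spreads the coefficients across columns, with a bare coefficient vector, which gets stacked into one column.

```r
library(data.table)

# Simulated stand-in for REG; the values are made up for illustration.
set.seed(1)
REG <- data.table(ID = rep(c("Frank", "Tony", "Ed"), each = 10),
                  x = rnorm(30), z = rnorm(30))
REG[, y := 1 + 2 * x - z + rnorm(30)]

# as.list() turns the named coefficient vector into a named list,
# so each coefficient becomes its own column: one row per ID.
wide <- REG[, as.list(coef(lm(y ~ x + z))), by = ID]

# A bare vector in j is stacked into a single column: three rows per ID.
long <- REG[, coef(lm(y ~ x + z)), by = ID]
```

Here wide should have the columns ID, (Intercept), x and z, while long has only ID and V1 and loses the coefficient names.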

Can we pass object types to plyr or data.table functions?

I'd pull the Variable (and maybe Trials) out into a data.frame and use aggregate from there:

df <- data.frame(Variable=unlist(a[,1]), Trial=unlist(a[,2]))
df$Edges <- laply(a[,4],ecount)
aggregate(Edges ~ Variable, data=df, mean)

should do what you want—assuming I understand what you want!

(I think you'll need unlist because you've got a matrix of lists.)

Use Predict on data.table with Linear Regression

You are predicting onto the entire new data set each time. If you want to predict only on the new data for each group you need to subset the "newdata" by group.

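This is an instance where .BY will be useful. Here are two possibilities (both assume DT and new are keyed by group, so that indexing with .BY picks out the current group's rows):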

a <- DT[,predict(lm(y ~ v1 + v2), new[.BY]), by = group]

b <- new[,predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata=.SD),by = group]

Both give identical results:

identical(a,b)
# [1] TRUE
a
#   group         V1
#1:     a  -2.525502
#2:     a   3.319445
#3:     a   4.340253
#4:     b -14.588933
#5:     b  11.280766
#6:     b  -1.132324
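
The following is a self-contained sketch of the same idea on invented data, since the original DT and new are not shown. The column names follow the answer; the values, the seed and the explicit setkey() calls are assumptions made for illustration.

```r
library(data.table)

# Hypothetical data: two groups, a training table and a table of new rows.
set.seed(42)
DT  <- data.table(group = rep(c("a", "b"), each = 20),
                  v1 = rnorm(40), v2 = rnorm(40))
DT[, y := 3 * v1 - 2 * v2 + rnorm(40)]
new <- data.table(group = rep(c("a", "b"), each = 3),
                  v1 = rnorm(6), v2 = rnorm(6))

# Key both tables so that X[.BY] looks up the rows of the current group.
setkey(DT, group)
setkey(new, group)

# Fit on each group of DT, predict on the matching rows of new.
a <- DT[, predict(lm(y ~ v1 + v2), new[.BY]), by = group]

# Same idea driven from new: fit on the matching rows of DT, predict on .SD.
b <- new[, predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata = .SD), by = group]
```

Each result should contain one prediction per row of new, three per group in this sketch.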

Regression and summary statistics by group within a data.table

dt[,c(y.med = median(y),
      reg.1 = as.list(coef(lm(y ~ x))),
      reg.2 = as.list(coef(lm(y ~ x + z)))), by=ID]
#      ID     y.med reg.1.(Intercept)   reg.1.x reg.2.(Intercept)      reg.2.x   reg.2.z
#1:    Ed 0.7280448        0.75977555 0.1132509        0.83322290 -0.484348116 0.7655563
#2: Frank 0.6100339       -0.07830664 0.2700846        0.04720686  0.004027939 0.7168521
#3:  Tony 0.2710623       -0.78319379 0.9166601       -0.35836990  0.622822617 0.4161102

Using data.table to create a column of regression coefficients

I think this is what you want:

new_df2<-df[,(lm(Rev~Day)$coefficients[["Day"]]), by=list(Brand)]

lm returns a full model object; you need to drill down into it to get a single value per group that can be turned into a column.
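
As a rough illustration, the sketch below runs the same extraction on made-up data with exact linear trends, so the expected slopes are known in advance. The table contents are hypothetical; only the column names Brand, Day and Rev come from the answer. It also shows the := form, which attaches the slope to every row instead of returning one row per Brand.

```r
library(data.table)

# Made-up data with exact linear trends so the slopes are easy to check.
df <- data.table(Brand = rep(c("A", "B"), each = 5),
                 Day = rep(1:5, times = 2))
df[, Rev := ifelse(Brand == "A", 1 + 2 * Day, 10 - 3 * Day)]

# One row per Brand holding the fitted slope on Day.
slopes <- df[, .(slope = coef(lm(Rev ~ Day))[["Day"]]), by = Brand]

# Or attach the slope to every row of its Brand as a new column.
df[, slope := coef(lm(Rev ~ Day))[["Day"]], by = Brand]
```

With this data the slope for Brand A should come out as 2 and the slope for Brand B as -3, up to floating-point error.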

Regression models as column in data table, R

If you just need the coefficients, p-values and AIC, then this will work without using up a bunch of memory storing unnecessary bits of lm objects:

MyVarb <- data.table(Y=rnorm(100),
                     V1=rnorm(100),
                     V2=rnorm(100))
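eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")
DT <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(as.formula(mod), data = MyVarb)
  coefs <- summary(reg)$coefficients   # estimates, std. errors, t and p-values
  dt <- data.table(coefs)
  dt[, coef := row.names(coefs)]       # keep the term names as a column
  dt[, aic := AIC(reg)]
  dt[, model := mod]
  dt                                   # return the table explicitly
}))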

Linear Regression and storing results in data frame

Here's a vote for the plyr package and ddply().

plyrFunc <- function(x){
  mod <- lm(b~c, data = x)
  # row 2, column 3 of the coefficient table: the t value of the slope on c
  return(summary(mod)$coefficients[2,3])
}

tStats <- ddply(dF, .(a), plyrFunc)
tStats
  a         V1
1 a  1.6124515
2 b -0.1369306
3 c  0.6852483
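
For comparison, the same per-group t statistic can be sketched in data.table. The dF below is a hypothetical stand-in that reuses the column names a, b and c from the answer; its values are simulated, so the output will not match the numbers above.

```r
library(data.table)

# Hypothetical stand-in for dF, using the column names a, b and c from above.
set.seed(7)
dF <- data.table(a = rep(c("a", "b", "c"), each = 10),
                 b = rnorm(30),
                 c = rnorm(30))

# t value of the slope on c, one row per level of a.
tStats <- dF[, .(t_value = summary(lm(b ~ c))$coefficients["c", "t value"]), by = a]
```

Indexing the coefficient table by name ("c", "t value") rather than by position [2,3] is a design choice in this sketch: it keeps working if the formula gains more terms.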

R: Are there any known issues when plyr/dplyr/data.table and plm packages used together

It seems like in your data (maybe due to the merging process) you have individuals which have the same value in the time index more than once (or more than one NA).
You could either look at your data or try table(index(your_pdataframe), useNA = "ifany") to find out which.
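
If the data is still a plain data.frame, a similar check can be sketched in base R before converting it. The panel, id and year names below are hypothetical; substitute your own individual and time index columns.

```r
# Hypothetical panel: id is the individual index, year the time index.
panel <- data.frame(id   = c(1, 1, 1, 2, 2, 3),
                    year = c(2000, 2001, 2001, 2000, 2001, NA),
                    x    = c(5, 6, 7, 8, 9, 10))

# Count rows per id-year pair, keeping NA as its own level.
counts <- table(panel$id, panel$year, useNA = "ifany")

# Flag every row whose id-year pair occurs more than once.
key <- panel[c("id", "year")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
panel[dup, ]
```

Any cell of counts greater than 1, or any row returned by the last line, points to an individual observed more than once in the same period.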
