Using Data.Table to Create a Column of Regression Coefficients

Using data.table to create a column of regression coefficients

I think this is what you want:

new_df2 <- df[, (lm(Rev ~ Day)$coefficients[["Day"]]), by = list(Brand)]

lm returns a full model object, so you need to drill down into it to get a single value for each group that can be turned into a column.
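
As a minimal sketch of the same idea, assuming a toy table with the column names used above (Brand, Day, Rev; the values are invented for illustration), the per-group slope can be computed and checked like this:

```r
library(data.table)

# Toy data (invented): two brands whose revenue is exactly linear in Day
df <- data.table(
  Brand = rep(c("A", "B"), each = 4),
  Day   = rep(1:4, times = 2),
  Rev   = c(10, 12, 14, 16,   # brand A rises by 2 per day
             5,  8, 11, 14)   # brand B rises by 3 per day
)

# One slope per brand; naming the list element names the result column
slopes <- df[, .(slope = coef(lm(Rev ~ Day))[["Day"]]), by = Brand]
print(slopes)
```

Wrapping the value in .(slope = ...) is optional; without it the new column gets the default name V1.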

Creating Data Table of Regression Coefficients

Use names() on the coefficient vector to get the names of the coefficients.

var_names <- names(coef(model_B))
coef_vals <- coef(model_B)
data.table(Variables = var_names, RegressionCoefficients = coef_vals)

     Variables RegressionCoefficients
1: (Intercept)           2.984208e-16
2:      radius           1.000000e+00
3:   perimeter           1.000000e+00
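
If standard errors and p-values are wanted as well, one possible variant is to convert the coefficient matrix from summary() instead. This is only a sketch: the model below is a hypothetical stand-in for model_B, fitted on the built-in mtcars data, and the column name Variables is reused from the answer above.

```r
library(data.table)

# Hypothetical stand-in for model_B, fitted on the built-in mtcars data
model <- lm(mpg ~ wt + hp, data = mtcars)

# summary()$coefficients is a matrix with one row per term;
# keep.rownames = TRUE moves the term names into a column called "rn"
coef_table <- as.data.table(summary(model)$coefficients, keep.rownames = TRUE)
setnames(coef_table, "rn", "Variables")
print(coef_table)
```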

Regression models as column in data table, R

If you just need the coefficients, p-values, and AIC, then this will work without using up a lot of memory storing unnecessary parts of the lm objects:

library(data.table)

MyVarb <- data.table(Y = rnorm(100),
                     V1 = rnorm(100),
                     V2 = rnorm(100))
eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")
DT <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(mod, data = MyVarb)
  # keep only the coefficient matrix (estimate, std. error, t value, p-value)
  dt <- data.table(summary(reg)$coefficients)
  dt[, coef := row.names(summary(reg)$coefficients)]
  dt[, aic := AIC(reg)]
  dt[, model := mod]
  dt  # return the per-model table explicitly
}))

How to run a (linear) regression by group and add coefficients to the original data.table?

Edit

I didn't realize that you want to add to the original table. In this case, use := in j as follows (the original answer, which returns only the summary, is kept at the end).

dt[,
   c('intercept', 'slope') := {
     fit <- lm(y ~ x, data = .SD)
     # coefficients[1] is the intercept, coefficients[2] is the slope of x
     list(fit$coefficients[1], fit$coefficients[2])
   },
   by = country]
dt[]
##              x          y country intercept      slope
##  1: 0.26550866 0.82094629       1 0.6275887 -0.2019328
##  2: 0.37212390 0.64706019       2 0.5042928  0.1252771
##  3: 0.57285336 0.78293276       1 0.6275887 -0.2019328
##  4: 0.90820779 0.55303631       2 0.5042928  0.1252771
##  5: 0.20168193 0.52971958       1 0.6275887 -0.2019328
##  6: 0.89838968 0.78935623       2 0.5042928  0.1252771
##  7: 0.94467527 0.02333120       1 0.6275887 -0.2019328
##  8: 0.66079779 0.47723007       2 0.5042928  0.1252771
##  9: 0.62911404 0.73231374       1 0.6275887 -0.2019328
## 10: 0.06178627 0.69273156       2 0.5042928  0.1252771
## 11: 0.20597457 0.47761962       1 0.6275887 -0.2019328
## 12: 0.17655675 0.86120948       2 0.5042928  0.1252771
## 13: 0.68702285 0.43809711       1 0.6275887 -0.2019328
## 14: 0.38410372 0.24479728       2 0.5042928  0.1252771
## 15: 0.76984142 0.07067905       1 0.6275887 -0.2019328
## 16: 0.49769924 0.09946616       2 0.5042928  0.1252771
## 17: 0.71761851 0.31627171       1 0.6275887 -0.2019328
## 18: 0.99190609 0.51863426       2 0.5042928  0.1252771
## 19: 0.38003518 0.66200508       1 0.6275887 -0.2019328
## 20: 0.77744522 0.40683019       2 0.5042928  0.1252771
## 21: 0.93470523 0.91287592       1 0.6275887 -0.2019328
## 22: 0.21214252 0.29360337       2 0.5042928  0.1252771
## 23: 0.65167377 0.45906573       1 0.6275887 -0.2019328
## 24: 0.12555510 0.33239467       2 0.5042928  0.1252771
## 25: 0.26722067 0.65087047       1 0.6275887 -0.2019328
## 26: 0.38611409 0.25801678       2 0.5042928  0.1252771
## 27: 0.01339033 0.47854525       1 0.6275887 -0.2019328
## 28: 0.38238796 0.76631067       2 0.5042928  0.1252771
## 29: 0.86969085 0.08424691       1 0.6275887 -0.2019328
## 30: 0.34034900 0.87532133       2 0.5042928  0.1252771
## 31: 0.48208012 0.33907294       1 0.6275887 -0.2019328
## 32: 0.59956583 0.83944035       2 0.5042928  0.1252771
## 33: 0.49354131 0.34668349       1 0.6275887 -0.2019328
## 34: 0.18621760 0.33377493       2 0.5042928  0.1252771
## 35: 0.82737332 0.47635125       1 0.6275887 -0.2019328
## 36: 0.66846674 0.89219834       2 0.5042928  0.1252771
## 37: 0.79423986 0.86433947       1 0.6275887 -0.2019328
## 38: 0.10794363 0.38998954       2 0.5042928  0.1252771
## 39: 0.72371095 0.77732070       1 0.6275887 -0.2019328
## 40: 0.41127443 0.96061800       2 0.5042928  0.1252771
##              x          y country intercept      slope

Original answer

This is a good opportunity to use the flexible j expression in data.table. You can put anything in j as long as it returns a list.

dt[,
   {
     fit <- lm(y ~ x, data = .SD)
     list(intercept = fit$coefficients[1], slope = fit$coefficients[2])
   },
   by = country]

#    country intercept   slope
# 1:       1    0.6276 -0.2019
# 2:       2    0.5043  0.1253
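
Because j only has to return a list, a variant that avoids indexing the coefficients by position is to pass the whole named coefficient vector through as.list(). The sketch below assumes toy data with the same column names as above (x, y, country); the seed and values are arbitrary.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(x = runif(40), y = runif(40), country = rep(1:2, 20))

# as.list() turns the named coefficient vector into a named list,
# so every coefficient becomes its own column, however many there are
coefs <- dt[, as.list(coef(lm(y ~ x, data = .SD))), by = country]

# Rename the default lm labels to friendlier column names
setnames(coefs, c("(Intercept)", "x"), c("intercept", "slope"))
print(coefs)
```

This form scales to models with more predictors without changing the j expression.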

Loop over data table columns and apply glm

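You can fit the model and extract the coefficient in the same loop, using lapply. Exclude the response column y from the names you loop over, otherwise one of the formulas becomes y ~ y:

output <- do.call(rbind, lapply(setdiff(names(dt), 'y'), function(x) {
  model <- glm(reformulate(x, 'y'), dt, family = binomial(link = "logit"))
  # element 2 of the coefficient matrix is the estimate for x (row 2, column 1)
  summary(model)$coefficients[2]
}))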
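
A self-contained sketch of this loop, assuming simulated data (the columns a and b, the seed, and the effect size are all invented), could look like the following. It uses sapply so the result is a vector named after the predictors, and it extracts the coefficient by name rather than by position.

```r
library(data.table)

set.seed(42)  # arbitrary seed
n <- 200
dt <- data.table(a = rnorm(n), b = rnorm(n))
dt[, y := rbinom(n, 1, plogis(0.5 * a))]  # binary response that depends on a only

# Loop over every column except the response
predictors <- setdiff(names(dt), "y")
slopes <- sapply(predictors, function(x) {
  model <- glm(reformulate(x, response = "y"), data = dt,
               family = binomial(link = "logit"))
  coef(model)[[x]]  # estimate for the single predictor, looked up by name
})
print(slopes)
```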

Regression and summary statistics by group within a data.table

dt[, c(y.med = median(y),
       reg.1 = as.list(coef(lm(y ~ x))),
       reg.2 = as.list(coef(lm(y ~ x + z)))), by = ID]
#       ID     y.med reg.1.(Intercept)   reg.1.x reg.2.(Intercept)      reg.2.x   reg.2.z
# 1:    Ed 0.7280448        0.75977555 0.1132509        0.83322290 -0.484348116 0.7655563
# 2: Frank 0.6100339       -0.07830664 0.2700846        0.04720686  0.004027939 0.7168521
# 3:  Tony 0.2710623       -0.78319379 0.9166601       -0.35836990  0.622822617 0.4161102

Create regression coefficient table

If you change your loop to be a counter instead of actual data frame objects, this becomes pretty straightforward:

tmp <- data.frame(matrix(nrow=length(dataframes), ncol=9))
names(tmp) <- c("site_name","int", "coflin", "cofsqd", "fstat", "ldf", "udf", "cod", "pval")
for(j in seq_along(dataframes)) {
i <- dataframes[[j]]

# rest of your code goes here

new.row <- c(names(dataframes)[[j]], int, coflin, ..., cod, pval)
tmp[j, ] <- new.row
}

Note that you can't use i as the ID because it is a data frame, but you can use the corresponding name. Also, the results data frame is initialized with the correct number of rows.

One thing to watch out for is that modifying data frames is slow, so you typically don't want to do this in a loop unless the loop runs only a few times. If it runs many times, one simple solution is to fill a matrix first and convert it to a data frame after you are done looping.
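
A minimal sketch of the matrix-first approach follows. The input list, the site names, and the quadratic model are assumptions made for illustration (the model is guessed from the column names coflin and cofsqd); only three of the nine result columns are filled in.

```r
# Hypothetical input: a named list of data frames, one per site
set.seed(7)  # arbitrary seed
dataframes <- list(
  site_a = data.frame(x = 1:10, y = rnorm(10)),
  site_b = data.frame(x = 1:10, y = rnorm(10))
)

# Preallocate a numeric matrix; the site names live in the row names
res <- matrix(NA_real_, nrow = length(dataframes), ncol = 3,
              dimnames = list(names(dataframes), c("int", "coflin", "cofsqd")))

for (j in seq_along(dataframes)) {
  # Assumed model: quadratic in x, giving intercept, linear and squared terms
  fit <- lm(y ~ x + I(x^2), data = dataframes[[j]])
  res[j, ] <- coef(fit)
}

# Convert to a data frame once, after the loop
tmp <- data.frame(site_name = rownames(res), res, row.names = NULL)
print(tmp)
```

Keeping the site name out of the matrix also keeps the numeric columns numeric, since a matrix can hold only one type.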
