Using data.table to create a column of regression coefficients
I think this is what you want:
new_df2<-df[,(lm(Rev~Day)$coefficients[["Day"]]), by=list(Brand)]
lm returns a full model object, so you need to drill down into it to get a single value from each group that can be turned into a column.
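The one-liner above can be tried end to end on a small hypothetical table. In this sketch the Brand, Day and Rev values are made up so that each brand's revenue is an exact straight line in Day, which makes the expected slopes easy to check. The j result is also named with .() so the output column is called slope rather than the default V1.

```r
library(data.table)

# Hypothetical data: revenue is an exact linear function of Day for each brand
df <- data.table(
  Brand = rep(c("A", "B"), each = 5),
  Day   = rep(1:5, times = 2)
)
df[, Rev := ifelse(Brand == "A", 3 + 2 * Day, 1 - 0.5 * Day)]

# One lm() fit per Brand; keep only the coefficient on Day.
# Wrapping j in .() gives the result column a name instead of V1.
slopes <- df[, .(slope = coef(lm(Rev ~ Day))[["Day"]]), by = Brand]
print(slopes)
```

Here brand A should come out with a slope of 2 and brand B with -0.5.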
Creating Data Table of Regression Coefficients
Use names() on the coefficient vector to get the names of the coefficients:
library(data.table)
# model_B is assumed to be an already-fitted lm model
var_names=names(coef(model_B))
coef_vals=coef(model_B)
data.table(Variables=var_names, RegressionCoefficients=coef_vals)
Variables RegressionCoefficients
1: (Intercept) 2.984208e-16
2: radius 1.000000e+00
3: perimeter 1.000000e+00
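Because model_B is not defined in the answer, here is a runnable sketch that swaps in a hypothetical model fitted on R's built-in mtcars data. Its terms differ from the radius/perimeter model shown above. It builds the same two-column table from the named vector returned by coef().

```r
library(data.table)

# Hypothetical stand-in for model_B, fitted on the built-in mtcars data
model_B <- lm(mpg ~ wt + hp, data = mtcars)

cf <- coef(model_B)   # named numeric vector; the names are the term labels
coef_table <- data.table(Variables = names(cf),
                         RegressionCoefficients = unname(cf))
coef_table
```

unname() is applied so the second column holds plain numbers rather than a named vector. The first column already carries the names.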
Regression models as column in data table, R
If you just need the coefficients, p-values and AIC, this will work without using up a lot of memory storing the unnecessary parts of the lm objects:
library(data.table)
MyVarb <- data.table(Y=rnorm(100),
V1=rnorm(100),
V2=rnorm(100))
eq=c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")
DT<-rbindlist(lapply(eq, function(mod) {
reg<-lm(as.formula(mod), data=MyVarb)
s<-summary(reg)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
dt<-data.table(s)
dt[,coef:=row.names(s)]
dt[,aic:=AIC(reg)]
dt[,model:=mod]
dt  # return the per-model table explicitly
}))
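A variant of the same idea, sketched with a fixed seed (the seed value is arbitrary and the data are simulated), keeps only the estimate and p-value under shorter column names. It then shows a typical query on the result: the p-value of V1 in every model that contains it.

```r
library(data.table)

set.seed(42)  # arbitrary seed so the simulated data are reproducible
MyVarb <- data.table(Y = rnorm(100), V1 = rnorm(100), V2 = rnorm(100))
eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")

DT <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(as.formula(mod), data = MyVarb)
  cf  <- summary(reg)$coefficients   # one row per term
  data.table(model    = mod,
             term     = rownames(cf),
             estimate = cf[, "Estimate"],
             p.value  = cf[, "Pr(>|t|)"],
             aic      = AIC(reg))     # same AIC repeated on each row of a model
}))

# Example query: p-value of the V1 term in every model that includes it
DT[term == "V1", .(model, p.value)]
```

Keeping one row per (model, term) pair makes such lookups a simple filter instead of digging through stored lm objects.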
How to run a (linear) regression by group and add coefficients to the original data.table?
Edit: I didn't realize that you want to add the coefficients to the original table. In that case, use := in j as follows (the original answer, which returns only a per-group summary, is kept at the end).
dt[,
c('intercept', 'slope') := {
fit <- lm(y ~ x, data = .SD)
# coefficients[1] is the intercept, coefficients[2] is the slope on x
list(fit$coefficients[1], fit$coefficients[2])
},
by = country]
dt[]
## x y country intercept slope
## 1: 0.26550866 0.82094629 1 0.6275887 -0.2019328
## 2: 0.37212390 0.64706019 2 0.5042928 0.1252771
## 3: 0.57285336 0.78293276 1 0.6275887 -0.2019328
## 4: 0.90820779 0.55303631 2 0.5042928 0.1252771
## 5: 0.20168193 0.52971958 1 0.6275887 -0.2019328
## 6: 0.89838968 0.78935623 2 0.5042928 0.1252771
## 7: 0.94467527 0.02333120 1 0.6275887 -0.2019328
## 8: 0.66079779 0.47723007 2 0.5042928 0.1252771
## 9: 0.62911404 0.73231374 1 0.6275887 -0.2019328
## 10: 0.06178627 0.69273156 2 0.5042928 0.1252771
## 11: 0.20597457 0.47761962 1 0.6275887 -0.2019328
## 12: 0.17655675 0.86120948 2 0.5042928 0.1252771
## 13: 0.68702285 0.43809711 1 0.6275887 -0.2019328
## 14: 0.38410372 0.24479728 2 0.5042928 0.1252771
## 15: 0.76984142 0.07067905 1 0.6275887 -0.2019328
## 16: 0.49769924 0.09946616 2 0.5042928 0.1252771
## 17: 0.71761851 0.31627171 1 0.6275887 -0.2019328
## 18: 0.99190609 0.51863426 2 0.5042928 0.1252771
## 19: 0.38003518 0.66200508 1 0.6275887 -0.2019328
## 20: 0.77744522 0.40683019 2 0.5042928 0.1252771
## 21: 0.93470523 0.91287592 1 0.6275887 -0.2019328
## 22: 0.21214252 0.29360337 2 0.5042928 0.1252771
## 23: 0.65167377 0.45906573 1 0.6275887 -0.2019328
## 24: 0.12555510 0.33239467 2 0.5042928 0.1252771
## 25: 0.26722067 0.65087047 1 0.6275887 -0.2019328
## 26: 0.38611409 0.25801678 2 0.5042928 0.1252771
## 27: 0.01339033 0.47854525 1 0.6275887 -0.2019328
## 28: 0.38238796 0.76631067 2 0.5042928 0.1252771
## 29: 0.86969085 0.08424691 1 0.6275887 -0.2019328
## 30: 0.34034900 0.87532133 2 0.5042928 0.1252771
## 31: 0.48208012 0.33907294 1 0.6275887 -0.2019328
## 32: 0.59956583 0.83944035 2 0.5042928 0.1252771
## 33: 0.49354131 0.34668349 1 0.6275887 -0.2019328
## 34: 0.18621760 0.33377493 2 0.5042928 0.1252771
## 35: 0.82737332 0.47635125 1 0.6275887 -0.2019328
## 36: 0.66846674 0.89219834 2 0.5042928 0.1252771
## 37: 0.79423986 0.86433947 1 0.6275887 -0.2019328
## 38: 0.10794363 0.38998954 2 0.5042928 0.1252771
## 39: 0.72371095 0.77732070 1 0.6275887 -0.2019328
## 40: 0.41127443 0.96061800 2 0.5042928 0.1252771
## x y country intercept slope
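A small self-contained sketch of the same := pattern is shown below. The country, x and y values are hypothetical and chosen by hand so the fitted lines are easy to verify. coef() returns the intercept first and the slope second, so the left-hand names are listed in that order.

```r
library(data.table)

# Hypothetical toy data: two countries, four points each
dt <- data.table(
  country = rep(1:2, each = 4),
  x       = rep(1:4, times = 2),
  y       = c(1, 3, 2, 4,   2, 1, 4, 3)
)

# Fit once per country and write the coefficients back onto every row.
# as.list(coef(fit)) yields list(intercept, slope) in that order.
dt[, c("intercept", "slope") := {
  fit <- lm(y ~ x, data = .SD)
  as.list(coef(fit))
}, by = country]
dt[]
```

For these values, country 1 should get intercept 0.5 and slope 0.8, and country 2 intercept 1.0 and slope 0.6, repeated on each of their rows.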
Original answer
This is a perfect opportunity to use the flexible j expression in data.table. You can put anything in j as long as it returns a list; each element of the list becomes a column, computed once per group.
dt[,
{
fit <- lm(y ~ x, data = .SD)
list(intercept = fit$coefficients[1], slope = fit$coefficients[2])
},
by = country]
# country intercept slope
#1: 1 0.6276 -0.2019
#2: 2 0.5043 0.1253
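Since j only has to return a list, the same block can carry other per-group statistics alongside the coefficients. The sketch below adds R-squared from summary() and the group size .N. The data are hypothetical, hand-picked values rather than the question's data.

```r
library(data.table)

# Hypothetical toy data: two countries, four points each
dt <- data.table(country = rep(1:2, each = 4),
                 x       = rep(1:4, times = 2),
                 y       = c(1, 3, 2, 4, 2, 1, 4, 3))

fits <- dt[, {
  fit <- lm(y ~ x, data = .SD)
  # every list element below becomes one column of the per-group result
  list(intercept = coef(fit)[[1]],
       slope     = coef(fit)[[2]],
       r.squared = summary(fit)$r.squared,
       n         = .N)
}, by = country]
fits
```

For these values the expected R-squared is 0.64 for country 1 and 0.36 for country 2.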
Loop over data table columns and apply glm
You can fit the model and extract the coefficient in the same loop using lapply. Skip the response column y itself, otherwise you would regress y on y:
output <- do.call(rbind, lapply(setdiff(names(dt), 'y'), function(x) {
model <- glm(reformulate(x, 'y'), data=dt, family=binomial(link="logit"))
summary(model)$coefficients[2]  # estimate of the coefficient on x
}))
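A runnable sketch of that loop on simulated data follows. The predictor names x1 to x3, the seed and the way y is generated are all assumptions made for illustration. It also keeps the predictor name and the p-value next to each estimate, so the output rows are self-describing.

```r
library(data.table)

set.seed(1)  # arbitrary seed; data are simulated for illustration only
dt <- data.table(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
dt[, y := rbinom(.N, 1, plogis(0.5 * x1 - x2))]  # hypothetical binary outcome

predictors <- setdiff(names(dt), "y")   # never regress y on itself
output <- rbindlist(lapply(predictors, function(v) {
  model <- glm(reformulate(v, "y"), data = dt, family = binomial(link = "logit"))
  cf <- summary(model)$coefficients     # row 1 is the intercept, row 2 is v
  data.table(predictor = v,
             estimate  = cf[2, "Estimate"],
             p.value   = cf[2, "Pr(>|z|)"])
}))
output
```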
Regression and summary statistics by group within a data.table
dt[,c(y.med = median(y),
reg.1 = as.list(coef(lm(y ~ x))),
reg.2 = as.list(coef(lm(y ~ x + z)))), by=ID]
# ID y.med reg.1.(Intercept) reg.1.x reg.2.(Intercept) reg.2.x reg.2.z
#1: Ed 0.7280448 0.75977555 0.1132509 0.83322290 -0.484348116 0.7655563
#2: Frank 0.6100339 -0.07830664 0.2700846 0.04720686 0.004027939 0.7168521
#3: Tony 0.2710623 -0.78319379 0.9166601 -0.35836990 0.622822617 0.4161102
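In that answer, c() merges a plain number with the lists returned by as.list(coef(...)), and the names of nested elements are joined with a dot. That is where column names such as reg.1.(Intercept) come from. The sketch below reproduces this on simulated data; the seed, group sizes and the way y is generated are assumptions.

```r
library(data.table)

set.seed(3)  # simulated, hypothetical data
dt <- data.table(ID = rep(c("Ed", "Frank", "Tony"), each = 10),
                 x  = runif(30),
                 z  = runif(30))
dt[, y := x + z + rnorm(.N, sd = 0.1)]

# One row per ID: the median of y plus the coefficients of two models
res <- dt[, c(y.med = median(y),
              reg.1 = as.list(coef(lm(y ~ x))),
              reg.2 = as.list(coef(lm(y ~ x + z)))), by = ID]
names(res)
```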
Create regression coefficient table
If you change your loop to be a counter instead of actual data frame objects, this becomes pretty straightforward:
tmp <- data.frame(matrix(nrow=length(dataframes), ncol=9))
names(tmp) <- c("site_name","int", "coflin", "cofsqd", "fstat", "ldf", "udf", "cod", "pval")
for(j in seq_along(dataframes)) {
i <- dataframes[[j]]
# rest of your code goes here
new.row <- c(names(dataframes)[[j]], int, coflin, ..., cod, pval)
tmp[j, ] <- new.row
}
Notice that you can't use i as the ID, because it is a data frame, but you can use the corresponding name. Also, the results data frame is initialized with the correct number of rows up front.
One thing to watch out for is that modifying a data frame row by row is slow, so you typically don't want to do this in a loop unless the loop runs only a few times. If it runs many times, one simple solution is to fill a matrix first and convert it to a data frame after the loop. Also note that because new.row mixes the site name with numbers, c() turns every value into a character string, so the numeric columns of tmp will need converting back (for example with as.numeric) afterwards.
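Another way to avoid both the slow row-by-row assignment and the character coercion is to build one small data frame per site and bind them once at the end. The sketch below assumes that each data frame has columns x and y and that the model is quadratic, y ~ x + I(x^2). That is a guess based on the int/coflin/cofsqd names, and the input list is simulated. cod is taken as R-squared and pval as the overall F-test p-value.

```r
# Hypothetical input: a named list of data frames, each with columns x and y
set.seed(7)
dataframes <- lapply(c(siteA = 1, siteB = 2), function(k) {
  x <- 1:20
  data.frame(x = x, y = k + 0.5 * x - 0.02 * x^2 + rnorm(20, sd = 0.3))
})

rows <- lapply(seq_along(dataframes), function(j) {
  i   <- dataframes[[j]]
  fit <- lm(y ~ x + I(x^2), data = i)   # assumed quadratic model
  s   <- summary(fit)
  f   <- s$fstatistic                   # named: value, numdf, dendf
  data.frame(site_name = names(dataframes)[[j]],
             int    = coef(fit)[[1]],
             coflin = coef(fit)[[2]],
             cofsqd = coef(fit)[[3]],
             fstat  = f[["value"]],
             ldf    = f[["numdf"]],
             udf    = f[["dendf"]],
             cod    = s$r.squared,
             pval   = pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE),
             stringsAsFactors = FALSE)
})
tmp <- do.call(rbind, rows)   # numeric columns stay numeric
tmp
```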