data.table vs plyr regression output
Try this:
> REG[, as.list(coef(lm(y ~ x + z))), by=ID]
        ID (Intercept)           x         z
[1,] Frank  -0.2928611  0.07215896  1.835106
[2,]  Tony   0.9120795 -1.11153056  2.041260
[3,]    Ed   1.0498359  5.77131778 -1.253741
I have the nagging feeling that this question was asked less than a week ago, but I don't think I arrived at this approach when I tried it, and I don't remember that any answer was this compact.
Oh, there it is: on r-help. Matthew can comment on the correctness of this if he wants. I guess the message is that functions returning lists will not have dimensions dropped. The interesting thing was that using list(coef(lm(...))) did not succeed in the manner we hoped.
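The difference between the two wrappers can be seen on simulated data. This is only a sketch: the REG table below is a hypothetical stand-in with the same column names (ID, x, y, z), not the asker's data.

```r
library(data.table)

set.seed(1)
# Hypothetical stand-in for REG: three groups of ten observations each.
REG <- data.table(
  ID = rep(c("Frank", "Tony", "Ed"), each = 10),
  x  = rnorm(30),
  z  = rnorm(30)
)
REG[, y := 1 + 2 * x - z + rnorm(30)]

# as.list(): each coefficient becomes its own column, one row per group.
wide <- REG[, as.list(coef(lm(y ~ x + z))), by = ID]

# list(): the whole coefficient vector becomes a single column (V1),
# so each group contributes one row per coefficient.
long <- REG[, list(coef(lm(y ~ x + z))), by = ID]

dim(wide)  # 3 rows, 4 columns
dim(long)  # 9 rows, 2 columns
```

With as.list() the result is already in the wide shape shown above; with list() the coefficient names are lost and the values are stacked.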
Can we pass objects types to plyr or data.table functions?
I'd pull the Variable (and maybe Trials) out into a data.frame and use aggregate from there:
library(plyr)    # for laply()
library(igraph)  # for ecount()
df <- data.frame(Variable=unlist(a[,1]), Trial=unlist(a[,2]))
df$Edges <- laply(a[,4], ecount)
aggregate(Edges ~ Variable, data=df, mean)
should do what you want, assuming I understand what you want! (I think you'll need unlist because you've got a matrix of lists.)
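To see why unlist is needed, here is a base R sketch of the same steps. The matrix a below is hypothetical: integer vectors stand in for the graph objects, length() stands in for ecount, and sapply() stands in for laply, so the example runs without plyr or igraph.

```r
# Hypothetical 3 x 4 matrix of lists; column 4 holds integer vectors
# standing in for graph objects.
a <- matrix(
  list("x", "x", "y",    # column 1: Variable
       1, 2, 1,          # column 2: Trial
       NA, NA, NA,       # column 3: unused here
       1:3, 1:5, 1:2),   # column 4: stand-ins for graphs
  nrow = 3
)

# a[, 1] is a list, so unlist() is needed to get an atomic vector.
df <- data.frame(Variable = unlist(a[, 1]), Trial = unlist(a[, 2]))

# length() stands in for igraph::ecount(); sapply() for plyr::laply().
df$Edges <- sapply(a[, 4], length)

res <- aggregate(Edges ~ Variable, data = df, mean)
res
```

Here the mean of Edges is 4 for Variable "x" (lengths 3 and 5) and 2 for "y".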
Use Predict on data.table with Linear Regression
You are predicting onto the entire new data set each time. If you want to predict only on the new data for each group, you need to subset the "newdata" by group.
This is an instance where .BY will be useful. Here are two possibilities:
a <- DT[,predict(lm(y ~ v1 + v2), new[.BY]), by = group]
b <- new[,predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata=.SD),by = group]
both of which give identical results
identical(a,b)
# [1] TRUE
a
# group V1
#1: a -2.525502
#2: a 3.319445
#3: a 4.340253
#4: b -14.588933
#5: b 11.280766
#6: b -1.132324
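A self-contained sketch of both forms on simulated data follows. It assumes both tables are keyed on group so that X[.BY] can look up the current group's rows. The data and the name new_dt (used here in place of new) are hypothetical.

```r
library(data.table)

set.seed(42)
# Hypothetical training data: two groups, twenty rows each.
DT <- data.table(group = rep(c("a", "b"), each = 20),
                 v1 = rnorm(40), v2 = rnorm(40))
DT[, y := 1 + 2 * v1 - v2 + rnorm(40)]

# Hypothetical new data: three rows per group, no response column.
new_dt <- data.table(group = rep(c("a", "b"), each = 3),
                     v1 = rnorm(6), v2 = rnorm(6))

# Keying both tables lets X[.BY] join on the current group value.
setkey(DT, group)
setkey(new_dt, group)

# Fit per group of DT, predict on the matching rows of new_dt.
a <- DT[, predict(lm(y ~ v1 + v2), new_dt[.BY]), by = group]

# Iterate over groups of new_dt, fit on the matching rows of DT.
b <- new_dt[, predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata = .SD), by = group]

nrow(a)                 # one prediction per row of new_dt
all.equal(a$V1, b$V1)
```

Either form returns one prediction per row of the new data rather than predictions for the whole new table repeated in every group.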
Regression and summary statistics by group within a data.table
dt[,c(y.med = median(y),
reg.1 = as.list(coef(lm(y ~ x))),
reg.2 = as.list(coef(lm(y ~ x + z)))), by=ID]
# ID y.med reg.1.(Intercept) reg.1.x reg.2.(Intercept) reg.2.x reg.2.z
#1: Ed 0.7280448 0.75977555 0.1132509 0.83322290 -0.484348116 0.7655563
#2: Frank 0.6100339 -0.07830664 0.2700846 0.04720686 0.004027939 0.7168521
#3: Tony 0.2710623 -0.78319379 0.9166601 -0.35836990 0.622822617 0.4161102
Using data.table to create a column of regression coefficients
I think this is what you want:
new_df2<-df[,(lm(Rev~Day)$coefficients[["Day"]]), by=list(Brand)]
lm returns a full model object; you need to drill down into it to get a single value from each group that can be turned into a column.
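As a small illustration of "drilling down", the sketch below uses hypothetical data with the same column names (Brand, Day, Rev). It shows the slope both as a per-group summary and as a new column added with :=, which is one way to read the question's title.

```r
library(data.table)

# Hypothetical data with the column names used above.
df <- data.table(
  Brand = rep(c("A", "B"), each = 4),
  Day   = rep(1:4, times = 2),
  Rev   = c(2, 4, 6, 8,  9, 7, 5, 3)
)

fit <- lm(Rev ~ Day, data = df[Brand == "A"])
class(fit)           # a full "lm" object, not a single number
coef(fit)[["Day"]]   # a single number: the slope on Day

# One slope per group, as a summary table ...
slopes <- df[, .(slope = coef(lm(Rev ~ Day))[["Day"]]), by = Brand]

# ... or repeated on every row of the group as a new column.
df[, slope := coef(lm(Rev ~ Day))[["Day"]], by = Brand]
```

coef(fit) is the accessor equivalent of fit$coefficients; either works for pulling out one value.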
Regression models as column in data table, R
If you just need the coefficients, p-values, and AIC, then this will work without using up a bunch of memory storing unnecessary parts of the lm objects:
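library(data.table)

MyVarb <- data.table(Y=rnorm(100),
                     V1=rnorm(100),
                     V2=rnorm(100))
eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")
# Fit each model and keep only its coefficient table, AIC and formula
DT <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(mod, data=MyVarb)
  dt <- data.table(summary(reg)$coefficients)
  # data.table() drops row names, so store the term names in a column
  dt[, coef := row.names(summary(reg)$coefficients)]
  dt[, aic := AIC(reg)]
  dt[, model := mod]
}))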
Linear Regression and storing results in data frame
Here's a vote for the plyr package and ddply().
library(plyr)

# Return the t statistic of the slope (row 2, column 3 of the coefficient table)
plyrFunc <- function(x){
  mod <- lm(b~c, data = x)
  return(summary(mod)$coefficients[2,3])
}
tStats <- ddply(dF, .(a), plyrFunc)
tStats
a V1
1 a 1.6124515
2 b -0.1369306
3 c 0.6852483
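For comparison with the data.table answers above, the same t statistic can be extracted per group without plyr. This is a sketch on hypothetical simulated data that reuses the column names a, b and c. It is not the original poster's data, so the values will differ.

```r
library(data.table)

set.seed(7)
# Hypothetical data with the same column names (a, b, c) as above.
dF <- data.table(a = rep(c("a", "b", "c"), each = 10),
                 b = rnorm(30),
                 c = rnorm(30))

# Row 2, column 3 of the coefficient matrix is the slope's t value.
tStats <- dF[, .(t_value = summary(lm(b ~ c, data = .SD))$coefficients[2, 3]),
             by = a]
tStats
```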
R: Are there any known issues when plyr/dplyr/data.table and plm packages used together
It seems that in your data (maybe due to the merging process) some individuals have the same value in the time index more than once (or more than one NA).
You could either look at your data or try table(index(your_pdataframe), useNA = "ifany") to find out which.
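The same check can be run in base R before the data ever reach plm. The sketch below assumes hypothetical index columns named id and year, and it lists the offending rows as well as the per-cell counts.

```r
# Hypothetical panel data; "id" and "year" are assumed index columns.
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2001, 2002, 2002, 2001, 2002, 2003),
  y    = c(5, 6, 7, 1, 2, 3)
)

idx <- panel[, c("id", "year")]

# Flag every row whose id-year pair occurs more than once.
is_dup <- duplicated(idx) | duplicated(idx, fromLast = TRUE)
panel[is_dup, ]

# Count rows per id-year cell; any cell above 1 is a repeated index pair.
counts <- table(panel$id, panel$year, useNA = "ifany")
which(counts > 1, arr.ind = TRUE)
```

Running this on the merged data frame shows which id-year pairs were duplicated by the merge.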