Using predict with a list of lm() objects
Here's my attempt:
predNaughty <- ddply(newData, "state", transform,
                     value = predict(modelList[[paste(piece$state[1])]], newdata = piece))
head(predNaughty)
# year state value
# 1 50 50 5176.326
# 2 51 50 5274.907
# 3 52 50 5373.487
# 4 53 50 5472.068
# 5 54 50 5570.649
# 6 55 50 5669.229
predDiggsApproved <- ddply(newData, "state", function(x)
  transform(x, value = predict(modelList[[paste(x$state[1])]], newdata = x)))
head(predDiggsApproved)
# year state value
# 1 50 50 5176.326
# 2 51 50 5274.907
# 3 52 50 5373.487
# 4 53 50 5472.068
# 5 54 50 5570.649
# 6 55 50 5669.229
JD Long edit
I was inspired enough to work out an adply() option:
pred3 <- adply(newData, 1, function(x)
  predict(modelList[[paste(x$state)]], newdata = x))
head(pred3)
# year state 1
# 1 50 50 5176.326
# 2 51 50 5274.907
# 3 52 50 5373.487
# 4 53 50 5472.068
# 5 54 50 5570.649
# 6 55 50 5669.229
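For readers without plyr, the same per-group lookup can be sketched in base R. This is only a minimal sketch: the data frame toyData, its values, and the two state codes are made up for illustration, and only the idea of picking modelList[[state]] by name comes from the answers above.

```r
# Hypothetical training data: two states with different linear trends.
toyData <- data.frame(
  state = rep(c("50", "51"), each = 5),
  year  = rep(1:5, times = 2),
  value = c(10, 12, 15, 15, 19, 3, 4, 4, 7, 8)
)

# One lm per state, stored in a list named by state.
modelList <- lapply(split(toyData, toyData$state),
                    function(d) lm(value ~ year, data = d))

# New rows to predict, again keyed by state.
newData <- expand.grid(year = 6:7, state = c("50", "51"),
                       stringsAsFactors = FALSE)

# Split the new data by state, predict each piece with its own model,
# then stack the pieces back together.
pieces <- split(newData, newData$state)
predBase <- do.call(rbind, lapply(names(pieces), function(s) {
  piece <- pieces[[s]]
  piece$value <- as.numeric(predict(modelList[[s]], newdata = piece))
  piece
}))
predBase
```

Looking the model up by the name of the piece avoids relying on the first row of each group, which is what paste(x$state[1]) does in the ddply versions.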
Using predict on lm list with confidence interval
To apply a function to each element of a list, lapply (list apply) is the way to go:
do.call(rbind, lapply(mod, function(x) predict(x, newdata, interval="confidence")))
fit lwr upr
1 1.951769 1.3772699 2.526268
1 1.851953 1.4852869 2.218618
1 1.339843 0.1453728 2.534312
1 1.987887 1.4446006 2.531174
So with lapply we run the anonymous function function(x) predict(x, newdata, interval="confidence"), with x being each element of mod in turn. The do.call(rbind, ...) then stacks the list of results into a single matrix.
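The call above relies on a list mod and a data frame newdata that are not shown. A self-contained sketch of the same pattern follows, using the built-in mtcars data split by cyl as a stand-in (my choice, not the original data). It also labels the rows, since every stacked piece otherwise carries the row name "1".

```r
# Fit one model per cylinder group; `mod` is a named list of lm objects.
mod <- lapply(split(mtcars, mtcars$cyl),
              function(d) lm(mpg ~ wt, data = d))

# A single new observation to predict with every model.
newdata <- data.frame(wt = 3)

# lapply returns one 1 x 3 matrix (fit, lwr, upr) per model;
# do.call(rbind, ...) stacks them into one matrix.
ci <- do.call(rbind, lapply(mod, function(x)
  predict(x, newdata, interval = "confidence")))

# Label rows by model name so each interval can be traced to its model.
rownames(ci) <- names(mod)
ci
```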
Prediction using linear model and the importance of data.frame
When you call predict on an lm object, the function called is predict.lm. When you run it like:
predict(model_1, Sepal.Width=c(1,3,4,5))
you are passing c(1,3,4,5) as an argument named Sepal.Width, which predict.lm ignores because it has no argument of that name.
With no new input data, you are in effect running predict.lm(model_1) and getting back the fitted values:
table(predict(model_1) == predict(model_1, Sepal.Width=c(1,3,4,5)))
TRUE
150
Because you fitted the model with a formula, predict.lm needs a data frame of new values to reconstruct the matrix of independent (exogenous) variables, multiply it by the coefficients, and return the predicted values.
Briefly, this is what predict.lm is doing:
newdata = data.frame(Sepal.Width=c(1,3,4,5))
Terms = delete.response(terms(model_1))
X = model.matrix(Terms,newdata)
X
(Intercept) Sepal.Width
1 1 1
2 1 3
3 1 4
4 1 5
X %*% coefficients(model_1)
[,1]
1 6.302861
2 5.856139
3 5.632778
4 5.409417
predict(model_1,newdata)
1 2 3 4
6.302861 5.856139 5.632778 5.409417
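The definition of model_1 is not shown above. The printed values are consistent with a regression of Sepal.Length on Sepal.Width in the built-in iris data, so the sketch below assumes that definition in order to make the comparison reproducible end to end.

```r
# Assumed definition of model_1 (not shown in the original answer).
model_1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Wrong: Sepal.Width is passed as a stray named argument, so predict.lm
# sees no newdata and returns the 150 fitted values.
wrong <- predict(model_1, Sepal.Width = c(1, 3, 4, 5))
length(wrong)

# Right: wrap the new values in a data frame named after the predictor.
newdata <- data.frame(Sepal.Width = c(1, 3, 4, 5))
right <- predict(model_1, newdata)

# The manual design-matrix route gives the same numbers.
X <- model.matrix(delete.response(terms(model_1)), newdata)
manual <- drop(X %*% coef(model_1))
all.equal(unname(right), unname(manual))
```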
Using predict() with a vector of specific values in lapply for a list of data.frames
The names of the individual data.frames in "list1" are the column names, not the overall name of that list item. To see this, run names(list1[[1]]).
names(list1[[1]])
"name" "A" "B"
If you want to loop through both the list and the list names simultaneously, then purrr::imap() is useful.
The anonymous function needs two arguments, which I call x and y, to refer to the list element and its name, respectively.
library(purrr)
imap(list1, function(x, y) predict(lm(A~B, data=x), new.values[new.values$name == y,],
interval="predict"))
$d1
fit lwr upr
1 1.571429 -2.48742 5.630277
$d2
fit lwr upr
2 2.214286 -1.74179 6.170362
$d3
fit lwr upr
3 2.857143 -1.589103 7.303388
If your prediction values are also stored in a list, purrr::map2() is useful for looping through two lists simultaneously.
To show this I'll split the "new.values" object into a list. I can then loop through the two lists (of equal length) via map2(). I use the formula notation here, where .x refers to the first list and .y to the second, instead of writing an anonymous function.
new.val.list = split(new.values, new.values$name)
map2(list1, new.val.list, ~predict(lm(A~B, data=.x), .y,
interval="predict"))
$d1
fit lwr upr
1 1.571429 -2.48742 5.630277
$d2
fit lwr upr
2 2.214286 -1.74179 6.170362
$d3
fit lwr upr
3 2.857143 -1.589103 7.303388
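The same pairing of list elements with their names can be sketched in base R with Map(), which avoids the purrr dependency. The objects list1 and new.values are not shown in the answer, so the versions below are hypothetical stand-ins with the same column names.

```r
# Hypothetical stand-ins for list1 and new.values.
list1 <- list(
  d1 = data.frame(name = "d1", A = c(1, 3, 2, 4), B = c(1, 2, 3, 4)),
  d2 = data.frame(name = "d2", A = c(2, 2, 5, 4), B = c(1, 2, 3, 4))
)
new.values <- data.frame(name = c("d1", "d2"), B = c(5, 6))

# Map() walks the list and its names in parallel, much like purrr::imap().
preds <- Map(function(x, y) {
  predict(lm(A ~ B, data = x),
          new.values[new.values$name == y, ],
          interval = "predict")
}, list1, names(list1))
preds
```

Map() takes the names of its first list argument, so the result is again named d1 and d2.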
How to add a prediction column with linear models stored in data.table or tibble?
With tidyverse, we use map2 (here its variant map2_dbl, which returns a numeric vector) to loop through each 'model' and its corresponding 'x' value, passing the new data to predict as a data.frame or tibble:
library(tidyverse)
model_dt %>%
mutate(pred_y = map2_dbl(model, x, ~ predict.lm(.x, tibble(x = .y))))
# A tibble: 2 x 4
# id model x pred_y
# <dbl> <list> <dbl> <dbl>
#1 1 <lm> 3 1.6
#2 2 <lm> 3 2.60
Or, if model_dt is a data.table object, with Map:
model_dt[, pred_y := unlist(Map(function(mod, y)
predict.lm(mod, data.frame(x = y)), model, x)), id][]
# id model x pred_y
#1: 1 <lm> 3 1.6
#2: 2 <lm> 3 2.6
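The object model_dt is not defined in the answer. A minimal base R sketch of the same idea follows, with a hypothetical model_df whose list column holds one fitted model per row; the data and names are made up for illustration.

```r
# Hypothetical stand-in for model_dt: one fitted model and one x per id.
d1 <- data.frame(x = 1:4, y = c(1, 2, 2, 3))
d2 <- data.frame(x = 1:4, y = c(2, 3, 3, 5))
model_df <- data.frame(id = 1:2, x = c(3, 3))
model_df$model <- list(lm(y ~ x, data = d1), lm(y ~ x, data = d2))

# mapply pairs each model with its own x and returns a numeric vector.
model_df$pred_y <- mapply(
  function(mod, xval) unname(predict(mod, data.frame(x = xval))),
  model_df$model, model_df$x
)
model_df$pred_y
```

The new data passed to predict must use the predictor's name from the formula (x here), which is why each value is wrapped in data.frame(x = ...).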
Extract prediction function only from lm() call
First, we borrow a function from this other question that reduces the size of the lm object.
clean_model = function(cm) {
  # just in case we forgot to set
  # y=FALSE and model=FALSE
  cm$y = c()
  cm$model = c()
  cm$residuals = c()
  cm$fitted.values = c()
  cm$effects = c()
  cm$qr$qr = c()
  cm$linear.predictors = c()
  cm$weights = c()
  cm$prior.weights = c()
  cm$data = c()
  # also try and avoid some large environments
  attr(cm$terms,".Environment") = c()
  attr(cm$formula,".Environment") = c()
  cm
}
Then write a simple wrapper that reduces the model and returns the prediction function:
prediction_function <- function(model) {
  stopifnot(inherits(model, 'lm'))
  model <- clean_model(model)
  function (...) predict(model, ...)
}
Example:
set.seed(1234)
df <- data.frame(x = 1:9, y = 2 * 1:9 + 3 + rnorm(9, sd = 0.5))
fit <- lm(y ~ x, df)
f <- prediction_function(fit)
f(data.frame(x = 5:6))
1 2
12.83658 14.83351
Check sizes:
object.size(fit)
# 16648 bytes
object.size(prediction_function)
# 8608 bytes
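For this small example the reported size is about half. Note that object.size() does not include a function's enclosing environment, so the size reported for a closure does not count the cleaned model it captures; object.size(clean_model(fit)) measures that model directly.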
Let's use some larger data:
data(diamonds, package = 'ggplot2')
fit2 <- lm(carat ~ price, diamonds)
predict(fit2, data.frame(price = 200))
f2 <- prediction_function(fit2)
f2(data.frame(price = 200))
print(object.size(fit2), units = 'Mb');
object.size(f2)
Now we go from 13 Mb to 5376 bytes.
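Because object.size() does not count the environment a closure carries, a fairer check is to compare the model objects themselves. The sketch below does that on simulated data. The helper slim_model is a hypothetical, simplified variant of clean_model that drops only the bulky components and leaves the environments alone. It assumes predictions are made on new data without intervals or standard errors.

```r
# Simplified variant of clean_model: drop the bulky components that
# predict() on new data does not need (an assumption of this sketch).
slim_model <- function(m) {
  m$model <- NULL
  m$residuals <- NULL
  m$fitted.values <- NULL
  m$effects <- NULL
  m$qr$qr <- NULL
  m
}

set.seed(1)
big <- data.frame(x = rnorm(1e4))
big$y <- 2 * big$x + 3 + rnorm(1e4)
fit_big <- lm(y ~ x, data = big)
slim <- slim_model(fit_big)

# Compare the model objects themselves rather than a closure.
object.size(fit_big)
object.size(slim)

# Predictions on new data are unchanged.
nd <- data.frame(x = c(-1, 0, 1))
all.equal(predict(fit_big, nd), predict(slim, nd))
```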
How to index predict plm object in R
Your code is ambiguous, so instead test the names, which gives a logical vector inside the brackets.
yy[names(yy) %in% "ARIZONA"]
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# -0.42640094 -0.36662046 -0.27070381 -0.18091251 -0.14102111 -0.18021858
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# -0.14774000 -0.08398230 0.01383581 0.09852240 0.12731152 0.17116278
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# 0.14950942 0.19194103 0.28735344 0.34586645 0.41209687
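One common source of this kind of ambiguity (an assumption here, since the question's code is not shown) is indexing a vector with repeated names by a character string, which returns only the first match. The small vector below is hypothetical and simply mimics predictions named by state.

```r
# Hypothetical named vector with repeated names, as in predictions for
# panel data where each state appears once per year.
yy <- c(ARIZONA = -0.43, ARIZONA = -0.37, UTAH = 0.10,
        ARIZONA = -0.27, UTAH = 0.15)

# Character indexing returns only the first element with that name.
yy["ARIZONA"]

# A logical mask over the names keeps every match.
yy[names(yy) %in% "ARIZONA"]
```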