Using predict with a list of lm() objects

Here's my attempt:

predNaughty <- ddply(newData, "state", transform,
  value = predict(modelList[[paste(piece$state[1])]], newdata = piece))
head(predNaughty)
# year state value
# 1 50 50 5176.326
# 2 51 50 5274.907
# 3 52 50 5373.487
# 4 53 50 5472.068
# 5 54 50 5570.649
# 6 55 50 5669.229
predDiggsApproved <- ddply(newData, "state", function(x)
  transform(x, value = predict(modelList[[paste(x$state[1])]], newdata = x)))
head(predDiggsApproved)
# year state value
# 1 50 50 5176.326
# 2 51 50 5274.907
# 3 52 50 5373.487
# 4 53 50 5472.068
# 5 54 50 5570.649
# 6 55 50 5669.229

JD Long edit

I was inspired enough to work out an adply() option:

pred3 <- adply(newData, 1, function(x)
  predict(modelList[[paste(x$state)]], newdata = x))
head(pred3)
# year state 1
# 1 50 50 5176.326
# 2 51 50 5274.907
# 3 52 50 5373.487
# 4 53 50 5472.068
# 5 54 50 5570.649
# 6 55 50 5669.229
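
For readers without plyr, the same split, predict and recombine pattern can be sketched in base R. The training data, the state labels "A" and "B", and the name predBase below are made up for illustration; only modelList and newData mirror the names used above.

```r
# Hypothetical training data: two states with different linear trends
set.seed(1)
train <- data.frame(
  state = rep(c("A", "B"), each = 10),
  year  = rep(1:10, times = 2),
  stringsAsFactors = FALSE
)
train$value <- ifelse(train$state == "A",
                      100 + 5 * train$year,
                      50 + 2 * train$year) + rnorm(20, sd = 0.1)

# One lm per state, stored in a list named by state
modelList <- lapply(split(train, train$state),
                    function(d) lm(value ~ year, data = d))

# Hypothetical new data to predict on
newData <- expand.grid(year = 11:12, state = c("A", "B"),
                       stringsAsFactors = FALSE)

# Predict each state's rows with that state's model, then recombine
pieces <- split(newData, newData$state)
predBase <- do.call(rbind, lapply(names(pieces), function(s) {
  piece <- pieces[[s]]
  piece$value <- predict(modelList[[s]], newdata = piece)
  piece
}))
predBase
```

Looking the model up by the name of each piece avoids relying on variables internal to ddply, such as piece in the first attempt.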

Using predict on lm list with confidence interval

If you want to apply a function to each element of a list, lapply (list apply) is the way to go:

do.call(rbind, lapply(mod, function(x) predict(x, newdata, interval="confidence")))

fit lwr upr
1 1.951769 1.3772699 2.526268
1 1.851953 1.4852869 2.218618
1 1.339843 0.1453728 2.534312
1 1.987887 1.4446006 2.531174

So with lapply, we run our anonymous function, function(x) predict(x, newdata, interval="confidence"), with x being each element of mod. do.call(rbind, ...) then stacks the resulting list of one-row matrices into a single matrix.
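
Since mod and newdata are not defined in this answer, here is a self-contained sketch of the same idea. It assumes one model per cylinder count in the built-in mtcars data; the name ci and the choice of mpg ~ wt are illustrative only.

```r
# Assumed setup: one lm per cylinder group of the built-in mtcars data
mod <- lapply(split(mtcars, mtcars$cyl),
              function(d) lm(mpg ~ wt, data = d))

# A single new observation to predict for every model
newdata <- data.frame(wt = 3)

# One 1 x 3 matrix (fit, lwr, upr) per model, stacked row-wise
ci <- do.call(rbind,
              lapply(mod, function(x) predict(x, newdata, interval = "confidence")))

# Label rows by model name instead of the repeated "1"
rownames(ci) <- names(mod)
ci
```

Setting the row names from names(mod) makes it clear which row belongs to which model, which the repeated "1" row labels do not.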

prediction using linear model and the importance of data.frame

When you call predict on an lm object, the method that gets dispatched is predict.lm. When you run it like this:

predict(model_1, Sepal.Width=c(1,3,4,5))

you are passing c(1,3,4,5) as an argument named Sepal.Width. predict.lm has no such argument, so the value is silently absorbed by ... and ignored.

With no new input data, this is equivalent to running predict.lm(model_1), which returns the fitted values:

table(predict(model_1) == predict(model_1, Sepal.Width=c(1,3,4,5)))

TRUE
150

Because you fitted the model with a formula, predict.lm needs a data frame so it can rebuild the design matrix (the matrix of independent, or exogenous, variables), multiply it by the coefficients, and return the predicted values.

In brief, this is what predict.lm does:

newdata = data.frame(Sepal.Width=c(1,3,4,5))
Terms = delete.response(terms(model_1))
X = model.matrix(Terms,newdata)

X
(Intercept) Sepal.Width
1 1 1
2 1 3
3 1 4
4 1 5

X %*% coefficients(model_1)
[,1]
1 6.302861
2 5.856139
3 5.632778
4 5.409417

predict(model_1,newdata)

1 2 3 4
6.302861 5.856139 5.632778 5.409417
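
The steps above can be checked end to end with a short script. The definition of model_1 is not shown in this answer, so the sketch assumes it is lm(Sepal.Length ~ Sepal.Width, data = iris), which is consistent with the printed numbers; the names ignored, by_hand and pred are illustrative.

```r
# Assumption: model_1 was fitted like this on the built-in iris data
model_1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# A named argument that is not 'newdata' is ignored: we get the fitted values
ignored <- predict(model_1, Sepal.Width = c(1, 3, 4, 5))
same_as_fitted <- isTRUE(all.equal(ignored, fitted(model_1)))

# Passing a data frame as 'newdata' gives predictions for the new rows
newdata <- data.frame(Sepal.Width = c(1, 3, 4, 5))
pred <- predict(model_1, newdata)

# Rebuild the design matrix by hand and multiply by the coefficients
X <- model.matrix(delete.response(terms(model_1)), newdata)
by_hand <- drop(X %*% coef(model_1))

same_as_fitted              # TRUE
all.equal(by_hand, pred)    # TRUE
```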

Using predict() with a vector of specific values in lapply for a list of data.frames

Calling names() on an individual data.frame in "list1" returns its column names, not the name of that list element. To see this, run names(list1[[1]]).

names(list1[[1]])
"name" "A" "B"

If you want to loop through both the list and the list names simultaneously, purrr::imap() is useful.

The anonymous function needs two arguments, which I call x and y, referring to the list element and its name, respectively.

library(purrr)
imap(list1, function(x, y) predict(lm(A~B, data=x), new.values[new.values$name == y,],
                                   interval="predict"))
$d1
fit lwr upr
1 1.571429 -2.48742 5.630277

$d2
fit lwr upr
2 2.214286 -1.74179 6.170362

$d3
fit lwr upr
3 2.857143 -1.589103 7.303388

If your prediction values are also stored in a list, purrr::map2() would be useful for looping through two lists simultaneously.

To show this I'll split the "new.values" object into a list. I can then loop through the two lists (of equal length) via map2(). Instead of writing an anonymous function, I use the formula notation here, where .x refers to the current element of the first list and .y to the current element of the second.

new.val.list = split(new.values, new.values$name)
map2(list1, new.val.list, ~predict(lm(A~B, data=.x), .y,
                                   interval="predict"))
$d1
fit lwr upr
1 1.571429 -2.48742 5.630277

$d2
fit lwr upr
2 2.214286 -1.74179 6.170362

$d3
fit lwr upr
3 2.857143 -1.589103 7.303388
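
If purrr is not available, base R's Map() can play the role of map2(). The sketch below is self-contained and uses made-up data: the contents of list1 and new.values are hypothetical and will not reproduce the numbers printed above.

```r
# Hypothetical stand-ins for list1 and new.values
list1 <- list(
  d1 = data.frame(name = "d1", A = c(1, 3, 2, 5), B = 1:4),
  d2 = data.frame(name = "d2", A = c(2, 2, 4, 5), B = 1:4)
)
new.values <- data.frame(name = c("d1", "d2"), B = c(2.5, 3))

# Split the new values by name and put them in the same order as list1
new.val.list <- split(new.values, new.values$name)[names(list1)]

# Walk both lists in parallel: fit on d, predict for nv
preds <- Map(function(d, nv) {
  predict(lm(A ~ B, data = d), nv, interval = "prediction")
}, list1, new.val.list)
preds
```

Reordering new.val.list by names(list1) guards against the two lists being in a different order, since Map() pairs elements by position rather than by name.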

How to add a prediction column with linear models stored in data.table or tibble?

With the tidyverse, we use map2 to loop over the 'model' column and the corresponding 'x' values, passing each new value to predict as a data.frame or tibble:

library(tidyverse)
model_dt %>%
  mutate(pred_y = map2_dbl(model, x, ~ predict.lm(.x, tibble(x = .y))))
# A tibble: 2 x 4
# id model x pred_y
# <dbl> <list> <dbl> <dbl>
#1 1 <lm> 3 1.6
#2 2 <lm> 3 2.60

Or, if model_dt is a data.table, with Map:

model_dt[, pred_y := unlist(Map(function(mod, y)
  predict.lm(mod, data.frame(x = y)), model, x)), id][]
# id model x pred_y
#1: 1 <lm> 3 1.6
#2: 2 <lm> 3 2.6
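
The same pairing of one model with one new x value can be sketched in base R with mapply(). This is an illustration only: the models are kept in a plain list rather than a list column, and the training data, models and model_df are hypothetical, chosen so that the expected predictions are easy to verify.

```r
# Hypothetical models fitted on exactly linear data
models <- list(
  lm(y ~ x, data = data.frame(x = 1:5, y = 2 * (1:5) + 1)),  # y = 2x + 1
  lm(y ~ x, data = data.frame(x = 1:5, y = (1:5) + 1))       # y = x + 1
)

# One row per model, with the x value to predict at
model_df <- data.frame(id = 1:2, x = c(3, 3))

# Pair each model with its x value; unname() drops the "1" label from predict()
model_df$pred_y <- mapply(
  function(mod, x_new) unname(predict(mod, data.frame(x = x_new))),
  models, model_df$x
)
model_df
```

Because both fits are exact, the predictions at x = 3 should be 7 and 4 up to floating point error.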

Extract prediction function only from lm() call

First, we borrow a function from this other question that reduces the size of the lm object.

clean_model = function(cm) {
  # just in case we forgot to set
  # y=FALSE and model=FALSE
  cm$y = c()
  cm$model = c()

  cm$residuals = c()
  cm$fitted.values = c()
  cm$effects = c()
  cm$qr$qr = c()
  cm$linear.predictors = c()
  cm$weights = c()
  cm$prior.weights = c()
  cm$data = c()

  # also try and avoid some large environments
  attr(cm$terms,".Environment") = c()
  attr(cm$formula,".Environment") = c()

  cm
}

Then write a simple wrapper that reduces the model and returns the prediction function:

prediction_function <- function(model) {
  stopifnot(inherits(model, 'lm'))
  model <- clean_model(model)
  function (...) predict(model, ...)
}

Example:

set.seed(1234)
df <- data.frame(x = 1:9, y = 2 * 1:9 + 3 + rnorm(9, sd = 0.5))
fit <- lm(y ~ x, df)
f <- prediction_function(fit)
f(data.frame(x = 5:6))
       1        2 
12.83658 14.83351

Check sizes:

object.size(fit)
# 16648 bytes

object.size(prediction_function)
# 8608 bytes

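For this small example, the reported size is roughly half. Bear in mind that the call above measures the wrapper prediction_function rather than f, and that object.size() does not count a function's enclosing environment. To see what the closure actually retains, measure the captured model directly, e.g. object.size(environment(f)$model).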

Let's use some larger data:

data(diamonds, package = 'ggplot2')

fit2 <- lm(carat ~ price, diamonds)
predict(fit2, data.frame(price = 200))
f2 <- prediction_function(fit2)
f2(data.frame(price = 200))

print(object.size(fit2), units = 'Mb')
object.size(f2)

Now we go from 13 Mb to 5376 bytes.
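
To check how much a prediction closure really keeps in memory, one option is to measure the model stored in its environment, because object.size() does not include a function's environment. The sketch below is self-contained and uses a hypothetical helper, slim_predictor, that drops only a subset of the components removed by clean_model; the simulated data and all names are assumptions.

```r
# Simulated data, large enough for the per-observation components to matter
set.seed(42)
big <- data.frame(x = rnorm(1e4))
big$y <- 1 + 2 * big$x + rnorm(1e4)
fit_big <- lm(y ~ x, data = big)

# Hypothetical helper: drop per-observation components, keep what predict() needs
slim_predictor <- function(model) {
  stopifnot(inherits(model, "lm"))
  model$model <- NULL
  model$residuals <- NULL
  model$fitted.values <- NULL
  model$effects <- NULL
  model$qr$qr <- NULL
  function(...) predict(model, ...)
}

f_big <- slim_predictor(fit_big)
nd <- data.frame(x = c(-1, 0, 1))

# Point predictions are unchanged
same <- isTRUE(all.equal(f_big(nd), predict(fit_big, nd)))

# Compare the full fit with the model captured inside the closure
size_full <- object.size(fit_big)
size_kept <- object.size(environment(f_big)$model)
print(size_full, units = "Kb")
print(size_kept, units = "Kb")
```

Note that a model stripped this way only supports point predictions on new data; standard errors and intervals need components that were removed.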

How to index predict plm object in R

Your indexing is ambiguous because the names are not unique. Instead, test the names against the value you want, which gives a logical vector to use inside the brackets.

yy[names(yy) %in% "ARIZONA"]
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# -0.42640094 -0.36662046 -0.27070381 -0.18091251 -0.14102111 -0.18021858
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# -0.14774000 -0.08398230 0.01383581 0.09852240 0.12731152 0.17116278
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# 0.14950942 0.19194103 0.28735344 0.34586645 0.41209687
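
The difference between indexing by name and indexing by a logical test on the names shows up on a small named vector. The values below are made up; only the name yy and the label "ARIZONA" come from the answer above.

```r
# Hypothetical named vector with repeated names
yy <- c(ARIZONA = -0.43, ARIZONA = -0.37, ALABAMA = 0.12, ARIZONA = -0.27)

# Character indexing returns only the first element with that name
first_only <- yy["ARIZONA"]

# A logical test on the names returns every matching element
all_matches <- yy[names(yy) %in% "ARIZONA"]

length(first_only)   # 1
length(all_matches)  # 3
```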

