How to Minimize Size of Object of Class "Lm" Without Compromising It Being Passed to Predict()

Is there a way to 'compress' an lm() object for later prediction?

You can use biglm to fit your models, a biglm model object is smaller than a lm model object. You can use predict.biglm create a function that you can pass the newdata design matrix to, which returns the predicted values.

Another option is to use saveRDS to save the files, which appear to be slightly smaller, as they have less overhead, being a single object, not like save which can save multiple objects.

 library(biglm)
m <- lm(log(Volume)~log(Girth)+log(Height), trees)
mm <- lm(log(Volume)~log(Girth)+log(Height), trees, model = FALSE, x =FALSE, y = FALSE)
bm <- biglm(log(Volume)~log(Girth)+log(Height), trees)
pred <- predict(bm, make.function = TRUE)
save(m, file = 'm.rdata')
save(mm, file = 'mm.rdata')
save(bm, file = 'bm.rdata')
save(pred, file = 'pred.rdata')
saveRDS(m, file = 'm.rds')
saveRDS(mm, file = 'mm.rds')
saveRDS(bm, file = 'bm.rds')
saveRDS(pred, file = 'pred.rds')

file.info(paste(rep(c('m','mm','bm','pred'),each=2) ,c('.rdata','.rds'),sep=''))
# size isdir mode mtime ctime atime exe
# m.rdata 2806 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:24:23 2013-03-07 11:29:30 no
# m.rds 2798 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:29:30 2013-03-07 11:29:30 no
# mm.rdata 2113 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:24:28 2013-03-07 11:29:30 no
# mm.rds 2102 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:29:30 2013-03-07 11:29:30 no
# bm.rdata 592 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:24:34 2013-03-07 11:29:30 no
# bm.rds 583 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:29:30 2013-03-07 11:29:30 no
# pred.rdata 1007 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:24:40 2013-03-07 11:29:30 no
# pred.rds 995 FALSE 666 2013-03-07 11:29:30 2013-03-07 11:27:30 2013-03-07 11:29:30 no

How to predict with with nlme::predict.lme without calling the whole 'object'

The lme objects, as with any class, are designed to contain everything they may need for any function that has been written to be called on it. If you want to just use the bare bones you will need to pull out only what you need and reassign the class so the correct S3 method is called. To see which components you need, you would have to look at the source nlme:::predict.lme. Here is an example with the Orthodont dataset.

library(nlme)
data(Orthodont)

# Just fit a model
fm1 <- lme(distance ~ age, data = Orthodont)

# pull out the minimal components needed for prediction
min_fm1 <- list(modelStruct = fm1$modelStruct,
dims = fm1$dims,
contrasts = fm1$contrasts,
coefficients = fm1$coefficients,
groups = fm1$groups,
call = fm1$call,
terms = fm1$terms)

# assign class otherwise the default predict method would be called
class(min_fm1) <- "lme"

# By dropping this like fm1$data you trim it down quite a bit
object.size(fm1)
63880 bytes
object.size(min_fm1)
22992 bytes

# make sure output identical
identical(predict(min_fm1, Orthodont, level = 0, na.action = na.omit),
predict(fm1, Orthodont, level = 0, na.action = na.omit))
[1] TRUE

Extract prediction function only from lm() call

First, we borrow a function from this other question that reduces the size of the lm object.

clean_model = function(cm) {
# just in case we forgot to set
# y=FALSE and model=FALSE
cm$y = c()
cm$model = c()

cm$residuals = c()
cm$fitted.values = c()
cm$effects = c()
cm$qr$qr = c()
cm$linear.predictors = c()
cm$weights = c()
cm$prior.weights = c()
cm$data = c()

# also try and avoid some large environments
attr(cm$terms,".Environment") = c()
attr(cm$formula,".Environment") = c()

cm
}

Then write a simple wrapper that reduces the model and returns the prediction function:

prediction_function <- function(model) {
stopifnot(inherits(model, 'lm'))
model <- clean_model(model)
function (...) predict(model, ...)
}

Example:

set.seed(1234)
df <- data.frame(x = 1:9, y = 2 * 1:9 + 3 + rnorm(9, sd = 0.5))
fit <- lm(y ~ x, df)
f <- prediction_function(fit)
f(data.frame(x = 5:6))
       1        2 
12.83658 14.83351

Check sizes:

object.size(fit)
# 16648 bytes

object.size(prediction_function)
# 8608 bytes

For this small example we save half the space.

Let's use some larger data:

data(diamonds, package = 'ggplot2')

fit2 <- lm(carat ~ price, diamonds)
predict(fit2, data.frame(price = 200))
f2 <- prediction_function(fit2)
f2(data.frame(price = 200))

print(object.size(fit2), units = 'Mb');
object.size(f2)

Now we go from 13 Mb to 5376 bytes.



Related Topics



Leave a reply



Submit