Predict() - Maybe I'm not understanding it

First, you want to use

model <- lm(Total ~ Coupon, data=df)

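not model <- lm(df$Total ~ df$Coupon, data=df). With the df$ prefixes in the formula, predict() cannot match the model's variables to the columns of a new data frame.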

Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.

Because of this, when you ask R for predicted values from the model, you have to provide a set of new predictor values, i.e. new values of Coupon, not Total.
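To make the roles of response and predictor concrete, here is a minimal sketch. The data are made up for illustration (only the column names Total and Coupon come from the question); it checks that predict() gives the same numbers as computing a + b*Coupon by hand.

```r
# Hypothetical data: the numbers are invented purely for illustration.
set.seed(1)
df <- data.frame(Coupon = runif(20, 0, 100))
df$Total <- 50 + 3 * df$Coupon + rnorm(20)

# Total is the response (left of ~), Coupon the predictor (right of ~).
model <- lm(Total ~ Coupon, data = df)

# newdata must contain a column named after the predictor, Coupon.
new.df <- data.frame(Coupon = c(10, 50, 90))
pred <- predict(model, new.df)

# The same predictions computed by hand from the fitted a and b.
a <- coef(model)[["(Intercept)"]]
b <- coef(model)[["Coupon"]]
manual <- a + b * new.df$Coupon

all.equal(unname(pred), manual)
```

If new.df held a Total column instead, predict() would have no Coupon values to plug into the fitted equation.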

Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:

model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)

Predict() - On what criteria do I choose my new values?

predict is for when you have new data without a response and want the model's estimate of that response. Sometimes you pass the original data to predict instead, because you want something like a confidence interval or a prediction interval. The three values 79037022, 83100656 and 104299800 appear only because you are interested in the response at those three inputs. You can of course use other values, and R will return a result for them. But remember that the model is usually reliable only when the new data are not far from the original data.
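As a rough sketch of both uses, the following fits a model on invented data (the column names follow the earlier answer, the numbers are hypothetical), asks for confidence intervals on the original rows, asks for prediction intervals at new values, and flags a new value that lies outside the range of the training data.

```r
# Hypothetical data, invented for illustration.
set.seed(2)
df <- data.frame(Total = seq(1, 100, length.out = 30))
df$Coupon <- 5 + 0.2 * df$Total + rnorm(30)
model <- lm(Coupon ~ Total, data = df)

# No newdata: confidence intervals for the fitted values of the original rows.
conf_int <- predict(model, interval = "confidence")
head(conf_int)

# New values inside the observed range of Total, with prediction intervals.
new.df <- data.frame(Total = c(20, 50, 80))
pred_int <- predict(model, new.df, interval = "prediction")
pred_int

# A simple check for extrapolation: is a new value outside the training range?
far <- data.frame(Total = c(50, 500))
outside <- far$Total < min(df$Total) | far$Total > max(df$Total)
outside

# The prediction interval is wider for the value far from the data.
far_int <- predict(model, far, interval = "prediction")
width <- far_int[, "upr"] - far_int[, "lwr"]
width
```

R will still return a number for Total = 500, but the wider interval and the range check both signal that the result leans on the model far outside the data it was fitted to.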

Error using Predict with Polynomial regressions in R

The first warning will go away if you convert the validation data to the same format as the training data before you run predict, so that the training and validation data have exactly the same set of regressors (predictor variables).

The second warning will remain: because you are fitting a very high-degree polynomial, the fit is rank-deficient. It is also highly likely to overfit your training data, so the model may not generalize or be useful.

What you can do instead, to reduce the overfitting and eliminate the rank deficiency, is fit a lower-degree polynomial, in which case both warnings will go away.

Try this to get rid of both the warnings:

my_new_df<-data.frame(var_1,var_2)
names(my_new_df)<-c("Time_values","Count")

n <- 10 # lower degree polynomial
# first generate all the polynomial regressors on the entire data
my_new_df <- cbind.data.frame(my_new_df[-1], poly(my_new_df$Time_values, degree=n, raw=TRUE))
names(my_new_df)[-1] <- paste0('X', names(my_new_df)[-1])

train_poly_data<-my_new_df[1:150,] # training data set
valid_poly_data<-my_new_df[151:200,] # validation data set

test_poly_data<-my_new_df[201:252,] # test data set

# obtain a polynomial regression model of degree n
poly_tr<-lm(Count ~ ., train_poly_data)
summary(poly_tr)
pred <- predict(poly_tr, newdata=valid_poly_data)
pred


# 151 152 153 154 155 156
# 796.5672 982.6862 1219.7434 1517.9844 1889.2235 2347.0258
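A possible alternative, sketched here on made-up data (only the names Time_values and Count come from the answer above; the degree of 3 is an arbitrary choice for illustration), is to call poly() inside the formula. predict() then rebuilds the polynomial regressors for newdata itself, so the training and validation sets cannot end up with different columns.

```r
# Hypothetical data standing in for the original var_1 / var_2.
set.seed(3)
dat <- data.frame(Time_values = 1:200)
dat$Count <- 100 + 2 * dat$Time_values + 0.05 * dat$Time_values^2 +
  rnorm(200, sd = 20)

train <- dat[1:150, ]    # training rows
valid <- dat[151:200, ]  # validation rows

n <- 3  # a low degree, chosen only for illustration
fit <- lm(Count ~ poly(Time_values, degree = n), data = train)

# newdata only needs the raw Time_values column.
pred <- predict(fit, newdata = valid)

# A full-rank fit has n + 1 estimable coefficients and no NA among them.
fit$rank == n + 1
anyNA(coef(fit))
length(pred)
```

This uses the orthogonal polynomials that poly() returns by default, which tend to be better conditioned numerically than raw powers.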

Predict for wider range

This achieves what you want. The cause of your original problem is that the predictor in your regression is named Sepal.Width, not x, and your prediction doesn't use new.range at all. You have to do something like new.range <- data.frame(Sepal.Width=seq(2,10,length.out=50)) and pass it as newdata to get predictions over new.range.

Another problem is that new.range must have length 50 (each species in iris has 50 rows), so that pred and new.range fit alongside the original data in the data.frame.

Then you can draw the plot you want. Note that the new.range column becomes Sepal.Width.1 in the combined data frame.

library(dplyr)
library(minpack.lm) # provides nlsLM and nls.lm.control
cc <- iris %>%
  group_by(Species) %>%
  do({
    mod <- nlsLM(Sepal.Length ~ k*Sepal.Width/2+U, start=c(k=10,U=5), data = ., trace=FALSE, control = nls.lm.control(maxiter=100))
    # predictor values for the wider range, named like the model's predictor
    new.range <- data.frame(Sepal.Width=seq(2,10,length.out=50))
    pred <- predict(mod, newdata=new.range)
    # pred <- predict(mod, newdata =.["Sepal.Width"])
    data.frame(., new.range, pred)
  })

library(ggplot2)

ggplot(cc,aes(y=Sepal.Length,x=Sepal.Width ,col=factor(Species)))+
geom_point()+
facet_wrap(~Species)+
geom_line(aes(x=Sepal.Width.1,y=pred),size=1)
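The Sepal.Width.1 name used in the plot is not chosen by the code above; it comes from data.frame() making duplicate column names unique. A tiny sketch with made-up values shows the renaming:

```r
# Two data frames that both have a column called Sepal.Width (values invented).
orig <- data.frame(Sepal.Width = c(3.5, 3.0))
new.range <- data.frame(Sepal.Width = c(2, 10))

# data.frame() de-duplicates the names by appending a numeric suffix.
combined <- data.frame(orig, new.range)
names(combined)
```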

lm predict won't predict

Instead of,

lm1 <- lm(pubs1$actual ~ pubs1$pred37 + pubs1$pred1 + pubs1$pred2 +
          pubs1$pred3 + pubs1$pred4)

try,

lm1 <- lm(actual ~ pred37 + pred1 + pred2 +
          pred3 + pred4, data = pubs1)

Otherwise predict.lm will be looking for variables called pubs1$pred37 in your new data frame.
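The effect is easy to reproduce. In this sketch the data are hypothetical and only one predictor is used to keep it short; the model fitted with $ silently falls back to the training rows, while the model fitted with data = uses the new data frame.

```r
# Hypothetical data; the names pubs1, actual and pred1 follow the question.
set.seed(4)
pubs1 <- data.frame(pred1 = rnorm(10))
pubs1$actual <- 1 + 2 * pubs1$pred1 + rnorm(10, sd = 0.1)
newdata <- data.frame(pred1 = c(-1, 0, 1))

# Fitted with $: the model's variable is literally `pubs1$pred1`, which
# newdata cannot supply, so predict() evaluates it on pubs1 again.
bad <- lm(pubs1$actual ~ pubs1$pred1)
bad_pred <- suppressWarnings(predict(bad, newdata))  # R warns about the row mismatch
length(bad_pred)   # one value per training row, not per row of newdata

# Fitted with data =: the variable is `pred1`, which newdata does supply.
good <- lm(actual ~ pred1, data = pubs1)
good_pred <- predict(good, newdata)
length(good_pred)  # one value per row of newdata
```

The warning is suppressed here only to keep the output short; in practice it is the main hint that newdata was ignored.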

How do I produce a set of predictions based on a new set of data using predict in R?

It's never a good idea to use the $ symbol with the formula syntax (and most of the time it's completely unnecessary). This is especially true when you are trying to make predictions, because the predict() function works hard to exactly match up column names and data types. So rather than

fit <- lm(my$y ~ my$x)

use

fit <- lm(y ~ x, my)

So a complete example would be

set.seed(15) # for reproducibility
my <- data.frame(x=rnorm(1000))
my$y <- 0.5*my$x+0.5*rnorm(1000)
fit <- lm(y ~ x, my)
mySample <- my[sample(1:nrow(my), 100),]
head(predict(fit, mySample))
# 694 278 298 825 366 980
# 0.43593108 -0.67936324 -0.42168723 -0.04982095 -0.72499087 0.09627245

Predicting data from gamlss model in handler function using tryCatch in R

So after a bit of trial and error I managed to make it work. I believe the problem lies in the mod_sim object that is not saved to the global environment. predict (or predict.gamlss here) is probably not looking in the function environment for the mod_sim object although I don't understand why it wouldn't. Anyway using <<- (i.e. assigning the object in the global environment from the function) for every object created in the function seemed to do the trick. If anyone has an explanation on why this happens though I'd be glad to understand what I'm doing wrong!


