Predict() - Maybe I'm not understanding it
First, you want to use
model <- lm(Total ~ Coupon, data=df)
not
model <- lm(df$Total ~ df$Coupon, data=df)
Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.
Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, i.e. new values of Coupon, not Total.
Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:
model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)
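For a version that can be run end to end, here is a minimal sketch. The data frame df below is made up purely for illustration (the asker's real data is not shown), so the fitted coefficients mean nothing; the point is only that the column name in new.df matches the predictor name in the formula.

```r
# Hypothetical data standing in for the asker's df (values are invented)
set.seed(1)
df <- data.frame(Total = seq(7e7, 1.1e8, length.out = 20))
df$Coupon <- 5 + 2e-8 * df$Total + rnorm(20, sd = 0.1)

# Response on the left of ~, predictor on the right
model <- lm(Coupon ~ Total, data = df)

# newdata must contain a column named exactly like the predictor
new.df <- data.frame(Total = c(79037022, 83100656, 104299800))
pred <- predict(model, new.df)

length(pred)  # one prediction per row of new.df
```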
Predict() - On what criteria do I choose my new values?
predict is for when you have some new data without a response and you want the predicted values from your model. Sometimes you may pass the original data to predict because you want something like a confidence interval or a prediction interval. The three values 79037022, 83100656 and 104299800 appear only because you are interested in the response at those three inputs. You can of course use other values, and R will still give you a result. But remember that the model is usually reliable only when the new data is not far from the range of the original data.
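As a rough illustration of the interval options mentioned above, the sketch below uses the built-in cars dataset rather than the asker's data (an assumption made only so the example runs on its own). The interval argument of predict() for lm fits returns a matrix with the fit and its lower and upper bounds.

```r
# Built-in dataset used as a stand-in; not the asker's data
fit <- lm(dist ~ speed, data = cars)

# New predictor values chosen inside the observed range of speed
new.df <- data.frame(speed = c(10, 15, 20))

# Interval for the mean response at each new value
conf_int <- predict(fit, new.df, interval = "confidence")

# Interval for a single new observation at each new value (wider)
pred_int <- predict(fit, new.df, interval = "prediction")

# Check the observed predictor range before trusting predictions far outside it
range(cars$speed)
```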
Error using Predict with Polynomial regressions in R
The first warning will go away if you convert the validation data to the same format as the training data before you run predict, so that the training and validation data have exactly the same set of regressors (predictor variables).
The second warning will still be there: since you are fitting a very high-degree polynomial, the fit is rank-deficient (it is also highly likely to overfit your training data, so the model may not generalize or be useful).
To reduce the overfitting and eliminate the rank deficiency, fit a lower-degree polynomial instead, in which case both warnings will go away.
Try this to get rid of both warnings:
my_new_df<-data.frame(var_1,var_2)
names(my_new_df)<-c("Time_values","Count")
n <- 10 # lower degree polynomial
# first generate all the polynomial regressors on the entire data
my_new_df <- cbind.data.frame(my_new_df[-1], poly(my_new_df$Time_values, degree=n, raw=TRUE))
names(my_new_df)[-1] <- paste0('X', names(my_new_df)[-1])
train_poly_data<-my_new_df[1:150,] # training data set
valid_poly_data<-my_new_df[151:200,] # validation data set
test_poly_data<-my_new_df[201:252,] # test data set
# obtain a polynomial regression model of degree n
poly_tr<-lm(Count ~ ., train_poly_data)
summary(poly_tr)
pred <- predict(poly_tr, newdata=valid_poly_data)
pred
# 151 152 153 154 155 156
# 796.5672 982.6862 1219.7434 1517.9844 1889.2235 2347.0258
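A possible alternative, sketched here rather than taken from the answer, is to put poly() directly in the formula so that predict() rebuilds the polynomial terms from the raw Time_values column of the new data. The data below is synthetic and the degree of 2 is an arbitrary choice for illustration.

```r
# Synthetic stand-in for the asker's 252-row data (invented values)
set.seed(42)
d <- data.frame(Time_values = 1:252)
d$Count <- 3 + 0.5 * d$Time_values + 0.01 * d$Time_values^2 + rnorm(252)

train <- d[1:150, ]    # training data set
valid <- d[151:200, ]  # validation data set

# Low-degree polynomial; the formula generates the regressors itself
n <- 2
fit <- lm(Count ~ poly(Time_values, degree = n, raw = TRUE), data = train)

# newdata only needs the original predictor column
pred <- predict(fit, newdata = valid)
length(pred)  # one prediction per validation row
```

With this form the training and validation sets share the same regressors by construction, since both are derived from Time_values at fit and predict time.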
Predict for wider range
This is something that achieves what you want. The cause of your original problem is that in your regression the predictor's name is Sepal.Width, not x, and your prediction doesn't use your new.range at all, so you have to do something like new.range <- data.frame(Sepal.Width=seq(2,10,length.out=50)) to make predictions on your new.range.
Another problem is that new.range must have 50 rows, so that pred and new.range fit alongside the original data.frame for each group.
And then you can draw the plot you want. Note that the new.range column becomes Sepal.Width.1 in the result.
library(dplyr)
library(minpack.lm) # provides nlsLM() and nls.lm.control()
cc <- iris %>%
  group_by(Species) %>%
  do({
    mod <- nlsLM(Sepal.Length ~ k*Sepal.Width/2+U, start=c(k=10,U=5), data = ., trace=FALSE, control = nls.lm.control(maxiter=100))
    # new predictor values, named like the predictor in the formula
    new.range <- data.frame(Sepal.Width=seq(2,10,length.out=50))
    pred <- predict(mod, newdata = new.range)
    # pred <- predict(mod, newdata =.["Sepal.Width"])
    data.frame(., new.range, pred)
  })
library(ggplot2)
ggplot(cc,aes(y=Sepal.Length,x=Sepal.Width ,col=factor(Species)))+
  geom_point()+
  facet_wrap(~Species)+
  geom_line(aes(x=Sepal.Width.1,y=pred),size=1)
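The renaming to Sepal.Width.1 comes from data.frame() making duplicate column names unique. A small sketch using three rows of iris and three made-up new values shows the effect in isolation:

```r
# First three rows of the original predictor column
orig <- iris[1:3, "Sepal.Width", drop = FALSE]

# Illustrative new values under the same column name
new.range <- data.frame(Sepal.Width = c(2, 6, 10))

# data.frame() de-duplicates names, so the second column gets a ".1" suffix
combined <- data.frame(orig, new.range)
names(combined)
```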
lm predict won't predict
Instead of,
lm1 <- lm(pubs1$actual ~ pubs1$pred37 + pubs1$pred1 + pubs1$pred2 +
pubs1$pred3 + pubs1$pred4)
try,
lm1 <- lm(actual ~ pred37 + pred1 + pred2 +
pred3 + pred4, data = pubs1)
Otherwise predict.lm will be looking for variables called pubs1$pred37 in your new data frame.
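To see the difference in practice, here is a small sketch with invented data (the names d and new are hypothetical, not from the question). With the $ form, predict() cannot find the formula's variables in the new data frame, so it falls back to the original vectors and returns one value per original row, along with a warning.

```r
# Invented data for illustration
d <- data.frame(x = 1:10)
d$y <- 2 * d$x + 1
new <- data.frame(x = c(11, 12))

bad  <- lm(d$y ~ d$x)         # variables are literally named d$y and d$x
good <- lm(y ~ x, data = d)   # variables are named y and x

# The $ model ignores new and predicts for the 10 original rows
bad_pred <- suppressWarnings(predict(bad, new))
length(bad_pred)

# The data= model predicts for the 2 rows of new
good_pred <- predict(good, new)
length(good_pred)
```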
How do I produce a set of predictions based on a new set of data using predict in R?
It's never a good idea to use the $ symbol when using the formula syntax (and most of the time it's completely unnecessary). This is especially true when you are trying to make predictions, because the predict() function works hard to exactly match up column names and data types. So rather than
fit <- lm(my$y ~ my$x)
use
fit <- lm(y ~ x, my)
So a complete example would be
set.seed(15) # for reproducibility
my <- data.frame(x=rnorm(1000))
my$y <- 0.5*my$x+0.5*rnorm(1000)
fit <- lm(y ~ x, my)
mySample <- my[sample(1:nrow(my), 100),]
head(predict(fit, mySample))
# 694 278 298 825 366 980
# 0.43593108 -0.67936324 -0.42168723 -0.04982095 -0.72499087 0.09627245
Predicting data from gamlss model in handler function using tryCatch in R
So after a bit of trial and error I managed to make it work. I believe the problem lies in the mod_sim object not being saved to the global environment. predict (or predict.gamlss here) is probably not looking in the function environment for the mod_sim object, although I don't understand why it wouldn't. Anyway, using <<- (i.e. assigning the object in the global environment from within the function) for every object created in the function seemed to do the trick. If anyone has an explanation of why this happens, I'd be glad to understand what I'm doing wrong!
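For readers unfamiliar with <<-, the sketch below shows only the assignment behaviour itself, using lm and the built-in cars data as stand-ins. It does not reproduce the gamlss problem, and the function names are hypothetical.

```r
# Regular assignment: the model exists only while the function runs
fit_local <- function() {
  m_local <- lm(dist ~ speed, data = cars)
  invisible(NULL)
}

# Superassignment: searches enclosing environments and, finding no
# existing m_global, creates it in the global environment
fit_global <- function() {
  m_global <<- lm(dist ~ speed, data = cars)
  invisible(NULL)
}

fit_local()
fit_global()

exists("m_local")   # not visible outside fit_local()
exists("m_global")  # visible after fit_global() has run
```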