Calculating predictions for multiple linear regression
Instead of redefining the factors, just give the factor levels in quotation marks in the predict call.
predict(m2, list(age=40, sex="male", bmi=30, children=2, smoker="yes",
                 region="northwest"), interval="prediction", level=0.98)
# fit lwr upr
# 1 -1.978994 -9.368242 5.410254
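As a minimal sketch of the same idea on invented data (the data frame toy, its values and the model fit are assumptions made up for illustration, not the m2 model above), the factor and character predictors can be given as plain strings in newdata:

```r
# Invented toy data: names and values are hypothetical
set.seed(123)
toy <- data.frame(
  age    = rep(c(25, 35, 45, 55), times = 10),
  sex    = factor(rep(c("female", "male"), times = 20)),
  smoker = rep(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"), times = 5)
)
toy$charges <- 1000 + 50 * toy$age + 5000 * (toy$smoker == "yes") +
  rnorm(nrow(toy), sd = 500)

fit <- lm(charges ~ age + sex + smoker, data = toy)

# Levels are passed as quoted strings; predict() maps them onto the
# factor levels stored in the fitted model
p <- predict(fit, list(age = 40, sex = "male", smoker = "yes"),
             interval = "prediction", level = 0.98)
p
```

The result is a one-row matrix with the columns fit, lwr and upr. A level that did not occur in the training data would make predict stop with an error.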
Data
dat <- structure(list(charges = c(1.37095844714667, -0.564698171396089,
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484,
1.51152199743894, -0.0946590384130976, 2.01842371387704, -0.062714099052421
), age = c(20L, 58L, 44L, 53L, 22L, 51L, 20L, 75L, 59L, 41L),
sex = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female",
"male"), class = "factor"), bmi = c(25.3024309248682, 24.6058854935878,
25.7881406228236, 25.6707038267505, 24.0508191903124, 25.036135738485,
27.115755613237, 25.1674409043556, 24.1201634714689, 25.9469131749433
), children = c(4L, 1L, 5L, 1L, 1L, 4L, 0L, 0L, 3L, 4L),
smoker = c("no", "yes", "yes", "no", "no", "yes", "yes",
"yes", "yes", "no"), region = structure(c(1L, 2L, 2L, 3L,
1L, 2L, 3L, 3L, 3L, 2L), .Label = c("northeast", "northwest",
"southeast"), class = "factor")), row.names = c(NA, -10L), class = "data.frame")
R: multiple linear regression model and prediction model
I am putting everything from the comments into this answer.
1) You can use predict rather than predict.lm, as predict will know your input is of class lm and do the right thing automatically.
2) The newdata argument should be a data.frame with the same variables as your original predictors, in this case alt and sdist.
3) If you are bringing in your data using read.table, by default it will create a data.frame. This assumes that the new data has columns named alt and sdist. Then you can do:
NewDataSet <- read.table(whatever)
NewPredictions <- predict(model1, newdata=NewDataSet)
4) After you have done this, if you want to check the predictions, you can do the following:
summary(model1)
This will give you the intercept and the coefficients for alt and sdist.
NewDataSet[1,]
This should give you the alt and sdist values for the first row; you can change the 1 in the bracket to any row you want. Then use the information from summary(model1) to calculate what the predicted value should be, using any method that you trust.
Finally, use NewPredictions[1] to get what predict() gave you for the first row (or change the 1 to any other row).
Hopefully that should all work out.
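The check described in step 4 can be sketched as follows. The training data, the response name y and all numbers are invented for illustration; only the predictor names alt and sdist come from the question.

```r
# Hypothetical training data; y and all values are made up
set.seed(1)
train <- data.frame(alt = runif(20, 0, 1000), sdist = runif(20, 0, 50))
train$y <- 2 + 0.01 * train$alt - 0.3 * train$sdist + rnorm(20)

model1 <- lm(y ~ alt + sdist, data = train)

# New data must carry the same predictor names as the model
NewDataSet <- data.frame(alt = c(100, 500), sdist = c(10, 25))
NewPredictions <- predict(model1, newdata = NewDataSet)

# Manual check for row 1: intercept + coefficient * value for each predictor
b <- coef(model1)
manual <- b["(Intercept)"] +
  b["alt"] * NewDataSet$alt[1] +
  b["sdist"] * NewDataSet$sdist[1]

# Compare the hand calculation with what predict() returned
all.equal(unname(manual), unname(NewPredictions[1]))
```

Using coef(model1) instead of reading the numbers off summary(model1) avoids rounding differences in the comparison.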
r function: multiple linear regression prediction estimate and interval (user-defined function)
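Update:
I found the problem: it was in the t value, t.alpha.demi <- qt(0.975, df=n-2), which explains why there was no discrepancy with simple linear regression but there was one with multiple regression. The residual degrees of freedom are n minus the number of estimated coefficients, which equals n-2 only when there is a single predictor plus the intercept.
I changed it to t.alpha.demi <- qt(0.975, df=n-length(beta)).
It was a mistake on my end. Regards,
Cyril S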
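A minimal sketch of such a hand-rolled prediction interval, checked against predict, might look like this. The data, the variable names other than t.alpha.demi and beta, and the point x0 are assumptions for illustration, not the original poster's function.

```r
# Invented data with two predictors
set.seed(42)
n <- 30
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)

fit  <- lm(y ~ x1 + x2, data = d)
X    <- model.matrix(fit)   # design matrix, including the intercept column
beta <- coef(fit)           # length(beta) counts the intercept too

x0 <- c(1, 0.5, -0.2)       # intercept term, x1 and x2 of the new point

# Residual variance and t quantile both use n - length(beta) degrees of freedom
s2 <- sum(residuals(fit)^2) / (n - length(beta))
t.alpha.demi <- qt(0.975, df = n - length(beta))

est     <- sum(x0 * beta)
se.pred <- sqrt(s2 * (1 + drop(t(x0) %*% solve(crossprod(X)) %*% x0)))
manual  <- c(est, est - t.alpha.demi * se.pred, est + t.alpha.demi * se.pred)

builtin <- predict(fit, data.frame(x1 = 0.5, x2 = -0.2),
                   interval = "prediction", level = 0.95)
all.equal(manual, unname(builtin[1, ]))
```

With df = n - 2 the quantile would be slightly too small here, so the manual interval would come out narrower than the one from predict.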
Multiple linear model prediction in dplyr
We could use nest_by and create the model columns in mutate, then ungroup to remove the rowwise attributes created by nest_by, and loop over the 'data' and model columns with pmap, extracting the columns in the order of selection, i.e. ..1 -> data, ..2 -> model1 and ..3 -> model2. Create the new "Pred" columns in the 'data' (..1), remove the model columns with select, and unnest the 'data':
library(dplyr)
library(purrr)
library(tidyr)
library(gapminder)  # provides the gapminder data set used below
gapminder %>%
  nest_by(continent) %>%
  mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
         model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data))) %>%
  ungroup %>%
  mutate(data = pmap(select(., data, model1, model2),
                     ~ ..1 %>%
                       mutate(Pred1 = predict(..2, ..1), Pred2 = predict(..3, ..1)))) %>%
  select(-model1, -model2) %>%
  unnest(c(data))
# A tibble: 1,704 x 8
# continent country year lifeExp pop gdpPercap Pred1 Pred2
# <fct> <fct> <int> <dbl> <int> <dbl> <dbl> <dbl>
# 1 Africa Algeria 1952 43.1 9279525 2449. 48.8 49.2
# 2 Africa Algeria 1957 45.7 10270856 3014. 48.9 50.0
# 3 Africa Algeria 1962 48.3 11000948 2551. 48.9 49.4
# 4 Africa Algeria 1967 51.4 12760499 3247. 49.1 50.5
# 5 Africa Algeria 1972 54.5 14760787 4183. 49.2 52.0
# 6 Africa Algeria 1977 58.0 17152804 4910. 49.4 53.2
# 7 Africa Algeria 1982 61.4 20033753 5745. 49.6 54.6
# 8 Africa Algeria 1987 65.8 23254956 5681. 49.8 54.7
# 9 Africa Algeria 1992 67.7 26298373 5023. 50.0 54.0
#10 Africa Algeria 1997 69.2 29072015 4797. 50.2 53.9
# … with 1,694 more rows
Or, without using pmap, we can create the new columns with across inside mutate, then unnest:
gapminder %>%
nest_by(continent) %>%
mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data )),
across(starts_with('model'), ~ list(Predict = predict(., data)),
.names = "{.col}_Predict")) %>%
select(-model1, -model2) %>%
ungroup %>%
unnest(c(data, model1_Predict, model2_Predict))
-output
# A tibble: 1,704 x 8
# continent country year lifeExp pop gdpPercap model1_Predict model2_Predict
# <fct> <fct> <int> <dbl> <int> <dbl> <dbl> <dbl>
# 1 Africa Algeria 1952 43.1 9279525 2449. 48.8 49.2
# 2 Africa Algeria 1957 45.7 10270856 3014. 48.9 50.0
# 3 Africa Algeria 1962 48.3 11000948 2551. 48.9 49.4
# 4 Africa Algeria 1967 51.4 12760499 3247. 49.1 50.5
# 5 Africa Algeria 1972 54.5 14760787 4183. 49.2 52.0
# 6 Africa Algeria 1977 58.0 17152804 4910. 49.4 53.2
# 7 Africa Algeria 1982 61.4 20033753 5745. 49.6 54.6
# 8 Africa Algeria 1987 65.8 23254956 5681. 49.8 54.7
# 9 Africa Algeria 1992 67.7 26298373 5023. 50.0 54.0
#10 Africa Algeria 1997 69.2 29072015 4797. 50.2 53.9
# … with 1,694 more rows
Creating a linear regression model for each group in a column
You have some mistakes in the syntax of your functions. A function is usually defined as function(x), and when you call it, x is replaced by the data you pass in.
For example, if you were to use the linear_model function you defined on its own, you would write:
linear_model(data)
However, because you are using it inside lapply, it is a bit trickier to see. lapply simply loops over the data frames you obtain from split(table2,table2$LOCATION) and applies linear_model to each of them.
The same thing happens with my_predict.
Anyway, this should work for you:
linear_model <- function(x) lm(Education ~ TIME, x)
m <- lapply(split(table2,table2$LOCATION),linear_model)
new_df <- data.frame(TIME=c(2019))
my_predict <- function(x) predict(x,new_df)
sapply(m,my_predict)
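To see the pattern run end to end, here is a self-contained sketch with an invented table2. The column names follow the answer above, but the locations and numbers are made up.

```r
# Hypothetical stand-in for table2; values are invented
table2 <- data.frame(
  LOCATION  = rep(c("AUS", "BEL"), each = 5),
  TIME      = rep(2014:2018, times = 2),
  Education = c(5.0, 5.1, 5.3, 5.4, 5.6, 6.0, 6.2, 6.1, 6.4, 6.5)
)

# One model per LOCATION: split() returns a named list of data frames
linear_model <- function(x) lm(Education ~ TIME, data = x)
m <- lapply(split(table2, table2$LOCATION), linear_model)

# Predict the same new year from every model
new_df <- data.frame(TIME = 2019)
my_predict <- function(fit) unname(predict(fit, new_df))
preds <- sapply(m, my_predict)
preds
```

preds is a numeric vector with one prediction per location, named after the groups produced by split.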
ANSWER TO THE EDIT
There are probably more efficient ways of looping the prediction, but here is my approach:
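library(reshape2)  # melt (assumed to come from reshape2, which handles lists)
library(tidyr)     # pivot_wider
pred_data <- list()
for (i in 3:6){
  # fit one model per LOCATION, using column i as the response
  linear_model <- function(x) lm(x[,i] ~ TIME, x)
  m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
  new_df <- data.frame(TIME=c(2020, 2021), row.names = c("2020", "2021"))
  my_predict <- function(x) predict(x,new_df)
  # store the predictions under the name of the response column
  pred_data[[colnames(tableLinR)[i]]] <- sapply(m,my_predict)
}
pred_data <- melt(pred_data)
pred_data <- as.data.frame(pivot_wider(pred_data, names_from = L1, values_from = value))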
First you create an empty list where you will save the outputs of your loop. In for (i in 3:6) you put the range of columns you want predictions for. The result, pred_data, is a list that you can transform into a data frame in different ways. With melt and pivot_wider you obtain a format similar to your original data.