R: Multiple Linear Regression Model and Prediction Model

Calculating predictions for multiple linear regression

Instead of redefining the factors, pass each factor level as a quoted string in the new data given to predict.

predict(m2, newdata=list(age=40, sex="male", bmi=30, children=2, smoker="yes", 
                 region="northwest"), interval="prediction", level=0.98)
#         fit       lwr      upr
# 1 -1.978994 -9.368242 5.410254
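
The fitting call for m2 is not shown above, so here is a minimal, self-contained sketch of the same idea on the built-in iris data (a stand-in chosen only because it ships with R; the formula and values are illustrative assumptions). Species is a factor in the fitted model, and the new observation supplies its level as a plain string.

```r
# Illustrative stand-in model; not the m2 from the answer above.
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# The factor level is given as a quoted string; predict() matches it
# against the factor levels stored in the fitted model.
new_obs <- data.frame(Petal.Length = 4, Species = "versicolor")

# 98% prediction interval for a single new observation.
pred <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.98)
pred
```

A level that was not present when the model was fitted will make predict stop with an error, so the string has to match one of the original levels exactly.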

Data

dat <- structure(list(charges = c(1.37095844714667, -0.564698171396089, 
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484, 
1.51152199743894, -0.0946590384130976, 2.01842371387704, -0.062714099052421
), age = c(20L, 58L, 44L, 53L, 22L, 51L, 20L, 75L, 59L, 41L), 
    sex = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", 
    "male"), class = "factor"), bmi = c(25.3024309248682, 24.6058854935878, 
    25.7881406228236, 25.6707038267505, 24.0508191903124, 25.036135738485, 
    27.115755613237, 25.1674409043556, 24.1201634714689, 25.9469131749433
    ), children = c(4L, 1L, 5L, 1L, 1L, 4L, 0L, 0L, 3L, 4L), 
    smoker = c("no", "yes", "yes", "no", "no", "yes", "yes", 
    "yes", "yes", "no"), region = structure(c(1L, 2L, 2L, 3L, 
    1L, 2L, 3L, 3L, 3L, 2L), .Label = c("northeast", "northwest", 
    "southeast"), class = "factor")), row.names = c(NA, -10L), class = "data.frame")

R: multiple linear regression model and prediction model

I am putting everything from the comments into this answer.

1) You can use predict rather than predict.lm as predict will know your input is of class lm and do the right thing automatically.

2) The new data set should be a data.frame with the same variables as your original predictors, in this case alt and sdist.

3) If you bring in your data with read.table, it creates a data.frame by default. Assuming the new data has columns named alt and sdist, you can then do:

NewDataSet <- read.table(whatever)
NewPredictions <- predict(model1, newdata=NewDataSet)

4) After you have done this if you want to check the predictions - you can do the following

summary(model1)

This will give you the intercept and the coefficients for alt and sdist.
NewDataSet[1,]
This should give you the alt and sdist values for the first row; change the 1 in the brackets to pick any other row. Then use the information from summary(model1) to calculate what the predicted value should be, using any method that you trust.

Finally use
NewPredictions[1]
to get what predict() gave you for the first row (or change the 1 to any other row)
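
The check described in step 4 can be sketched as follows. The original data is not available here, so this assumes a stand-in model on the built-in mtcars data, with wt and hp playing the role of alt and sdist; the new-data values are made up for illustration.

```r
# Stand-in for the answer's model: wt and hp replace alt and sdist
# (hypothetical substitution, since the original data is not shown).
model1 <- lm(mpg ~ wt + hp, data = mtcars)
NewDataSet <- data.frame(wt = c(2.5, 3.2), hp = c(100, 150))
NewPredictions <- predict(model1, newdata = NewDataSet)

# Hand calculation for row 1: intercept + coefficient * value for each predictor.
b <- coef(model1)
by_hand <- b[["(Intercept)"]] +
  b[["wt"]] * NewDataSet$wt[1] +
  b[["hp"]] * NewDataSet$hp[1]

# TRUE when the hand calculation agrees with predict() for that row.
all.equal(by_hand, unname(NewPredictions[1]))
```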

Hopefully that should all work out.

r function: multiple linear regression prediction estimate and interval (user-defined function)

Update:
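I found the problem: it was the t value, t.alpha.demi<- qt(0.975, df=n-2). Using n-2 degrees of freedom is only correct for simple linear regression (an intercept plus one slope), which explains why my results matched there but differed for multiple regression.

I changed it to t.alpha.demi<- qt(0.975, df=n-length(beta)), so that the degrees of freedom account for every estimated coefficient.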

It was a mistake on my end. Regards,
Cyril S
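
For reference, here is a minimal sketch of a hand-computed 95% prediction interval that uses the corrected degrees of freedom and compares the result with predict. The original user-defined function is not shown, so everything except t.alpha.demi, n and beta is an assumption: the mtcars data, the formula, and the new point x0 are illustrative only.

```r
# Illustrative model with two predictors (so n - 2 would be the wrong df).
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)          # design matrix, including the intercept column
y <- mtcars$mpg
n <- nrow(X)

# Least-squares estimate via the normal equations.
XtX_inv <- solve(crossprod(X))
beta <- XtX_inv %*% crossprod(X, y)

# Residual variance with n - length(beta) degrees of freedom.
s2 <- sum((y - X %*% beta)^2) / (n - length(beta))

# New point: intercept, wt = 3, hp = 110 (made-up values).
x0 <- c(1, 3, 110)
y0 <- sum(x0 * beta)
se_pred <- sqrt(s2 * (1 + drop(t(x0) %*% XtX_inv %*% x0)))

t.alpha.demi <- qt(0.975, df = n - length(beta))
manual <- c(fit = y0,
            lwr = y0 - t.alpha.demi * se_pred,
            upr = y0 + t.alpha.demi * se_pred)

builtin <- predict(fit, data.frame(wt = 3, hp = 110),
                   interval = "prediction", level = 0.95)

# TRUE when the manual interval agrees with predict().
all.equal(unname(manual), unname(builtin[1, ]))
```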

Multiple linear model prediction in dplyr

We could use nest_by and create the model columns in mutate, then ungroup to remove the rowwise attributes created by nest_by. Next, loop over the 'data' and model columns with pmap, referring to the columns in the order of selection, i.e. ..1 -> data, ..2 -> model1 and ..3 -> model2. Create the new "Pred" columns in the 'data' (..1), remove the model columns in select and unnest the 'data'.

library(dplyr)
library(purrr)
library(tidyr)
library(gapminder)  # provides the gapminder data set
gapminder %>%
     nest_by(continent)  %>% 
     mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
            model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data ))) %>% 
     ungroup %>% 
     mutate(data = pmap(select(., data, model1, model2),  
          ~ ..1 %>%
              mutate(Pred1 = predict(..2, ..1), Pred2 = predict(..3, ..1)))) %>%
    select(-model1, -model2) %>%
    unnest(c(data))
# A tibble: 1,704 x 8
#   continent country  year lifeExp      pop gdpPercap Pred1 Pred2
#   <fct>     <fct>   <int>   <dbl>    <int>     <dbl> <dbl> <dbl>
# 1 Africa    Algeria  1952    43.1  9279525     2449.  48.8  49.2
# 2 Africa    Algeria  1957    45.7 10270856     3014.  48.9  50.0
# 3 Africa    Algeria  1962    48.3 11000948     2551.  48.9  49.4
# 4 Africa    Algeria  1967    51.4 12760499     3247.  49.1  50.5
# 5 Africa    Algeria  1972    54.5 14760787     4183.  49.2  52.0
# 6 Africa    Algeria  1977    58.0 17152804     4910.  49.4  53.2
# 7 Africa    Algeria  1982    61.4 20033753     5745.  49.6  54.6
# 8 Africa    Algeria  1987    65.8 23254956     5681.  49.8  54.7
# 9 Africa    Algeria  1992    67.7 26298373     5023.  50.0  54.0
#10 Africa    Algeria  1997    69.2 29072015     4797.  50.2  53.9
# … with 1,694 more rows

Or without using the pmap, we can create new columns with across and mutate, then unnest

gapminder %>%
     nest_by(continent) %>% 
     mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
            model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data )),
            across(starts_with('model'),  ~ list(Predict = predict(., data)),
             .names = "{.col}_Predict")) %>% 
     select(-model1, -model2)  %>%
     ungroup %>% 
     unnest(c(data, model1_Predict, model2_Predict))

-output

# A tibble: 1,704 x 8
#   continent country  year lifeExp      pop gdpPercap model1_Predict model2_Predict
#   <fct>     <fct>   <int>   <dbl>    <int>     <dbl>          <dbl>          <dbl>
# 1 Africa    Algeria  1952    43.1  9279525     2449.           48.8           49.2
# 2 Africa    Algeria  1957    45.7 10270856     3014.           48.9           50.0
# 3 Africa    Algeria  1962    48.3 11000948     2551.           48.9           49.4
# 4 Africa    Algeria  1967    51.4 12760499     3247.           49.1           50.5
# 5 Africa    Algeria  1972    54.5 14760787     4183.           49.2           52.0
# 6 Africa    Algeria  1977    58.0 17152804     4910.           49.4           53.2
# 7 Africa    Algeria  1982    61.4 20033753     5745.           49.6           54.6
# 8 Africa    Algeria  1987    65.8 23254956     5681.           49.8           54.7
# 9 Africa    Algeria  1992    67.7 26298373     5023.           50.0           54.0
#10 Africa    Algeria  1997    69.2 29072015     4797.           50.2           53.9
# … with 1,694 more rows
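
For readers who do not have the gapminder package installed, the same nest_by pattern can be tried on a built-in data set. This is a reduced sketch with a single model, assuming mtcars grouped by cyl as a stand-in; it is not the code from the answer above.

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  nest_by(cyl) %>%                       # one row per cyl, with a list-column 'data'
  mutate(model = list(lm(mpg ~ wt, data = data)),
         # add the group's own predictions as a new column inside 'data'
         data = list(mutate(data, Pred = predict(model, data)))) %>%
  ungroup() %>%
  select(-model) %>%
  unnest(data)

res
```

Each row of res carries the prediction from the model fitted to its own cyl group.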

Creating a linear regression model for each group in a column

You have some mistakes in the syntax of your functions. Functions are usually written as function(x), and then you substitute the x with the data you want to use it with.

For example, in the linear_model function you defined, if you were to use it alone you would write:

linear_model(data)

However, because you are using it inside lapply, it is a bit trickier to see. lapply simply loops over the data frames returned by split(table2,table2$LOCATION) and applies the linear_model function to each one.

The same thing happens with my_predict.

Anyway, this should work for you:

linear_model <- function(x) lm(Education ~ TIME, x)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(x) predict(x,new_df)

sapply(m,my_predict)
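
Since table2 is not included here, the following is a runnable sketch of the same split / lapply / sapply pattern on the built-in mtcars data. The grouping column cyl, the formula and the prediction point are stand-ins chosen for illustration.

```r
# One model per group: split the data by cyl, then fit lm() on each piece.
linear_model <- function(x) lm(mpg ~ wt, x)
m <- lapply(split(mtcars, mtcars$cyl), linear_model)

# The new data must use the same predictor name as the formula.
new_df <- data.frame(wt = 3)

my_predict <- function(x) predict(x, new_df)

# One prediction per group.
preds <- sapply(m, my_predict)
preds
```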

ANSWER TO THE EDIT

There are probably more efficient ways of looping the prediction, but here is my approach:

library(reshape2)  # melt (assumed to come from reshape2, which adds the L1 column for lists)
library(tidyr)     # pivot_wider

pred_data <- list()

for (i in 3:6){
   # fit one model per LOCATION, using column i as the response
   linear_model <- function(x) lm(x[,i] ~ TIME, x)
   m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
   new_df <- data.frame(TIME=c(2020, 2021), row.names = c("2020", "2021"))
   my_predict <- function(x) predict(x,new_df)
   pred_data[[colnames(tableLinR)[i]]] <- sapply(m,my_predict)
}

pred_data <- melt(pred_data)
pred_data <- as.data.frame(pivot_wider(pred_data, names_from = L1, values_from = value))

First you create an empty list in which to save the outputs of the loop. In for (i in 3:6) you specify the range of columns you want predictions for. The result pred_data is a list that you can transform into a data frame in different ways. With melt and pivot_wider you obtain a format similar to your original data.