Predict Out of Sample on Fixed Effects Model

Predict out of sample on fixed effects model

Just delete intercept for model :

model <-  plm(pcap ~ 0 + hwy + water, data = Produc, model = 'within')
predict(model, newdata = data.frame('hwy' = 1, 'water' = 1))
3.980911

Predict on test data, using plm package in R, and calculate RMSE for test data

Regarding out-of-sample prediction with fixed effects models, it is not clear how data relating to fixed effects not in the original model are to be treated, e.g., data for an individual not contained in the orignal data set the model was estimated on. (This is rather a methodological question than a programming question).

The version 2.6-2 of plm allows predict for fixed effect models with the original data and with out-of-sample data (see ?predict.plm).

Find below an example with 10 firms for model estimation and the data to be used for prediction contains a firm not contained in the original data set (besides that firm, there are also years not contained in the original model object but these are irrelevant here as it is a one-way individual model). It is unclear what the fixed effect of that out-of-sample firm would be. Hence, by default, no predicted value is given (NA value). If argument na.fill is set to TRUE, the (weighted) mean of the fixed effects contained in the original model object is used as a best guess.

library(plm)
data("Grunfeld", package = "plm")

# fit a fixed effect model
fit.fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

# generate 55 new observations of three firms used for prediction:
# * firm 1 with years 1935:1964 (has out-of-sample years 1955:1964),
# * firm 2 with years 1935:1949 (all in sample),
# * firm 11 with years 1935:1944 (firm 11 is out-of-sample)
set.seed(42L)

new.value2 <- runif(55, min = min(Grunfeld$value), max = max(Grunfeld$value))
new.capital2 <- runif(55, min = min(Grunfeld$capital), max = max(Grunfeld$capital))

newdata <- data.frame(firm = c(rep(1, 30), rep(2, 15), rep(11, 10)),
year = c(1935:(1935+29), 1935:(1935+14), 1935:(1935+9)),
value = new.value2, capital = new.capital2)
# make pdata.frame
newdata.p <- pdata.frame(newdata, index = c("firm", "year"))

## predict from fixed effect model with new data as pdata.frame
predict(fit.fe, newdata = newdata.p) # has NA values for the 11'th firm

## set na.fill = TRUE to have the weighted mean used to for fixed effects -> no NA values
predict(fit.fe, newdata = newdata.p, na.fill = TRUE)

NB: When you input a plain data.frame as newdata, it is not clear how the data related to the individuals and time periods, which is why the weighted mean of fixed effects from the original model object is used for all observations in newdata and a warning is printed. For fixed effect model prediction, it is reasonable to assume the user can provide information (via a pdata.frame) how the data the user wants to use for prediction relates to the individual and time dimension of panel data.

Is there a predict function for plm in R?

There are (at least) two methods in the package to produce estimates from plm objects:

-- fixef.plm: Extract the Fixed Effects

-- pmodel.response: A function to extract the model.response

It appears to me that the author(s) are not interested in providing estimates for the "random effects". It may be a matter of "if you don't know how to do it on your own, then we don't want to give you a sharp knife to cut yourself too deeply."

How to calculate predicted means for specific fixed effects from model output using glmmTMB

You cannot "remove" a predictor from the predicted means (when you use predict()), because this would return just NA values. Thus, I would recommend the same as Allen and use a sensible / meaningful value at which you hold Logger.ID constant. Here's an example from the ggeffects package:

library(ggeffects)
library(glmmTMB)
data("Salamanders")
m <- glmmTMB(count ~ spp * mined + sample + (1 | site), family = nbinom1, data = Salamanders)

# hold "sample" constant at its mean value
ggpredict(m, c("spp", "mined"))
#>
#> # Predicted counts of count
#> # x = spp
#>
#> # mined = yes
#>
#> x | Predicted | SE | 95% CI
#> --------------------------------------
#> GP | 0.04 | 1.01 | [0.01, 0.27]
#> PR | 0.11 | 0.60 | [0.03, 0.36]
#> DM | 0.38 | 0.36 | [0.19, 0.78]
#> EC-A | 0.11 | 0.60 | [0.03, 0.36]
#> EC-L | 0.32 | 0.38 | [0.15, 0.68]
#> DF | 0.56 | 0.32 | [0.30, 1.04]
#>
#> # mined = no
#>
#> x | Predicted | SE | 95% CI
#> --------------------------------------
#> GP | 2.27 | 0.20 | [1.53, 3.37]
#> PR | 0.46 | 0.33 | [0.24, 0.88]
#> DM | 2.42 | 0.20 | [1.64, 3.58]
#> EC-A | 0.89 | 0.27 | [0.53, 1.50]
#> EC-L | 3.20 | 0.19 | [2.21, 4.65]
#> DF | 1.85 | 0.21 | [1.22, 2.81]
#>
#> Adjusted for:
#> * sample = 2.50
#> * site = NA (population-level)
#> Standard errors are on the link-scale (untransformed).

# predicted means when sample is set to "0"
ggpredict(m, c("spp", "mined"), condition = list(sample = 0))
#>
#> # Predicted counts of count
#> # x = spp
#>
#> # mined = yes
#>
#> x | Predicted | SE | 95% CI
#> --------------------------------------
#> GP | 0.04 | 1.02 | [0.00, 0.27]
#> PR | 0.11 | 0.62 | [0.03, 0.36]
#> DM | 0.38 | 0.38 | [0.18, 0.80]
#> EC-A | 0.11 | 0.61 | [0.03, 0.36]
#> EC-L | 0.32 | 0.40 | [0.15, 0.69]
#> DF | 0.54 | 0.34 | [0.28, 1.06]
#>
#> # mined = no
#>
#> x | Predicted | SE | 95% CI
#> --------------------------------------
#> GP | 2.22 | 0.24 | [1.40, 3.52]
#> PR | 0.45 | 0.36 | [0.22, 0.90]
#> DM | 2.37 | 0.24 | [1.49, 3.78]
#> EC-A | 0.87 | 0.30 | [0.49, 1.58]
#> EC-L | 3.14 | 0.22 | [2.04, 4.81]
#> DF | 1.81 | 0.25 | [1.11, 2.95]
#>
#> Adjusted for:
#> * site = NA (population-level)
#> Standard errors are on the link-scale (untransformed).

Created on 2020-09-14 by the reprex package (v0.3.0)

Fit a fixed effect model with lm() in R without individual intercepts

Let's assume some concrete data example, e.g.

a <- rnorm(100)
b <- runif(100)

train <- data.frame(a, b,
author = sample(LETTERS[1:10], 100, 1),
y = 3*a + .5*b + rnorm(100))

Now we do a fixed effect regression, I assume we do not want any Intercept so the command is

fit <- lm(y ~ a + b + author - 1, data = train)

The - 1 part in the formula leaves the Intercpet out, instead a fixed effect is computed for each author. No base level left out.

Printing this model or it's summary is feasible with the 10 authors in the example but not with thousands in your work.

You can print the coefficents of only a and b like this

> fit$coefficients[c('a', 'b')]
a b
3.02022335 0.09789947

You can see the coefficients and their p-values via the anova command

> anova(fit)

Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
a 1 1139.90 1139.90 1034.2307 < 2.2e-16 ***
b 1 7.73 7.73 7.0127 0.009591 **
author 10 10.75 1.07 0.9751 0.470812
Residuals 88 96.99 1.10

And you can even deconstruct summary(fit) for the coefficents or to display the adjusted R²:

> summary(fit)$coefficients[c("a", "b"),]
Estimate Std. Error t value Pr(>|t|)
a 3.02022335 0.09697699 31.1437123 2.679195e-49
b 0.09789947 0.35161039 0.2784317 7.813342e-01
>
> summary(fit)$adj.r.squared
[1] 0.9122033

For other values see help(summary.lm). Should you still want to see the coefficient of author F that is

> fit$coefficients["authorF"]
authorF
0.6174314


Related Topics



Leave a reply



Submit