Is there a predict function for plm in R?
There are (at least) two methods in the package to produce estimates from plm objects:
-- fixef.plm: Extract the Fixed Effects
-- pmodel.response: A function to extract the model.response
It appears to me that the author(s) are not interested in providing estimates for the "random effects". It may be a matter of "if you don't know how to do it on your own, then we don't want to give you a sharp knife to cut yourself too deeply."
Predict on test data, using plm package in R, and calculate RMSE for test data
Regarding out-of-sample prediction with fixed effects models, it is not clear how data relating to fixed effects not in the original model are to be treated, e.g., data for an individual not contained in the orignal data set the model was estimated on. (This is rather a methodological question than a programming question).
The version 2.6-2 of plm
allows predict
for fixed effect models with the original data and with out-of-sample data (see ?predict.plm
).
Find below an example with 10 firms for model estimation and the data to be used for prediction contains a firm not contained in the original data set (besides that firm, there are also years not contained in the original model object but these are irrelevant here as it is a one-way individual model). It is unclear what the fixed effect of that out-of-sample firm would be. Hence, by default, no predicted value is given (NA
value). If argument na.fill
is set to TRUE
, the (weighted) mean of the fixed effects contained in the original model object is used as a best guess.
library(plm)
data("Grunfeld", package = "plm")
# fit a fixed effect model
fit.fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
# generate 55 new observations of three firms used for prediction:
# * firm 1 with years 1935:1964 (has out-of-sample years 1955:1964),
# * firm 2 with years 1935:1949 (all in sample),
# * firm 11 with years 1935:1944 (firm 11 is out-of-sample)
set.seed(42L)
new.value2 <- runif(55, min = min(Grunfeld$value), max = max(Grunfeld$value))
new.capital2 <- runif(55, min = min(Grunfeld$capital), max = max(Grunfeld$capital))
newdata <- data.frame(firm = c(rep(1, 30), rep(2, 15), rep(11, 10)),
year = c(1935:(1935+29), 1935:(1935+14), 1935:(1935+9)),
value = new.value2, capital = new.capital2)
# make pdata.frame
newdata.p <- pdata.frame(newdata, index = c("firm", "year"))
## predict from fixed effect model with new data as pdata.frame
predict(fit.fe, newdata = newdata.p) # has NA values for the 11'th firm
## set na.fill = TRUE to have the weighted mean used to for fixed effects -> no NA values
predict(fit.fe, newdata = newdata.p, na.fill = TRUE)
NB: When you input a plain data.frame as newdata
, it is not clear how the data related to the individuals and time periods, which is why the weighted mean of fixed effects from the original model object is used for all observations in newdata
and a warning is printed. For fixed effect model prediction, it is reasonable to assume the user can provide information (via a pdata.frame) how the data the user wants to use for prediction relates to the individual and time dimension of panel data.
How to index predict plm object in R
Your code is not unambiguous, thus check for names
which gives a boolean inside the brackets.
yy[names(yy) %in% "ARIZONA"]
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# -0.42640094 -0.36662046 -0.27070381 -0.18091251 -0.14102111 -0.18021858
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# -0.14774000 -0.08398230 0.01383581 0.09852240 0.12731152 0.17116278
# ARIZONA ARIZONA ARIZONA ARIZONA ARIZONA
# 0.14950942 0.19194103 0.28735344 0.34586645 0.41209687
Predict out of sample on fixed effects model
Just delete intercept for model :
model <- plm(pcap ~ 0 + hwy + water, data = Produc, model = 'within')
predict(model, newdata = data.frame('hwy' = 1, 'water' = 1))
3.980911
Marginal effects plot of PLM
Edit 2022-08-20: The latest version of plm
on CRAN now includes a predict()
method for within models. In principle, the commands illustrated below using fixest
should now work with plm
as well.
In my experience, plm
models are kind of tricky to deal with, and many of the packages which specialize in “post-processing” fail to handle these objects properly.
One alternative would be to estimate your “within” model using the fixest
package and to plot the results using the marginaleffects
package. (Disclaimer: I am the marginaleffects
author.)
Note that many of the models estimated by plm
are officially supported and tested with marginaleffects
(e.g., random effects, Amemiya, Swaymy-Arora). However, this is not the case of this specific "within" model, which is even trickier than the others to support.
First, we estimate two models to show that the plm
and fixest
versions are equivalent:
library(plm)
library(fixest)
library(marginaleffects)
library(modelsummary)
data("EmplUK")
mod1 <- plm(
emp ~ wage * capital,
index = c("firm", "year"),
model = "within",
effect = "individual",
data = EmplUK)
mod2 <- feols(
emp ~ wage * capital | firm,
se = "standard",
data = EmplUK)
models <- list("PLM" = mod1, "FIXEST" = mod2)
modelsummary(models)
PLM | FIXEST | |
---|---|---|
wage | 0.000 | 0.000 |
(0.034) | (0.034) | |
capital | 2.014 | 2.014 |
(0.126) | (0.126) | |
wage × capital | -0.043 | -0.043 |
(0.004) | (0.004) | |
Num.Obs. | 1031 | 1031 |
R2 | 0.263 | 0.986 |
R2 Adj. | 0.145 | 0.984 |
R2 Within | 0.263 | |
R2 Within Adj. | 0.260 | |
AIC | 4253.9 | 4253.9 |
BIC | 4273.7 | 4273.7 |
RMSE | 1.90 | 1.90 |
Std.Errors | IID | |
FE: firm | X |
Related Topics
Generating Multiple Plots in Ggplot by Factor
Subset Rows According to a Range of Time
The Perils of Aligning Plots in Ggplot
Geom_Density to Match Geom_Histogram Binwitdh
Merge Dataframes on Matching A, B and *Closest* C
How to Handle Vectors Without Knowing the Type in Rcpp
Why Is Subsetting on a "Logical" Type Slower Than Subsetting on "Numeric" Type
Legend of a Raster Map with Categorical Data
Ternary Plot and Filled Contour
Knit One Markdown File to Two Output Files
Writing Data Frame to PDF Table
How to Output Text to the R Console in Color
How to Return 5 Topmost Values from Vector in R