Fama MacBeth standard errors in R
The plm
package can estimate Fama-MacBeth regressions and SEs.
require(foreign)
require(plm)
require(lmtest)
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")
fpmg <- pmg(y~x, test, index=c("year","firmid")) ##Fama-MacBeth
> ##Fama-MacBeth
> coeftest(fpmg)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.031278 0.023356 1.3392 0.1806
x 1.035586 0.033342 31.0599 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However note that this method works only if your data can be coerced to a pdata.frame
. (It will fail if you have "duplicate couples (time-id)"
.)
For further details see:
- Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R
Newey-West standard errors with Mean Groups/Fama-MacBeth estimator
Currently this is impossible with plm
package.
However, you could just create them yourself.
Suppose you have:
fpmg <- pmg(y~x, test, index = c('year', 'firmid'))
fpmg.coefficients <- fpmg$coefficients
# (Intercept) x
# 0.03127797 1.03558610
coeftest(fpmg)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.031278 0.023356 1.3392 0.1806
# x 1.035586 0.033342 31.0599 <2e-16 ***
Then you can simply create the estimators yourself like in:
the.years <- unique(test$year)
a.formula <- y ~ x
first.step <- lapply(the.years, function(a.year) {
temp.data <- test[test$year == a.year, ]
an.lm <- lm(a.formula, data = temp.data)
the.coefficients <- an.lm$coef
the.results <- as.data.frame(cbind(a.year, t(the.coefficients)))
the.results
})
first.step.df <- do.call('rbind', first.step)
second.step.coefficients <- apply(first.step.df[, -1], 2, mean)
second.step.coefficients
# (Intercept) x
# 0.03127797 1.03558610
identical(fpmg.coefficients, second.step.coefficients)
# [1] TRUE
Check that they are identical both ways just in case.
Last, you can obtain the Newey-West (1987) with one lag adjusted t-statistics for the means with:
library(sandwich)
second.step.NW.sigma.sq <- apply(first.step.df[, -1], 2,
function(x) sqrt(NeweyWest(lm(x ~ 1),
lag = 1, prewhite = FALSE)['(Intercept)',
'(Intercept)']))
second.step.NW.sigma.sq
# (Intercept) x
# 0.02438398 0.02859447
t.statistics.NW.lag.1 <- second.step.coefficients / second.step.NW.sigma.sq
t.statistics.NW.lag.1
# (Intercept) x
# 1.282726 36.216301
Update
In my answer, I had only included the "manual" calculation of the t-statistic, because it is computationally faster.
A more generic solution is to calculcate the Newey-West corrected t-statistics and their p-values with the coeftest()
function of the lmtest
package.
coeftest(lm(first.step.df$'(Intercept)' ~ 1), vcov = NeweyWest(lm(first.step.df$'(Intercept)' ~ 1), lag = 1, prewhite = FALSE))
# t test of coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.031278 0.024384 1.2827 0.2316
coeftest(lm(first.step.df$x ~ 1), vcov = NeweyWest(lm(first.step.df$x ~ 1), lag = 1, prewhite = FALSE))
# t test of coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 1.035586 0.028594 36.216 4.619e-11 ***
Fama Macbeth Regression in Python (Pandas or Statsmodels)
An update to reflect the library situation for Fama-MacBeth as of Fall 2018. The fama_macbeth
function has been removed from pandas
for a while now. So what are your options?
If you're using python 3, then you can use the Fama-MacBeth method in LinearModels: https://github.com/bashtage/linearmodels/blob/master/linearmodels/panel/model.py
If you're using python 2 or just don't want to use LinearModels, then probably your best option is to roll you own.
For example, suppose you have the Fama-French industry portfolios in a panel like the following (you've also computed some variables like past beta or past returns to use as your x-variables):
In [1]: import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
In [4]: df = pd.read_csv('industry.csv',parse_dates=['caldt'])
df.query("caldt == '1995-07-01'")
In [5]: Out[5]:
industry caldt ret beta r12to2 r36to13
18432 Aero 1995-07-01 6.26 0.9696 0.2755 0.3466
18433 Agric 1995-07-01 3.37 1.0412 0.1260 0.0581
18434 Autos 1995-07-01 2.42 1.0274 0.0293 0.2902
18435 Banks 1995-07-01 4.82 1.4985 0.1659 0.2951
Fama-MacBeth primarily involves computing the same cross-sectional regression model month by month, so you can implement it using a groupby
. You can create a function that takes a dataframe
(it will come from the groupby
) and a patsy
formula; it then fits the model and returns the parameter estimates. Here is a barebones version of how you could implement it (note this is what the original questioner tried to do a few years ago ... not sure why it didn't work although it's possible back then statsmodels
result object method params
wasn't returning a pandas
Series
so the return needed to be converted to a Series
explicitly ... it does work fine in the current version of pandas
, 0.23.4):
def ols_coef(x,formula):
return smf.ols(formula,data=x).fit().params
In [9]: gamma = (df.groupby('caldt')
.apply(ols_coef,'ret ~ 1 + beta + r12to2 + r36to13'))
gamma.head()
In [10]: Out[10]:
Intercept beta r12to2 r36to13
caldt
1963-07-01 -1.497012 -0.765721 4.379128 -1.918083
1963-08-01 11.144169 -6.506291 5.961584 -2.598048
1963-09-01 -2.330966 -0.741550 10.508617 -4.377293
1963-10-01 0.441941 1.127567 5.478114 -2.057173
1963-11-01 3.380485 -4.792643 3.660940 -1.210426
Then just compute the mean, standard error on the mean, and a t-test (or whatever statistics you want). Something like the following:
def fm_summary(p):
s = p.describe().T
s['std_error'] = s['std']/np.sqrt(s['count'])
s['tstat'] = s['mean']/s['std_error']
return s[['mean','std_error','tstat']]
In [12]: fm_summary(gamma)
Out[12]:
mean std_error tstat
Intercept 0.754904 0.177291 4.258000
beta -0.012176 0.202629 -0.060092
r12to2 1.794548 0.356069 5.039896
r36to13 0.237873 0.186680 1.274230
Improving Speed
Using statsmodels
for the regressions has significant overhead (particularly given you only need the estimated coefficients). If you want better efficiency, then you could switch from statsmodels
to numpy.linalg.lstsq
. Write a new function that does the ols estimation ... something like the following (notice I'm not doing anything like checking the rank of these matrices ...):
def ols_np(data,yvar,xvar):
gamma,_,_,_ = np.linalg.lstsq(data[xvar],data[yvar],rcond=None)
return pd.Series(gamma)
And if you're still using an older version of pandas
, the following will work:
Here is an example of using the fama_macbeth
function in pandas
:
>>> df
y x
date id
2012-01-01 1 0.1 0.4
2 0.3 0.6
3 0.4 0.2
4 0.0 1.2
2012-02-01 1 0.2 0.7
2 0.4 0.5
3 0.2 0.1
4 0.1 0.0
2012-03-01 1 0.4 0.8
2 0.6 0.1
3 0.7 0.6
4 0.4 -0.1
Notice, the structure. The fama_macbeth
function expects the y-var and x-vars to have a multi-index with date as the first variable and the stock/firm/entity id as the second variable in the index:
>>> fm = pd.fama_macbeth(y=df['y'],x=df[['x']])
>>> fm
----------------------Summary of Fama-MacBeth Analysis-------------------------
Formula: Y ~ x + intercept
# betas : 3
----------------------Summary of Estimated Coefficients------------------------
Variable Beta Std Err t-stat CI 2.5% CI 97.5%
(x) -0.0227 0.1276 -0.18 -0.2728 0.2273
(intercept) 0.3531 0.0842 4.19 0.1881 0.5181
--------------------------------End of Summary---------------------------------
Note that just printing fm
calls fm.summary
>>> fm.summary
----------------------Summary of Fama-MacBeth Analysis-------------------------
Formula: Y ~ x + intercept
# betas : 3
----------------------Summary of Estimated Coefficients------------------------
Variable Beta Std Err t-stat CI 2.5% CI 97.5%
(x) -0.0227 0.1276 -0.18 -0.2728 0.2273
(intercept) 0.3531 0.0842 4.19 0.1881 0.5181
--------------------------------End of Summary---------------------------------
Also, note the fama_macbeth
function automatically adds an intercept (as opposed to statsmodels
routines). Also the x-var has to be a dataframe
so if you pass just one column you need to pass it as df[['x']]
.
If you don't want an intercept you have to do:
>>> fm = pd.fama_macbeth(y=df['y'],x=df[['x']],intercept=False)
How can I use Newey-West Standard Errors in modelplot(), in R?
The modelplot
documentation notes that the vcov
argument can be either a list of functions, or a list of matrices, or a vector of strings. This means that both of these commands should work:
library(modelsummary)
library(sandwich)
mod <- lm(mpg ~ hp, mtcars)
modelplot(mod, vcov = c("HC3", "NeweyWest"))
Alternatively,
modelplot(mod, vcov = list(vcovHC, NeweyWest))
modelplot(mod, vcov = list(vcovHC(mod), NeweyWest(mod)))
The error message you get suggests that the specific model you are trying to summarize is unsupported by the sandwich
package. You can check if it is supported by calling:
sandwich::NeweyWest(mod)
#> (Intercept) hp
#> (Intercept) 5.93124548 -0.032929737
#> hp -0.03292974 0.000221307
In my example, mod
was a lm
model, but you did not supply a MINIMAL REPRODUCIBLE EXAMPLE, so your problem is impossible to diagnose conclusively.
Related Topics
How to Jitter Two Ggplot Geoms in the Same Way
How to Convert .Rdata Format into Text File Format
R Function Prcomp Fails with Na's Values Even Though Na's Are Allowed
Simple Lookup to Insert Values in an R Data Frame
String Split on Last Comma in R
How to Escape Characters in Variable Names
Handling Latex Backslashes in Xtable
Why Is the Class of a Vector the Class of the Elements of the Vector and Not Vector Itself
Extracting Coefficients and Their Standard Error for Each Unit in an Lme Model Fit
Custom Fill Color in Ggvis (And Other Options)
Can't Connect to Local MySQL Server Through Socket Error When Using Ssh Tunel
Combine Lists While Overriding Values with Same Name in R
How to Collapse Sidebarpanel in Shiny App
Data.Table Error When Used Through Knitr, Gwidgetswww
How to Rotate the Axis Labels in Ggplot2
Formatting Histogram X-Axis When Working with Dates Using R
Error in Unserialize(Socklist[[N]]):Error Reading from Connection on Unix