Clustered standard errors in R using plm (with fixed effects)
Stata uses a finite sample correction to reduce downwards bias in the errors due to the finite number of clusters. It is a multiplicative factor on the variance-covariance matrix, $c=\frac{G}{G-1} \cdot \frac{N-1}{N-K}$, where G is the number of groups, N is the number of observations, and K is the number of parameters. I think coeftest
only uses $c'=\frac{N-1}{N-K}$ since if I scale R's standard error by the square of the first term in c, I get something pretty close to Stata's standard error:
display 0.21136*(46/(46-1))^(.5)
.21369554
Here's how I would replicate what Stata is doing in R:
require(plm)
require(lmtest)
data(Cigar)
model <- plm(price ~ sales, model = 'within', data = Cigar)
G <- length(unique(Cigar$state))
c <- G/(G - 1)
coeftest(model,c * vcovHC(model, type = "HC1", cluster = "group"))
This yields:
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
sales -1.219563 0.213773 -5.70496 1.4319e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
which agrees with Stata's error of 0.2137726 and t-stat of -5.70.
This code is probably not ideal, since the number of states in the data may be different than the number of states in the regression, but I am too lazy to figure out how to get the right number of panels.
R - fixed effect of panel data analysis and robust standard errors
Set cluster='group'
if you want to cluster on the variable serving as the individual index (city
in your example).
Set cluster='time'
if you want to cluster on the variable serving as the time index (year
in your example).
You can cluster on the time index even for a fixed effects one-way individual model.
For clustering on both index variables, you cannot do that with plm::vcovHC
. Look at vcovDC
from the same packages which provides double clustering (DC = double clustering), e.g.,
coeftest(fem_city, vcovDC(fem_city)
Clustered standard errors different in plm vs lfe
The difference is in the degrees-of-freedom adjustment. This is the usual first guess when looking for differences in supposedly similar standard errors (see e.g., Different Robust Standard Errors of Logit Regression in Stata and R). Here, the problem can be illustrated when comparing the results from (1) plm
+vcovHC
, (2) felm
, (3) lm
+cluster.vcov
(from package multiwayvcov
).
First, I refit all models:
m1 <- plm(y ~ x, data = mydata, index = c("facX", "state"),
effect = "individual", model = "within")
m2 <- felm(y ~ x | facX | 0 | facX, data = mydata)
m3 <- lm(y ~ facX + x, data = mydata)
All lead to the same coefficient estimates. For m3
the fixed effects are explicitly reported while they are not for m1
and m2
. Hence, for m3
only the last coefficient is extracted with tail(..., 1)
.
all.equal(coef(m1), coef(m2))
## [1] TRUE
all.equal(coef(m1), tail(coef(m3), 1))
## [1] TRUE
The non-robust standard errors also agree.
se <- function(object) tail(sqrt(diag(object)), 1)
se(vcov(m1))
## x
## 0.07002696
se(vcov(m2))
## x
## 0.07002696
se(vcov(m3))
## x
## 0.07002696
And when comparing the clustered standard errors we can now show that felm
uses the degrees-of-freedom correction while plm
does not:
se(vcovHC(m1))
## x
## 0.06572423
m2$cse
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX))
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX, df_correction = FALSE))
## x
## 0.06572423
Fixed effect regression with clustered standard errors
You can set SIC
as dummy variable and cluster the standard errors at the firm level:
# The fixed effects model
model <- lm(TOTAL_COMP ~ AT + factor(SIC), data = COMBINED_DATA)
# The fixed effects model with cluster settings
library(estimatr)
Clu_robust <- lm_robust(TOTAL_COMP ~ AT + factor(SIC),
data = COMBINED_DATA,
clusters = GVKEY,
se_type = 'stata')
Related Topics
R Function Prcomp Fails with Na's Values Even Though Na's Are Allowed
How to Manipulate Null Elements in a Nested List
Use a Factor Column in "By" and Do Not Drop Empty Factors
Ggplot2:How to Reduce the Width and the Space Between Bars with Geom_Bar
Linear Interpolate Missing Values in Time Series
Handling Latex Backslashes in Xtable
How to Increase Stack Space Overflow for Pandoc in R
Index Element from List in Rcpp
How to Count Occurrences Combinations in Data.Table in R
Have Lubridate Subtraction Return Only a Numeric Value
Combine Lists While Overriding Values with Same Name in R
Fill Area Between Two Lines, with High/Low and Dates
Saving a Data Frame as a Binary File
Cumulative Count of Unique Values in R
Error in Unserialize(Socklist[[N]]):Error Reading from Connection on Unix
How to Replace Outliers with the 5Th and 95Th Percentile Values in R