Clustered Standard Errors in R Using Plm (With Fixed Effects)

Clustered standard errors in R using plm (with fixed effects)

Stata uses a finite sample correction to reduce downwards bias in the errors due to the finite number of clusters. It is a multiplicative factor on the variance-covariance matrix, $c=\frac{G}{G-1} \cdot \frac{N-1}{N-K}$, where G is the number of groups, N is the number of observations, and K is the number of parameters. I think coeftest only uses $c'=\frac{N-1}{N-K}$ since if I scale R's standard error by the square of the first term in c, I get something pretty close to Stata's standard error:

display 0.21136*(46/(46-1))^(.5)
.21369554

Here's how I would replicate what Stata is doing in R:

require(plm)
require(lmtest)
data(Cigar)
model <- plm(price ~ sales, model = 'within', data = Cigar)
G <- length(unique(Cigar$state))
c <- G/(G - 1)
coeftest(model,c * vcovHC(model, type = "HC1", cluster = "group"))

This yields:

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)
sales -1.219563 0.213773 -5.70496 1.4319e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

which agrees with Stata's error of 0.2137726 and t-stat of -5.70.

This code is probably not ideal, since the number of states in the data may be different than the number of states in the regression, but I am too lazy to figure out how to get the right number of panels.

R - fixed effect of panel data analysis and robust standard errors

Set cluster='group' if you want to cluster on the variable serving as the individual index (city in your example).

Set cluster='time' if you want to cluster on the variable serving as the time index (yearin your example).

You can cluster on the time index even for a fixed effects one-way individual model.

For clustering on both index variables, you cannot do that with plm::vcovHC. Look at vcovDC from the same packages which provides double clustering (DC = double clustering), e.g.,

coeftest(fem_city, vcovDC(fem_city)

Clustered standard errors different in plm vs lfe

The difference is in the degrees-of-freedom adjustment. This is the usual first guess when looking for differences in supposedly similar standard errors (see e.g., Different Robust Standard Errors of Logit Regression in Stata and R). Here, the problem can be illustrated when comparing the results from (1) plm+vcovHC, (2) felm, (3) lm+cluster.vcov (from package multiwayvcov).

First, I refit all models:

m1 <- plm(y ~ x, data = mydata, index = c("facX", "state"),
effect = "individual", model = "within")
m2 <- felm(y ~ x | facX | 0 | facX, data = mydata)
m3 <- lm(y ~ facX + x, data = mydata)

All lead to the same coefficient estimates. For m3 the fixed effects are explicitly reported while they are not for m1 and m2. Hence, for m3 only the last coefficient is extracted with tail(..., 1).

all.equal(coef(m1), coef(m2))
## [1] TRUE
all.equal(coef(m1), tail(coef(m3), 1))
## [1] TRUE

The non-robust standard errors also agree.

se <- function(object) tail(sqrt(diag(object)), 1)
se(vcov(m1))
## x
## 0.07002696
se(vcov(m2))
## x
## 0.07002696
se(vcov(m3))
## x
## 0.07002696

And when comparing the clustered standard errors we can now show that felm uses the degrees-of-freedom correction while plm does not:

se(vcovHC(m1))
## x
## 0.06572423
m2$cse
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX))
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX, df_correction = FALSE))
## x
## 0.06572423

Fixed effect regression with clustered standard errors

You can set SIC as dummy variable and cluster the standard errors at the firm level:

# The fixed effects model
model <- lm(TOTAL_COMP ~ AT + factor(SIC), data = COMBINED_DATA)

# The fixed effects model with cluster settings
library(estimatr)
Clu_robust <- lm_robust(TOTAL_COMP ~ AT + factor(SIC),
data = COMBINED_DATA,
clusters = GVKEY,
se_type = 'stata')


Related Topics



Leave a reply



Submit