Double Clustered Standard Errors for Panel Data

Double clustered standard errors for panel data

Frank Harrell's package rms (which used to be named Design) has a function that I use often when clustering: robcov.

See this part of ?robcov, for example.

cluster: a variable indicating groupings. ‘cluster’ may be any type of
vector (factor, character, integer). NAs are not allowed.
Unique values of ‘cluster’ indicate possibly correlated
groupings of observations. Note the data used in the fit and
stored in ‘fit$x’ and ‘fit$y’ may have had observations
containing missing values deleted. It is assumed that if any
NAs were removed during the original model fitting, an
‘naresid’ function exists to restore NAs so that the rows of
the score matrix coincide with ‘cluster’. If ‘cluster’ is
omitted, it defaults to the integers 1,2,...,n to obtain the
"sandwich" robust covariance matrix estimate.

Higher level cluster standard errors for panel data

I wrote a package (clubTamal) when I encountered the same issue. clubTamal transform a plm object (by re-estimation) into an lm object in order to be able to cluster standard errors using teh multiwayvcov package. You find a Rpubs example and documentation here: https://rpubs.com/eliascis/clubTamal.

The package works for plm estimations with Fixed-Effects (model='within') or First-Difference models (model='fd').

To obtain a clustered covariance matrix use the vcovTamal command.

The package is still under development but can be installed directly from github:

library(devtools)
install_github("eliascis/clubTamal")

Unfortunalty the link to your example data does not work, but clubTamal further installs spd4testing, which constructs a simulated Small Panel Data set for testing purposes.

## packages
library(foreign)
library(lmtest)
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
library(plm)
library(multiwayvcov)
library(spd4testing)
library(clubTamal)

## simulated data
d <- spd4testing()

## formula
f <- formula(y ~ x + factor(year))

## standard estimation
e <- plm(formula = f, data = d, model = "fd")
summary(e)
#> Oneway (individual) effect First-Difference Model
#>
#> Call:
#> plm(formula = f, data = d, model = "fd")
#>
#> Unbalanced Panel: n = 6, T = 3-5, N = 26
#> Observations used in estimation: 20
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -250.333 -115.219 12.651 108.390 228.110
#>
#> Coefficients:
#> Estimate Std. Error t-value Pr(>|t|)
#> (Intercept) -80.937 199.405 -0.4059 0.69096
#> x 71.858 25.974 2.7666 0.01514 *
#> factor(year)2002 194.842 216.449 0.9002 0.38325
#> factor(year)2003 109.118 414.298 0.2634 0.79609
#> factor(year)2004 446.147 583.234 0.7650 0.45700
#> factor(year)2005 451.514 752.479 0.6000 0.55807
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Total Sum of Squares: 1078300
#> Residual Sum of Squares: 394270
#> R-Squared: 0.63435
#> Adj. R-Squared: 0.50376
#> F-statistic: 4.85757 on 5 and 14 DF, p-value: 0.0087377
e <- plm(formula = f, data = d, model = "within")
summary(e)
#> Oneway (individual) effect Within Model
#>
#> Call:
#> plm(formula = f, data = d, model = "within")
#>
#> Unbalanced Panel: n = 6, T = 3-5, N = 26
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -167.4294 -59.3741 -6.9404 73.7132 146.4199
#>
#> Coefficients:
#> Estimate Std. Error t-value Pr(>|t|)
#> x 72.362 23.434 3.0879 0.007501 **
#> factor(year)2002 113.786 77.276 1.4725 0.161569
#> factor(year)2003 -67.413 75.012 -0.8987 0.383013
#> factor(year)2004 200.420 83.649 2.3960 0.030062 *
#> factor(year)2005 127.170 81.030 1.5694 0.137400
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Total Sum of Squares: 441370
#> Residual Sum of Squares: 190660
#> R-Squared: 0.56803
#> Adj. R-Squared: 0.28005
#> F-statistic: 3.94491 on 5 and 15 DF, p-value: 0.017501

## clustering
# no clustering
v <- e$vcov
coeftest(e)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> x 72.362 23.434 3.0879 0.007501 **
#> factor(year)2002 113.786 77.276 1.4725 0.161569
#> factor(year)2003 -67.413 75.012 -0.8987 0.383013
#> factor(year)2004 200.420 83.649 2.3960 0.030062 *
#> factor(year)2005 127.170 81.030 1.5694 0.137400
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# clustering at id level with plm-package
v <- vcovHC(e, type = "HC1", cluster = "group", tol = 1 * 10^-20)
coeftest(e, v)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> x 72.362 24.586 2.9433 0.010070 *
#> factor(year)2002 113.786 76.548 1.4865 0.157870
#> factor(year)2003 -67.413 63.962 -1.0540 0.308585
#> factor(year)2004 200.420 61.464 3.2608 0.005266 **
#> factor(year)2005 127.170 67.310 1.8893 0.078338 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## clustering at group level with clubTamal
v <- vcovTamal(estimate = e, data = d, groupvar = "gid")
#> Error in vcovTamal(estimate = e, data = d, groupvar = "gid"): better use the very fast and powerful lfe::felm
coeftest(e, v)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> x 72.362 24.586 2.9433 0.010070 *
#> factor(year)2002 113.786 76.548 1.4865 0.157870
#> factor(year)2003 -67.413 63.962 -1.0540 0.308585
#> factor(year)2004 200.420 61.464 3.2608 0.005266 **
#> factor(year)2005 127.170 67.310 1.8893 0.078338 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Created on 2020-05-29 by the reprex package (v0.3.0)

R - fixed effect of panel data analysis and robust standard errors

Set cluster='group' if you want to cluster on the variable serving as the individual index (city in your example).

Set cluster='time' if you want to cluster on the variable serving as the time index (yearin your example).

You can cluster on the time index even for a fixed effects one-way individual model.

For clustering on both index variables, you cannot do that with plm::vcovHC. Look at vcovDC from the same packages which provides double clustering (DC = double clustering), e.g.,

coeftest(fem_city, vcovDC(fem_city)

Panel data regression: Robust standard errors

After some time playing around, it works for me and gives me:

                         Estimate  Std. Error t value  Pr(>|t|)    
(Intercept) 4.5099e-16 5.2381e-16 0.8610 0.389254
C.MCAP_SEC -5.9769e-07 1.2677e-07 -4.7149 2.425e-06 ***
C.Impact_change -5.3908e-04 7.5601e-05 -7.1306 1.014e-12 ***
C.Mom 3.7560e-04 3.3378e-03 0.1125 0.910406
C.BM -1.6438e-04 1.7368e-05 -9.4645 < 2.2e-16 ***
C.PD 6.2153e-02 3.8766e-02 1.6033 0.108885
C.CashGen -2.7876e-04 1.4031e-02 -0.0199 0.984149
C.NITA -8.1792e-02 3.2153e-02 -2.5438 0.010969 *
C.PE -6.6170e-06 4.0138e-06 -1.6485 0.099248 .
C.PEdummy 1.3143e-02 4.8864e-03 2.6897 0.007154 **
factor(DS_CODE)130324 -5.2497e-16 5.2683e-16 -0.9965 0.319028
factor(DS_CODE)130409 -4.0276e-16 5.2384e-16 -0.7689 0.441986
factor(DS_CODE)130775 -4.4113e-16 5.2424e-16 -0.8415 0.400089
...

This leaves us with the question why it doesn't for you. I guess it has something to do with the format of your data. Is everything numeric? I converted the column classes and it looks like that for me:

str(dat)
'data.frame': 48251 obs. of 12 variables:
$ DS_CODE : chr "902172" "902172" "902172" "902172" ...
$ DNEW : num 2e+05 2e+05 2e+05 2e+05 2e+05 ...
$ MCAP_SEC : num 78122 71421 81907 80010 82462 ...
$ NITA : num 0.135 0.135 0.135 0.135 0.135 ...
$ CashGen : num 0.198 0.198 0.198 0.198 0.198 ...
$ BM : num 0.1074 0.1108 0.097 0.0968 0.0899 ...
$ PE : num 57 55.3 63.1 63.2 68 ...
$ PEdummy : num 0 0 0 0 0 0 0 0 0 0 ...
$ L1.retE1M : num -0.72492 0.13177 0.00122 0.07214 -0.07332 ...
$ Mom : num 0 0 0 0 0 ...
$ PD : num 5.41e-54 1.51e-66 3.16e-80 2.87e-79 4.39e-89 ...
$ Impact_change: num 0 -10.59 -10.43 0.7 -6.97 ...

What does str(data) return for you?



Related Topics



Leave a reply



Submit