Double clustered standard errors for panel data
Frank Harrell's package rms
(which used to be named Design
) has a function that I use often when clustering: robcov
.
See this part of ?robcov
, for example.
cluster: a variable indicating groupings. ‘cluster’ may be any type of
vector (factor, character, integer). NAs are not allowed.
Unique values of ‘cluster’ indicate possibly correlated
groupings of observations. Note the data used in the fit and
stored in ‘fit$x’ and ‘fit$y’ may have had observations
containing missing values deleted. It is assumed that if any
NAs were removed during the original model fitting, an
‘naresid’ function exists to restore NAs so that the rows of
the score matrix coincide with ‘cluster’. If ‘cluster’ is
omitted, it defaults to the integers 1,2,...,n to obtain the
"sandwich" robust covariance matrix estimate.
Higher level cluster standard errors for panel data
I wrote a package (clubTamal
) when I encountered the same issue. clubTamal
transform a plm object (by re-estimation) into an lm object in order to be able to cluster standard errors using teh multiwayvcov
package. You find a Rpubs example and documentation here: https://rpubs.com/eliascis/clubTamal.
The package works for plm
estimations with Fixed-Effects (model='within'
) or First-Difference models (model='fd'
).
To obtain a clustered covariance matrix use the vcovTamal
command.
The package is still under development but can be installed directly from github:
library(devtools)
install_github("eliascis/clubTamal")
Unfortunalty the link to your example data does not work, but clubTamal further installs spd4testing
, which constructs a simulated Small Panel Data set for testing purposes.
## packages
library(foreign)
library(lmtest)
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
library(plm)
library(multiwayvcov)
library(spd4testing)
library(clubTamal)
## simulated data
d <- spd4testing()
## formula
f <- formula(y ~ x + factor(year))
## standard estimation
e <- plm(formula = f, data = d, model = "fd")
summary(e)
#> Oneway (individual) effect First-Difference Model
#>
#> Call:
#> plm(formula = f, data = d, model = "fd")
#>
#> Unbalanced Panel: n = 6, T = 3-5, N = 26
#> Observations used in estimation: 20
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -250.333 -115.219 12.651 108.390 228.110
#>
#> Coefficients:
#> Estimate Std. Error t-value Pr(>|t|)
#> (Intercept) -80.937 199.405 -0.4059 0.69096
#> x 71.858 25.974 2.7666 0.01514 *
#> factor(year)2002 194.842 216.449 0.9002 0.38325
#> factor(year)2003 109.118 414.298 0.2634 0.79609
#> factor(year)2004 446.147 583.234 0.7650 0.45700
#> factor(year)2005 451.514 752.479 0.6000 0.55807
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Total Sum of Squares: 1078300
#> Residual Sum of Squares: 394270
#> R-Squared: 0.63435
#> Adj. R-Squared: 0.50376
#> F-statistic: 4.85757 on 5 and 14 DF, p-value: 0.0087377
e <- plm(formula = f, data = d, model = "within")
summary(e)
#> Oneway (individual) effect Within Model
#>
#> Call:
#> plm(formula = f, data = d, model = "within")
#>
#> Unbalanced Panel: n = 6, T = 3-5, N = 26
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -167.4294 -59.3741 -6.9404 73.7132 146.4199
#>
#> Coefficients:
#> Estimate Std. Error t-value Pr(>|t|)
#> x 72.362 23.434 3.0879 0.007501 **
#> factor(year)2002 113.786 77.276 1.4725 0.161569
#> factor(year)2003 -67.413 75.012 -0.8987 0.383013
#> factor(year)2004 200.420 83.649 2.3960 0.030062 *
#> factor(year)2005 127.170 81.030 1.5694 0.137400
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Total Sum of Squares: 441370
#> Residual Sum of Squares: 190660
#> R-Squared: 0.56803
#> Adj. R-Squared: 0.28005
#> F-statistic: 3.94491 on 5 and 15 DF, p-value: 0.017501
## clustering
# no clustering
v <- e$vcov
coeftest(e)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> x 72.362 23.434 3.0879 0.007501 **
#> factor(year)2002 113.786 77.276 1.4725 0.161569
#> factor(year)2003 -67.413 75.012 -0.8987 0.383013
#> factor(year)2004 200.420 83.649 2.3960 0.030062 *
#> factor(year)2005 127.170 81.030 1.5694 0.137400
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# clustering at id level with plm-package
v <- vcovHC(e, type = "HC1", cluster = "group", tol = 1 * 10^-20)
coeftest(e, v)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> x 72.362 24.586 2.9433 0.010070 *
#> factor(year)2002 113.786 76.548 1.4865 0.157870
#> factor(year)2003 -67.413 63.962 -1.0540 0.308585
#> factor(year)2004 200.420 61.464 3.2608 0.005266 **
#> factor(year)2005 127.170 67.310 1.8893 0.078338 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## clustering at group level with clubTamal
v <- vcovTamal(estimate = e, data = d, groupvar = "gid")
#> Error in vcovTamal(estimate = e, data = d, groupvar = "gid"): better use the very fast and powerful lfe::felm
coeftest(e, v)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> x 72.362 24.586 2.9433 0.010070 *
#> factor(year)2002 113.786 76.548 1.4865 0.157870
#> factor(year)2003 -67.413 63.962 -1.0540 0.308585
#> factor(year)2004 200.420 61.464 3.2608 0.005266 **
#> factor(year)2005 127.170 67.310 1.8893 0.078338 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2020-05-29 by the reprex package (v0.3.0)
R - fixed effect of panel data analysis and robust standard errors
Set cluster='group'
if you want to cluster on the variable serving as the individual index (city
in your example).
Set cluster='time'
if you want to cluster on the variable serving as the time index (year
in your example).
You can cluster on the time index even for a fixed effects one-way individual model.
For clustering on both index variables, you cannot do that with plm::vcovHC
. Look at vcovDC
from the same packages which provides double clustering (DC = double clustering), e.g.,
coeftest(fem_city, vcovDC(fem_city)
Panel data regression: Robust standard errors
After some time playing around, it works for me and gives me:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.5099e-16 5.2381e-16 0.8610 0.389254
C.MCAP_SEC -5.9769e-07 1.2677e-07 -4.7149 2.425e-06 ***
C.Impact_change -5.3908e-04 7.5601e-05 -7.1306 1.014e-12 ***
C.Mom 3.7560e-04 3.3378e-03 0.1125 0.910406
C.BM -1.6438e-04 1.7368e-05 -9.4645 < 2.2e-16 ***
C.PD 6.2153e-02 3.8766e-02 1.6033 0.108885
C.CashGen -2.7876e-04 1.4031e-02 -0.0199 0.984149
C.NITA -8.1792e-02 3.2153e-02 -2.5438 0.010969 *
C.PE -6.6170e-06 4.0138e-06 -1.6485 0.099248 .
C.PEdummy 1.3143e-02 4.8864e-03 2.6897 0.007154 **
factor(DS_CODE)130324 -5.2497e-16 5.2683e-16 -0.9965 0.319028
factor(DS_CODE)130409 -4.0276e-16 5.2384e-16 -0.7689 0.441986
factor(DS_CODE)130775 -4.4113e-16 5.2424e-16 -0.8415 0.400089
...
This leaves us with the question why it doesn't for you. I guess it has something to do with the format of your data. Is everything numeric? I converted the column classes and it looks like that for me:
str(dat)
'data.frame': 48251 obs. of 12 variables:
$ DS_CODE : chr "902172" "902172" "902172" "902172" ...
$ DNEW : num 2e+05 2e+05 2e+05 2e+05 2e+05 ...
$ MCAP_SEC : num 78122 71421 81907 80010 82462 ...
$ NITA : num 0.135 0.135 0.135 0.135 0.135 ...
$ CashGen : num 0.198 0.198 0.198 0.198 0.198 ...
$ BM : num 0.1074 0.1108 0.097 0.0968 0.0899 ...
$ PE : num 57 55.3 63.1 63.2 68 ...
$ PEdummy : num 0 0 0 0 0 0 0 0 0 0 ...
$ L1.retE1M : num -0.72492 0.13177 0.00122 0.07214 -0.07332 ...
$ Mom : num 0 0 0 0 0 ...
$ PD : num 5.41e-54 1.51e-66 3.16e-80 2.87e-79 4.39e-89 ...
$ Impact_change: num 0 -10.59 -10.43 0.7 -6.97 ...
What does str(data)
return for you?
Related Topics
"Adding Missing Grouping Variables" Message in Dplyr in R
Create Barplot from Data.Frame
Writing Data Frame to PDF Table
Represent Numeric Value with Typical Dollar Amount Format
Applying a Function to Each Row of a Data.Table
Handle Continuous Missing Values in Time-Series Data
Font Family Won't Change in Ggplot
Why Does Rm Inside a Function Not Delete Objects
How to Remove Rows of a Matrix by Row Name, Rather Than Numerical Index
Get the Index of the Values of One Vector in Another
Numbers as Column Names of Data Frames
Trouble Passing on an Argument to Function Within Own Function
Download All Files from a Folder on a Website
Calculating Sum of Previous 3 Rows in R Data.Table (By Grid-Square)