Regression with Heteroskedasticity Corrected Standard Errors
I think you are on the right track with coeftest
in package lmtest. Take a look at the sandwich package which includes this functionality and is designed to work hand in hand with the lmtest package you have already found.
> # generate linear regression relationship
> # with Homoskedastic variances
> x <- sin(1:100)
> y <- 1 + x + rnorm(100)
> ## model fit and HC3 covariance
> fm <- lm(y ~ x)
> vcovHC(fm)
(Intercept) x
(Intercept) 0.010809366 0.001209603
x 0.001209603 0.018353076
> coeftest(fm, vcov. = vcovHC)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.01973 0.10397 9.8081 3.159e-16 ***
x 0.93992 0.13547 6.9381 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
To get the F test, look at function waldtest()
:
> waldtest(fm, vcov = vcovHC)
Wald test
Model 1: y ~ x
Model 2: y ~ 1
Res.Df Df F Pr(>F)
1 98
2 99 -1 48.137 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You could always cook up a simple function to combine these two for you if you wanted the one-liner...
There are lots of examples in the Econometric Computing with HC and HAC Covariance Matrix Estimators vignette that comes with the sandwich package of linking lmtest and sandwich to do what you want.
Edit: A one-liner could be as simple as:
mySummary <- function(model, VCOV) {
print(coeftest(model, vcov. = VCOV))
print(waldtest(model, vcov = VCOV))
}
Which we can use like this (on the examples from above):
> mySummary(fm, vcovHC)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.01973 0.10397 9.8081 3.159e-16 ***
x 0.93992 0.13547 6.9381 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Wald test
Model 1: y ~ x
Model 2: y ~ 1
Res.Df Df F Pr(>F)
1 98
2 99 -1 48.137 4.313e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Getting statsmodels to use heteroskedasticity corrected standard errors in coefficient t-tests
The fit
method of the linear models, discrete models and GLM, take a cov_type
and a cov_kwds
argument for specifying robust covariance matrices. This will be attached to the results instance and used for all inference and statistics reported in the summary table.
Unfortunately, the documentation doesn't really show this yet in an appropriate way.
The auxiliary method that actually selects the sandwiches based on the options shows the options and required arguments:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.fit.html
For example, estimating an OLS model and using HC3
covariance matrices can be done with
model_ols = OLS(...)
result = model_ols.fit(cov_type='HC3')
result.bse
result.t_test(....)
Some sandwiches require additional arguments, for example cluster robust standard errors, can be selected in the following way, assuming mygroups
is an array that contains the groups labels:
results = OLS(...).fit(cov_type='cluster', cov_kwds={'groups': mygroups}
results.bse
...
Some robust covariance matrices make additional assumptions about the data without checking. For example heteroscedasticity and autocorrelation robust standard errors or Newey-West, HAC
, standard errors assume a sequential time series structure. Some panel data robust standard errors also assume stacking of the time series by individuals.
A separate option use_t
is available to specify whether the t and F or the normal and chisquare distributions should be used by default for Wald tests and confidence intervals.
How to cluster standard errors with small sample corrections in R
Using -...vcovHC(df, type="sss", cluster="study")- is a dated way to incorporate small sample corrections. Upon understanding differences between the sandwich estimators HC0-HC4, using the code previous to it:
coeftest(reg, vcov = vcovHC(reg, type="HC1")
is appropriate with the corresponding sandwich estimator in the type argument. The issue was with the dated syntax that followed and this is the correct format.
Related Topics
Rbuildignore and Excluding Directories
Control Speed of a Gganimation
Differencebetween Aes and Aes_String (Ggplot2) in R
Dplyr::Select() with Some Variables That May Not Exist in the Data Frame
Installing Package from a Local .Tar.Gz File on Linux
Tm_Map Has Parallel::Mclapply Error in R 3.0.1 on MAC
R Shiny Dt - Edit Values in Table with Reactive
Ggplot Bar Plot Side by Side Using Two Variables
How Is J() Function Implemented in Data.Table
Cast String Directly to Idatetime
How to Merge Two Nodes into a Single Node Using Igraph
Create Several Dummy Variables from One String Variable
Does White Space Slow Down Processing
Split or Separate Uneven/Unequal Strings with No Delimiter
Count Total Missing Values by Group