
Robust Standard Errors: When to Use Them (and Whether to Use Them by Default)

A practical guide to heteroscedasticity-robust and cluster-robust standard errors. Learn when standard errors are wrong, which corrections to apply, and whether to use robust standard errors by default.


Quick Hits

  • Standard SEs assume constant variance - heteroscedasticity makes them wrong
  • Robust SEs (HC0-HC3) are valid even when variance isn't constant
  • HC3 is the default choice for most applications
  • Clustering requires cluster-robust SEs - heteroscedasticity-robust isn't enough
  • Many researchers now use robust SEs by default as 'insurance'

TL;DR

Standard regression assumes constant variance of errors (homoscedasticity). When this fails, standard errors are wrong—usually too small, leading to too many false positives. Robust standard errors fix this without changing your coefficient estimates. HC3 is the recommended variant for heteroscedasticity. For clustered data, you need cluster-robust standard errors. Many researchers now advocate using robust SEs by default.


The Problem: When Standard Errors Go Wrong

What Standard OLS Assumes

Standard OLS regression assumes: $$\text{Var}(\epsilon_i) = \sigma^2 \quad \text{(constant for all } i \text{)}$$

This is homoscedasticity—equal variance.

What Heteroscedasticity Does

When variance isn't constant (heteroscedasticity):

  • Coefficient estimates are still unbiased
  • But standard errors are wrong
  • Usually too small → inflated Type I error
  • Confidence intervals have wrong coverage
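
You can see the inflated Type I error directly in a small Monte Carlo. This is a minimal numpy-only sketch on simulated data (all names and the specific variance pattern are illustrative): the true slope is zero, the error standard deviation grows with x, and classical SEs reject far more often than the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims, alpha = 100, 2000, 0.05
rejections = 0

for _ in range(n_sims):
    x = rng.uniform(1, 10, n)
    # True slope is 0, but the error SD grows with x (heteroscedastic)
    y = 1.0 + rng.normal(0, x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Classical SE assumes constant error variance
    sigma2 = resid @ resid / (n - 2)
    XtX_inv = np.linalg.inv(X.T @ X)
    se_slope = np.sqrt(sigma2 * XtX_inv[1, 1])
    if abs(beta[1] / se_slope) > 1.984:  # approx. two-sided t critical value, df = 98
        rejections += 1

print(f"Empirical Type I error: {rejections / n_sims:.3f} (nominal {alpha})")
```

With variance rising in x, high-variance points sit at high-leverage x values, which is exactly the configuration where classical SEs understate the truth most.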

Common Causes

  1. Scale effects: Variance increases with fitted values (e.g., revenue models)
  2. Group differences: Different subpopulations have different variances
  3. Model misspecification: Missing variables or wrong functional form
  4. Heavy tails: Outliers create heteroscedasticity-like effects

The Solution: Robust Standard Errors

The Idea

Instead of assuming constant variance, estimate the variance-covariance matrix of coefficients directly from the data, allowing each observation to have its own variance.

The Sandwich Estimator

Called "sandwich" because of its form:

$$\text{Var}(\hat{\beta}) = (X'X)^{-1} X' \hat{\Omega} X (X'X)^{-1}$$

The outer $(X'X)^{-1}$ factors are the "bread"; the middle part ($X' \hat{\Omega} X$) is the "meat" of the sandwich, where $\hat{\Omega}$ is a diagonal matrix with entries built from squared residuals.
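
The sandwich takes only a few lines of linear algebra. This is a minimal numpy sketch on simulated heteroscedastic data (the data-generating process is hypothetical, chosen to mirror the example later in the post), using the plain HC0 meat $\hat{\Omega}_{ii} = \hat{e}_i^2$:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 1 + 0.5 * x)  # error SD grows with x

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta

# Bread: (X'X)^{-1}
bread = np.linalg.inv(X.T @ X)
# Meat: X' diag(e_i^2) X, written without forming the n x n diagonal matrix
meat = (X * e[:, None] ** 2).T @ X
# Sandwich: HC0 variance-covariance matrix of beta-hat
vcov_hc0 = bread @ meat @ bread
se_hc0 = np.sqrt(np.diag(vcov_hc0))

# Classical SEs for comparison (assume constant variance)
sigma2 = e @ e / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * bread))
print("classical:", se_classical, "HC0:", se_hc0)
```

On data like this, the robust slope SE comes out noticeably larger than the classical one, which is the correction doing its job.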


The HC Family: Which to Use?

HC0 (White's Original)

$$\hat{\Omega}_{ii} = \hat{e}_i^2$$

Simply uses squared residuals. Tends to be too liberal (underestimates SEs) in small samples.

HC1 (Degrees of Freedom Adjustment)

$$\hat{\Omega}_{ii} = \frac{n}{n-k} \hat{e}_i^2$$

Adjusts for the number of parameters. Better than HC0, but still often too liberal.

HC2 (Leverage Adjustment)

$$\hat{\Omega}_{ii} = \frac{\hat{e}_i^2}{1-h_{ii}}$$

Where $h_{ii}$ is the leverage of observation $i$. Accounts for high-leverage points.

HC3 (MacKinnon-White)

$$\hat{\Omega}_{ii} = \frac{\hat{e}_i^2}{(1-h_{ii})^2}$$

More aggressive adjustment for leverage. Most conservative, generally recommended.

Recommendation

Use HC3 for most applications:

  • Most robust in small samples
  • Good Type I error control
  • Standard in economics and social sciences
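
The four variants differ only in how the diagonal of $\hat{\Omega}$ is built. A numpy sketch on simulated data (names and data hypothetical) makes the ordering concrete: since $0 < h_{ii} < 1$, HC2 inflates each squared residual relative to HC0, and HC3 inflates it further, so HC3 SEs are the largest.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 2
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 1 + 0.5 * x)
X = np.column_stack([np.ones(n), x])

beta = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta
bread = np.linalg.inv(X.T @ X)
# Leverages h_ii = x_i' (X'X)^{-1} x_i, without forming the full hat matrix
h = np.einsum('ij,jk,ik->i', X, bread, X)

def hc_se(omega_diag):
    """Sandwich SEs for a given diagonal of Omega-hat."""
    meat = (X * omega_diag[:, None]).T @ X
    return np.sqrt(np.diag(bread @ meat @ bread))

omegas = {
    'HC0': e**2,
    'HC1': e**2 * n / (n - k),
    'HC2': e**2 / (1 - h),
    'HC3': e**2 / (1 - h)**2,
}
for name, om in omegas.items():
    print(name, hc_se(om))
```

With large n the leverages shrink toward k/n and all four variants converge; the differences matter mostly in small samples with influential points.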

Code: Robust Standard Errors

Python

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def compare_standard_errors(model):
    """
    Compare standard and robust standard errors.

    Parameters
    ----------
    model : statsmodels RegressionResults
        Fitted OLS model

    Returns
    -------
    DataFrame comparing SE types
    """
    # get_robustcov_results returns plain numpy arrays (not pandas Series),
    # so wrap with np.asarray instead of calling .values
    hc_se = {
        cov: np.asarray(model.get_robustcov_results(cov_type=cov).bse)
        for cov in ('HC0', 'HC1', 'HC2', 'HC3')
    }

    comparison = pd.DataFrame({
        'Variable': model.params.index,
        'Coefficient': model.params.values,
        'Standard SE': model.bse.values,
        **hc_se,
    })

    # Ratio (robust / standard): values well above 1 signal heteroscedasticity
    comparison['HC3/Std Ratio'] = comparison['HC3'] / comparison['Standard SE']

    return comparison


def robust_regression_summary(model, cov_type='HC3'):
    """
    Get regression summary with robust SEs.
    """
    robust_model = model.get_robustcov_results(cov_type=cov_type)

    # Robust results expose numpy arrays; conf_int() has one row per variable
    ci = np.asarray(robust_model.conf_int())

    summary = pd.DataFrame({
        'Variable': model.params.index,  # names come from the original fit
        'Coefficient': np.asarray(robust_model.params),
        'Robust SE': np.asarray(robust_model.bse),
        't-statistic': np.asarray(robust_model.tvalues),
        'p-value': np.asarray(robust_model.pvalues),
        'CI Lower': ci[:, 0],
        'CI Upper': ci[:, 1],
    })

    return summary, robust_model


def detect_heteroscedasticity(model):
    """
    Test for heteroscedasticity using Breusch-Pagan and White tests.
    """
    from statsmodels.stats.diagnostic import het_breuschpagan, het_white

    residuals = model.resid
    exog = model.model.exog

    # Breusch-Pagan
    bp_stat, bp_p, _, _ = het_breuschpagan(residuals, exog)

    # White test
    # White test (can fail if the auxiliary regression is collinear)
    try:
        white_stat, white_p, _, _ = het_white(residuals, exog)
    except Exception:
        white_stat, white_p = np.nan, np.nan

    return {
        'breusch_pagan': {'statistic': bp_stat, 'p_value': bp_p},
        'white': {'statistic': white_stat, 'p_value': white_p},
        'heteroscedasticity_detected': bp_p < 0.05 or white_p < 0.05
    }


# Example
if __name__ == "__main__":
    np.random.seed(42)
    n = 200

    # Create heteroscedastic data
    x = np.random.uniform(1, 10, n)
    # Variance increases with x
    y = 2 + 3*x + np.random.normal(0, 1 + 0.5*x, n)

    data = pd.DataFrame({'x': x, 'y': y})

    # Fit model
    model = smf.ols('y ~ x', data=data).fit()

    print("Heteroscedasticity Tests")
    print("=" * 50)
    het_results = detect_heteroscedasticity(model)
    print(f"Breusch-Pagan p-value: {het_results['breusch_pagan']['p_value']:.4f}")
    print(f"Heteroscedasticity detected: {het_results['heteroscedasticity_detected']}")

    print("\n" + "=" * 50)
    print("Standard Error Comparison")
    comparison = compare_standard_errors(model)
    print(comparison.to_string(index=False))

    print("\n" + "=" * 50)
    print("Regression with HC3 Robust SEs")
    robust_summary, robust_model = robust_regression_summary(model)
    print(robust_summary.to_string(index=False))

R

library(tidyverse)
library(sandwich)  # For robust SEs
library(lmtest)    # For coeftest


compare_standard_errors <- function(model) {
    #' Compare standard and robust standard errors

    # Standard SEs
    std_se <- summary(model)$coefficients[, "Std. Error"]

    # Robust SEs
    hc0_se <- sqrt(diag(vcovHC(model, type = "HC0")))
    hc1_se <- sqrt(diag(vcovHC(model, type = "HC1")))
    hc2_se <- sqrt(diag(vcovHC(model, type = "HC2")))
    hc3_se <- sqrt(diag(vcovHC(model, type = "HC3")))

    tibble(
        Variable = names(coef(model)),
        Coefficient = coef(model),
        `Standard SE` = std_se,
        HC0 = hc0_se,
        HC1 = hc1_se,
        HC2 = hc2_se,
        HC3 = hc3_se,
        `HC3/Std Ratio` = hc3_se / std_se
    )
}


robust_summary <- function(model, type = "HC3") {
    #' Get model summary with robust SEs

    coeftest(model, vcov = vcovHC(model, type = type))
}


robust_confint <- function(model, type = "HC3", level = 0.95) {
    #' Get confidence intervals with robust SEs

    # Base confint() silently ignores a vcov. argument for lm objects;
    # lmtest::coefci() actually uses the supplied robust covariance matrix
    robust_vcov <- vcovHC(model, type = type)
    lmtest::coefci(model, vcov. = robust_vcov, level = level)
}


detect_heteroscedasticity <- function(model) {
    #' Test for heteroscedasticity

    bp_test <- lmtest::bptest(model)
    # White test
    white_test <- lmtest::bptest(model, ~ fitted(model) + I(fitted(model)^2))

    list(
        breusch_pagan = list(statistic = bp_test$statistic,
                            p_value = bp_test$p.value),
        white = list(statistic = white_test$statistic,
                    p_value = white_test$p.value),
        heteroscedasticity_detected = bp_test$p.value < 0.05 | white_test$p.value < 0.05
    )
}


# Example
set.seed(42)
n <- 200

# Heteroscedastic data
x <- runif(n, 1, 10)
y <- 2 + 3*x + rnorm(n, 0, 1 + 0.5*x)  # Variance increases with x
data <- tibble(x = x, y = y)

# Fit model
model <- lm(y ~ x, data = data)

cat("Heteroscedasticity Tests\n")
cat(strrep("=", 50), "\n")
het <- detect_heteroscedasticity(model)
cat(sprintf("Breusch-Pagan p-value: %.4f\n", het$breusch_pagan$p_value))
cat(sprintf("Detected: %s\n", het$heteroscedasticity_detected))

cat("\n", strrep("=", 50), "\n")
cat("Standard Error Comparison\n")
print(compare_standard_errors(model))

cat("\n", strrep("=", 50), "\n")
cat("Robust Summary (HC3)\n")
print(robust_summary(model))

Cluster-Robust Standard Errors

When Heteroscedasticity-Robust Isn't Enough

When observations are clustered (students in classrooms, employees in companies, repeated measures per user), errors are correlated within clusters.

Problem: Even HC3 assumes independent observations. Clustered data violates this.

Solution: Cluster-robust standard errors account for within-cluster correlation.

When to Use Cluster-Robust SEs

  1. Hierarchical data: Individuals nested in groups
  2. Panel data: Same units observed over time
  3. Multi-site studies: Treatment in different locations
  4. A/B tests with user-day observations: Same users appear multiple times

Code: Cluster-Robust SEs

# Python with statsmodels
model = smf.ols('y ~ treatment', data=data).fit(
    cov_type='cluster',
    cov_kwds={'groups': data['user_id']}
)

# Or post-hoc
robust_model = model.get_robustcov_results(
    cov_type='cluster',
    groups=data['user_id']
)

# R with sandwich
library(sandwich)
library(lmtest)

# Cluster-robust SEs
cluster_robust <- vcovCL(model, cluster = ~ user_id)
coeftest(model, vcov = cluster_robust)

Level of Clustering

Cluster at the level of the treatment assignment or the highest level of correlation:

  • Treatment assigned to classrooms → cluster by classroom
  • Treatment assigned to users → cluster by user
  • When in doubt, cluster at a higher level (more conservative)
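
To see why clustering matters, here is a minimal numpy sketch of the cluster-robust (CR1-style) sandwich on simulated data with treatment assigned at the cluster level. All names and the data-generating process are hypothetical; the construction sums score outer products within clusters and applies a common small-sample correction.

```python
import numpy as np

rng = np.random.default_rng(7)
n_clusters, per = 40, 25
g = np.repeat(np.arange(n_clusters), per)  # cluster id for each observation
n = n_clusters * per

# Treatment assigned at the cluster level; true effect is zero
treat = (np.arange(n_clusters) % 2)[g].astype(float)
# A cluster random effect induces within-cluster error correlation
u = rng.normal(0, 1, n_clusters)[g]
y = 1.0 + 0.0 * treat + u + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), treat])
k = X.shape[1]
bread = np.linalg.inv(X.T @ X)
beta = bread @ X.T @ y
e = y - X @ beta

# Classical SE (ignores clustering)
sigma2 = e @ e / (n - k)
se_classical = np.sqrt(np.diag(sigma2 * bread))

# Cluster-robust meat: sum over clusters of score outer products
meat = np.zeros((k, k))
for c in range(n_clusters):
    idx = g == c
    s = X[idx].T @ e[idx]  # cluster score vector
    meat += np.outer(s, s)
G = n_clusters
adj = (G / (G - 1)) * ((n - 1) / (n - k))  # CR1 small-sample correction
se_cluster = np.sqrt(np.diag(adj * bread @ meat @ bread))

print("classical:", se_classical[1], "cluster-robust:", se_cluster[1])
```

With 25 observations per cluster and substantial within-cluster correlation, the cluster-robust SE on the treatment coefficient comes out several times larger than the classical one, which is why naive SEs on clustered data produce badly overconfident inference.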

The Case for Default Robust SEs

Arguments For

  1. Insurance policy: If homoscedasticity holds, you lose a little efficiency (~5%). If it doesn't, you avoid invalid inference.

  2. Robustness: Your conclusions don't depend on an assumption that's hard to verify.

  3. Standard practice: Many fields (economics, political science) now use robust SEs by default.

  4. Model agnosticism: Works regardless of functional form misspecification.

Arguments Against

  1. Efficiency loss: When homoscedasticity holds, standard SEs are more efficient.

  2. Small samples: Robust SEs can be unreliable with small n (<50).

  3. Model checking: Using robust SEs might mask model problems worth investigating.

  4. Doesn't fix everything: Won't help with non-independence (need clustering).

Practical Recommendation

Use HC3 by default when:

  • Sample size is moderate to large (n > 50)
  • You have any doubt about constant variance
  • You want results robust to assumptions

Check homoscedasticity when:

  • Sample is small
  • Efficiency matters (tight budget, marginal power)
  • You want to understand your data better

When Robust SEs Aren't Enough

Problem: Wrong Model

Robust SEs fix inference, not estimation. If your model is wrong (wrong functional form, omitted variables), coefficients are biased—robust SEs won't help.

Problem: Clustered Data

Heteroscedasticity-robust SEs assume independence. Clustered data needs cluster-robust SEs.

Problem: Small Samples

All robust SE methods rely on large-sample theory. With n < 30-50:

  • HC3 is more reliable than HC0
  • Consider wild bootstrap for better small-sample inference
  • Mixed effects models might be better for clustered data
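
The wild bootstrap mentioned above can be sketched in a few lines. This is a minimal, numpy-only illustration (data and names hypothetical) of a restricted wild bootstrap for the slope: residuals from the null-imposed fit are resampled with random sign flips (Rademacher weights), and the observed HC3-based t-statistic is compared to the bootstrap distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40  # deliberately small sample
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 1 + 0.5 * x)
X = np.column_stack([np.ones(n), x])
bread = np.linalg.inv(X.T @ X)

def fit(yv):
    """OLS coefficients and HC3 standard errors."""
    b = bread @ X.T @ yv
    e = yv - X @ b
    h = np.einsum('ij,jk,ik->i', X, bread, X)  # leverages
    meat = (X * (e**2 / (1 - h)**2)[:, None]).T @ X
    se = np.sqrt(np.diag(bread @ meat @ bread))
    return b, se

beta, se = fit(y)
t_obs = beta[1] / se[1]

# Restricted fit under H0: beta_1 = 0 is just the intercept-only model
b0 = y.mean()
e0 = y - b0

B, exceed = 999, 0
for _ in range(B):
    w = rng.choice([-1.0, 1.0], n)  # Rademacher weights
    y_star = b0 + e0 * w            # bootstrap sample imposing the null
    b_s, se_s = fit(y_star)
    if abs(b_s[1] / se_s[1]) >= abs(t_obs):
        exceed += 1

p_boot = (exceed + 1) / (B + 1)
print(f"wild bootstrap p-value for beta_1 = 0: {p_boot:.3f}")
```

The same recipe extends to the wild *cluster* bootstrap by drawing one weight per cluster instead of one per observation.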

Problem: Few Clusters

Cluster-robust SEs need enough clusters (usually > 30-50) for reliable inference. With few clusters:

  • Consider mixed effects models
  • Use wild cluster bootstrap
  • Report cautiously

Summary: Which SE to Use When

| Situation | Recommended SE |
|---|---|
| Standard regression, large n | HC3 or standard |
| Suspected heteroscedasticity | HC3 |
| Clustered data, many clusters | Cluster-robust |
| Clustered data, few clusters | Wild cluster bootstrap |
| Panel data | Two-way clustering or panel methods |
| Small sample (n < 50) | HC3 with caution, consider bootstrap |

Reporting Guidelines

When using robust SEs:

  1. State which correction: "We report HC3 heteroscedasticity-robust standard errors"

  2. Justify clustering level: "Standard errors clustered at the user level to account for repeated observations"

  3. Compare when relevant: "Results are similar with standard SEs (see Appendix)"

  4. Don't over-interpret: Robust SEs protect inference but don't validate the model



Key Takeaway

Standard regression assumes constant error variance. When this fails, standard errors are wrong—usually too small, causing too many false positives. Robust standard errors (HC3 recommended) fix inference without changing coefficients. For clustered data, heteroscedasticity-robust isn't enough—you need cluster-robust SEs. Many researchers now use robust SEs by default: the efficiency loss when they're unnecessary is small, but the cost of wrong inference when they're needed is large.


References

  1. https://doi.org/10.1016/j.jeconom.2004.02.005
  2. https://arxiv.org/abs/1903.03949
  3. MacKinnon, J. G., & White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. *Journal of Econometrics*, 29(3), 305-325.
  4. Zeileis, A., Köll, S., & Graham, N. (2020). Various versatile variances: An object-oriented implementation of clustered covariances in R. *Journal of Statistical Software*, 95(1), 1-36.
  5. Cameron, A. C., & Miller, D. L. (2015). A practitioner's guide to cluster-robust inference. *Journal of Human Resources*, 50(2), 317-372.

Frequently Asked Questions

Do robust standard errors change my coefficients?

No. Robust SEs only affect the standard errors (and thus p-values and CIs). The point estimates remain exactly the same. You're correcting inference, not estimation.

Should I use robust standard errors by default?

Many statisticians recommend this as an insurance policy. If homoscedasticity holds, you lose a little efficiency. If it doesn't hold, you avoid invalid inference. The cost of being wrong about homoscedasticity is higher than the cost of robust SEs when homoscedasticity holds.

What's the difference between HC0, HC1, HC2, and HC3?

They're different finite-sample corrections. HC0 is the original, often too liberal. HC1 adjusts degrees of freedom. HC2 and HC3 make further adjustments for leverage. HC3 is most conservative and recommended for general use. With large n, they converge.

