
Heteroskedasticity

Heteroskedasticity means the error variance changes with the regressors, $\mathrm{Var}(u\mid x_1,\dots,x_k)=\sigma^2(x)$, rather than being a constant $\sigma^2$. It violates the homoskedasticity assumption (MLR.5), the last piece of Gauss-Markov. The consequences are sharply bounded: OLS coefficients stay unbiased and consistent, and $R^2$ is unaffected, because none of those properties uses MLR.5. What breaks is inference: the usual standard error formula is wrong, so the reported $t$ statistics, $F$ statistics, and confidence intervals are invalid. This node is the hub of the violation, test, and remedy chain for the variance assumption.


Classical vs robust standard errors

The error spread grows with x (heteroskedasticity), yet the OLS slope stays unbiased. What breaks is the standard error: the classical SE is invalid here, while the robust (HC1) SE, which is what Stata's `, robust` option reports, is asymptotically valid. The point estimate is fixed by construction, so only the SEs move.

[Figure: scatter of y against x with the OLS fit (slope fixed by construction), and a second panel showing the 95% confidence interval for the slope b₁ under classical and robust (HC1) SEs, with the same centre (b₁ = 0.80) and different widths. At heteroskedasticity setting 6.0: classical SE(b₁) = 0.158, robust SE = 0.214, robust ÷ classical = 1.36×.]
The slope estimate is 0.80 at every setting, so heteroskedasticity has not biased it. But the classical SE (0.158) is invalid here, while the robust SE (0.214) is asymptotically valid. In this design the robust SE is 1.36 times the classical SE, so the classical confidence interval is too narrow and its t-test overstates significance.
The robust SE here is HC1 (finite-sample adjusted), the same estimator as Stata's `regress y x, robust`. It is asymptotically valid, not exactly correct in finite samples, and it need not be larger than the classical SE in general; it simply happens to be larger in this rising-variance design.
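Both standard errors for a simple-regression slope can be computed directly from the OLS residuals. The sketch below is a minimal pure-Python illustration, not the code behind the figure; the function name `slope_ses` and the four-point data set are hypothetical. It uses $\hat{\sigma}^2/\mathrm{SST}_x$ for the classical SE and $\frac{n}{n-2}\sum_i (x_i-\bar{x})^2\hat{u}_i^2/\mathrm{SST}_x^2$ for HC1, assuming one regressor plus an intercept.

```python
import math

def slope_ses(x, y):
    """Return (b1, classical SE, HC1 robust SE) for OLS of y on x with an intercept."""
    n = len(x)
    k = 1  # number of slope regressors; residual df is n - k - 1
    xbar = sum(x) / n
    ybar = sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # OLS residuals

    # Classical: sigma_hat^2 / SST_x, valid only under homoskedasticity (MLR.5).
    sigma2_hat = sum(ui ** 2 for ui in u) / (n - k - 1)
    se_classical = math.sqrt(sigma2_hat / sst_x)

    # HC0: sum (x_i - xbar)^2 * u_i^2 / SST_x^2; HC1 rescales it by n / (n - k - 1).
    hc0 = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / sst_x ** 2
    se_hc1 = math.sqrt(hc0 * n / (n - k - 1))
    return b1, se_classical, se_hc1

# Hypothetical four-point data set, small enough to check by hand.
b1, se_c, se_r = slope_ses([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 3.0, 5.0])
print(f"b1={b1:.2f} classical={se_c:.3f} hc1={se_r:.3f}")
# prints: b1=1.20 classical=0.283 hc1=0.170
```

On this toy data the HC1 SE is smaller than the classical SE, a concrete case of robust SEs not being larger in general; in the rising-variance design of the figure the ordering goes the other way.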

Why it matters

Picture a scatter of household savings against income. At low incomes the points hug the regression line; at high incomes they fan out, because richer households have far more discretion over how much to save. The line through the cloud is still in the right place on average, so the slope estimate is fine. The problem is that OLS, told the spread is the same everywhere, miscounts how precise the slope is. It gives the noisy high-income points the same weight as the tightly clustered low-income ones and reports a standard error that no longer matches reality, so every $t$ and $F$ test built on it is untrustworthy.

Formulas

Homoskedasticity (MLR.5), the assumption that fails
$$\mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2$$
Heteroskedasticity replaces the constant $\sigma^2$ with $\sigma^2(x)$, a variance that depends on the explanatory variables.
True slope variance (simple regression) vs the formula OLS reports
$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sigma_i^2}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^2}$$
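Here $\sigma_i^2 = \mathrm{Var}(u_i \mid x_i)$. Under homoskedasticity every $\sigma_i^2=\sigma^2$ and this collapses to $\sigma^2 / \mathrm{SST}_x$, with $\mathrm{SST}_x = \sum_{i=1}^{n}(x_i-\bar{x})^2$, the usual textbook formula. When the $\sigma_i^2$ differ, the usual estimator $\hat{\sigma}^2/\mathrm{SST}_x$ is generally biased for this variance, so the reported SE is wrong.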
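To see the collapse numerically, the sketch below evaluates the slope-variance formula for given $\sigma_i^2$ values. The function names and the variances are hypothetical, chosen for illustration. With equal variances it reproduces $\sigma^2/\mathrm{SST}_x$; with variances that rise with $x$, it exceeds what $\sigma^2/\mathrm{SST}_x$ gives when the average $\sigma_i^2$ is plugged in. That average is only a simple benchmark, not the estimate OLS software reports.

```python
def sst(x):
    """Total sum of squares of x around its mean, SST_x."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x)

def true_slope_var(x, sigma2):
    """Var(beta1_hat | x) = sum (x_i - xbar)^2 * sigma_i^2 / SST_x^2."""
    xbar = sum(x) / len(x)
    num = sum((xi - xbar) ** 2 * s2 for xi, s2 in zip(x, sigma2))
    return num / sst(x) ** 2

def homoskedastic_var(x, sigma2_const):
    """Textbook formula sigma^2 / SST_x, valid only under MLR.5."""
    return sigma2_const / sst(x)

x = [1.0, 2.0, 3.0, 4.0]

# Equal variances: both formulas agree (0.4 here).
equal = [2.0] * len(x)
print(true_slope_var(x, equal), homoskedastic_var(x, 2.0))

# Hypothetical variances rising with x (sigma_i^2 = x_i^2).
rising = [xi ** 2 for xi in x]
avg = sum(rising) / len(rising)
print(true_slope_var(x, rising), homoskedastic_var(x, avg))  # about 1.66 vs 1.5
```

The gap appears because the largest variances sit at the extreme $x$ values, which carry the most weight $(x_i-\bar{x})^2$ in the numerator.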

Worked examples

Scenario

You regress household saving on income, family size, and age in Stata and want a first read on whether the error variance is constant before trusting the t statistics.

Solution

Run `regress saving inc size age`, then plot the residuals against the fitted values with `rvfplot, yline(0)`. A funnel that widens as fitted saving rises is the visual signature of heteroskedasticity. Because OLS is still unbiased, the point estimates are usable, but do not report the default standard errors until you have either tested the variance or switched to robust SEs.

Note: `rvfplot` is a diagnostic, not a decision rule. Confirm with a formal test (Breusch-Pagan or White) before drawing conclusions.
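For intuition about what a formal test does, here is a minimal pure-Python sketch of the LM ($nR^2$) form of the Breusch-Pagan test with a single regressor: regress the squared OLS residuals on $x$ and compare $nR^2$ with a $\chi^2_1$ distribution. The function names and the data are hypothetical, and the sketch is not guaranteed to reproduce the numbers from Stata's own test commands, whose defaults may use a different variant.

```python
import math

def fit_line(x, y):
    """OLS intercept and slope for y on a single regressor x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def r_squared(x, y):
    """R^2 from the simple regression of y on x."""
    b0, b1 = fit_line(x, y)
    ybar = sum(y) / len(y)
    ssr = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    sst = sum((b - ybar) ** 2 for b in y)
    return 1.0 - ssr / sst

def breusch_pagan_lm(x, y):
    """LM form with one regressor: LM = n * R^2 of u_hat^2 on x.

    Under homoskedasticity LM is approximately chi-square with 1 df.
    """
    b0, b1 = fit_line(x, y)
    u2 = [(b - b0 - b1 * a) ** 2 for a, b in zip(x, y)]  # squared OLS residuals
    lm = max(0.0, len(x) * r_squared(x, u2))  # guard against tiny negative rounding
    # Upper-tail chi-square(1) probability: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Hypothetical data: y = x + e with |e| growing in step with x.
x = [float(i) for i in range(1, 21)]
y = [xi + (-1) ** i * xi for i, xi in enumerate(x, start=1)]
lm, p = breusch_pagan_lm(x, y)
print(f"LM = {lm:.2f}, p = {p:.4f}")
```

A small p-value rejects constant variance; with several regressors the same idea uses all of them in the auxiliary regression and $\chi^2_k$ degrees of freedom.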

Common mistakes

  • Heteroskedasticity biases the OLS coefficients. It does not. Unbiasedness (MLR.1 to MLR.4) and consistency never use the constant-variance assumption, so $\hat{\beta}_j$ is still unbiased and consistent. Only the standard errors, and the $t$ statistics, $F$ statistics, and confidence intervals built from them, are wrong.
  • Heteroskedasticity lowers the $R^2$ or distorts goodness of fit. It does not. The population $R^2 = 1 - \sigma_u^2/\sigma_y^2$ is built from the unconditional variances of $u$ and $y$, whereas heteroskedasticity concerns the conditional variance $\mathrm{Var}(u\mid x)$. The usual $R^2$ estimates the population value consistently either way, so model fit is unaffected.
  • A big spread in the residuals always means heteroskedasticity. Large but constant scatter is homoskedastic. Heteroskedasticity is specifically a spread that changes with the regressors, which is why formal tests relate the squared residuals to the explanatory variables.
  • If errors are heteroskedastic, OLS is useless and you must abandon it. Modern practice keeps OLS and simply replaces the standard errors with robust ones, because the estimator itself is still unbiased and consistent.
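The first mistake above can be checked by simulation. The sketch below is a hypothetical pure-Python Monte Carlo under an assumed design (intercept 1, slope 0.8, regressor uniform on [1, 10], error standard deviation $0.5x$). It holds the regressor values fixed, redraws the heteroskedastic errors many times, and refits OLS each time. The average slope estimate should sit close to 0.8, and the spread of the estimates should track the simple-regression variance formula from the Formulas section.

```python
import random
import statistics

def simulate_slopes(reps=2000, n=100, beta0=1.0, beta1=0.8, seed=42):
    """Monte Carlo draws of the OLS slope when Var(u | x) = (0.5 * x)^2.

    The regressor values are drawn once and held fixed across replications,
    so only the errors are redrawn. Returns (slopes, formula variance).
    """
    rng = random.Random(seed)
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    xbar = sum(x) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    slopes = []
    for _ in range(reps):
        # Mean-zero normal errors whose standard deviation grows with x.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
        ybar = sum(y) / n
        b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
        slopes.append(b1)
    # Var(beta1_hat | x) from the formula with sigma_i^2 = (0.5 * x_i)^2.
    true_var = sum((xi - xbar) ** 2 * (0.5 * xi) ** 2 for xi in x) / sst_x ** 2
    return slopes, true_var

slopes, true_var = simulate_slopes()
print("mean of slope estimates:", round(statistics.mean(slopes), 3))
print("Monte Carlo SD:", round(statistics.stdev(slopes), 4))
print("formula SD:", round(true_var ** 0.5, 4))
```

Changing `seed` or `reps` moves the exact numbers but not the conclusion: the estimator is centred on the true slope, and only its variance depends on the error variance pattern.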

Revision bullets

  • Definition: $\mathrm{Var}(u\mid x)$ depends on the regressors, violating MLR.5
  • OLS stays unbiased and consistent, and $R^2$ is unaffected
  • Gauss-Markov fails, so OLS is no longer BLUE (no longer minimum variance)
  • The usual SE formula is wrong, so $t$ tests, $F$ tests, and CIs are invalid
  • Remedies: fix the inference and keep OLS (robust SEs), or change the estimator to regain efficiency (WLS or FGLS)

Quick check

Under heteroskedasticity, which property of OLS is lost?

A funnel-shaped plot of residuals against fitted values most directly suggests:

Why is heteroskedasticity described as breaking the Gauss-Markov theorem?


Sources

  1. Wooldridge (2019), Ch. 8
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. ISBN 978-1-337-55886-0.
    Chapter 8 opens by showing that heteroskedasticity leaves OLS unbiased and consistent while invalidating the usual variance estimator and the tests built on it.
  2. Wooldridge (2019), §8.1
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Section 8.1, Consequences of Heteroskedasticity for OLS.
    States the precise consequences: estimators stay unbiased and consistent, but standard errors and test statistics are no longer valid.
How to cite this page
Dr. Phil's Quant Lab. (2026). Heteroskedasticity. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-heteroskedasticity