Heteroskedasticity (intermediate)

Heteroskedasticity means the error variance changes with the regressors: $\mathrm{Var}(u\mid x_1,\dots,x_k)=\sigma^2(x)$ rather than a constant $\sigma^2$. It violates the homoskedasticity assumption (MLR.5), the last of the Gauss-Markov assumptions. The consequences are sharply bounded: OLS coefficients stay unbiased and consistent, and $R^2$ is unaffected, because none of those properties uses MLR.5. What breaks is inference: the usual standard error formula is wrong, so the reported $t$ statistics, $F$ statistics, and confidence intervals are invalid. This node is the hub of the violation-test-remedy chain for the variance assumption.
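The unbiasedness claim can be checked with a small Monte Carlo. The following is a minimal sketch in standard-library Python, not a definitive implementation: the true slope of 0.8, the intercept of 1.0, the uniform design on [1, 10], and the error rule sd(u | x) = 0.5·x are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sst_x


def simulate_slopes(true_slope=0.8, n=200, reps=500, seed=1):
    """Draw `reps` samples whose error sd grows with x; return the slope estimates."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(1.0, 10.0) for _ in range(n)]
        # Assumed heteroskedastic errors: sd(u | x) = 0.5 * x, so Var(u | x) = 0.25 * x**2.
        y = [1.0 + true_slope * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
print("mean of slope estimates:", round(statistics.mean(slopes), 3))
print("sd of slope estimates:  ", round(statistics.stdev(slopes), 3))
```

The average of the slope estimates should sit close to the assumed true slope even though the error variance is far from constant, which is the sense in which heteroskedasticity leaves the estimator itself intact.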

Try it yourself

Classical vs robust standard errors

The error spread grows with x (heteroskedasticity), yet the OLS slope stays unbiased. What breaks is the standard error: the classical SE is invalid here, while the robust (HC1) SE, which is what Stata’s `, robust` option reports, is asymptotically valid. The point estimate is fixed by construction, so only the SEs move.

[Interactive figure: 95% confidence interval for the slope b₁ (same centre, different width). At heteroskedasticity setting 6.0: slope b₁ = 0.80 (fixed by construction); SE(b₁) classical 0.158 vs robust 0.214; robust ÷ classical SE = 1.36×.]

The slope estimate is 0.80 at every setting, so heteroskedasticity has not biased it. But the classical SE (0.158) is invalid here, while the robust SE (0.214) is asymptotically valid. In this design robust is 1.36× the classical, so the classical confidence interval is too narrow and its t-test overstates significance.

Robust here is HC1 (finite-sample adjusted), the same estimator as Stata’s `regress y x, robust`. It is asymptotically valid, not exactly correct, and need not be larger in general; it simply happens to be larger in this rising-variance design.
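To make the two standard errors concrete, here is a minimal sketch in standard-library Python for a simple regression with an intercept. It assumes the HC1 form used for a two-parameter model, that is, White's variance estimate scaled by n/(n − 2); the function name and the four-point toy data are hypothetical and are not the data behind the interactive figure.

```python
import math


def slope_standard_errors(x, y):
    """Return (b1, classical SE, HC1 robust SE) for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dx)

    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Classical formula: sigma_hat^2 / SST_x, with sigma_hat^2 = SSR / (n - 2).
    sigma2_hat = sum(u * u for u in resid) / (n - 2)
    se_classical = math.sqrt(sigma2_hat / sst_x)

    # Robust formula: each squared residual stands in for its own sigma_i^2,
    # then the HC1 degrees-of-freedom factor n / (n - 2) is applied.
    meat = sum(d * d * u * u for d, u in zip(dx, resid))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sst_x ** 2)

    return b1, se_classical, se_hc1


# Hypothetical toy data, small enough to verify by hand.
b1, se_classical, se_hc1 = slope_standard_errors([0, 1, 2, 3], [0, 1, 1, 4])
print(b1, se_classical, se_hc1)
```

On this toy data the slope is 1.2, the classical SE is about 0.424, and the HC1 SE is about 0.359, so the robust figure is the smaller one. With four observations this is only an arithmetic check, but it does show that the robust SE is not mechanically larger than the classical one.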

Why it matters

Picture a scatter of household savings against income. At low incomes the points hug the regression line; at high incomes they fan out, because richer households have far more discretion over how much to save. The line through the cloud is still in the right place on average, so the slope estimate is fine. The problem is that OLS, told the spread is the same everywhere, miscounts how precise the slope is. It leans too hard on the noisy high-income points and reports a standard error that no longer matches reality, so every $t$ and $F$ test built on it is untrustworthy.

Formulas

Homoskedasticity (MLR.5), the assumption that fails

$$\mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2$$

Heteroskedasticity replaces the constant $\sigma^2$ with $\sigma^2(x)$, a variance that depends on the explanatory variables.

True slope variance (simple regression) vs the formula OLS reports

$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sigma_i^2}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^2}$$

Under homoskedasticity every $\sigma_i^2=\sigma^2$ and this collapses to $\sigma^2 / \mathrm{SST}_x$, the usual textbook formula, where $\mathrm{SST}_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$. When the $\sigma_i^2$ differ, the variance estimator built on that simple formula is biased for the true variance, so the reported SE is wrong.
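A short derivation sketch shows where the variance formula comes from. It conditions on the sample values of $x$ and assumes the errors are uncorrelated across observations (as under random sampling), with $\mathrm{Var}(u_i \mid x_i) = \sigma_i^2$. Writing $\mathrm{SST}_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$, the OLS slope can be expressed as the true slope plus a weighted sum of the errors:

$$
\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, u_i}{\mathrm{SST}_x}
$$

Taking the variance term by term gives

$$
\mathrm{Var}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sigma_i^2}{\mathrm{SST}_x^2},
\qquad
\sigma_i^2 = \sigma^2 \;\Rightarrow\; \mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2 \, \mathrm{SST}_x}{\mathrm{SST}_x^2} = \frac{\sigma^2}{\mathrm{SST}_x}
$$

The heteroskedasticity-robust estimator keeps the general form and replaces each unknown $\sigma_i^2$ with the squared OLS residual $\hat{u}_i^2$:

$$
\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \hat{u}_i^2}{\mathrm{SST}_x^2}
$$

This is the unadjusted (HC0) version; HC1 multiplies it by $n/(n-2)$ in the simple regression case.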

Worked examples

Scenario

You regress household saving on income, family size, and age in Stata and want a first read on whether the error variance is constant before trusting the t statistics.

Solution

Run `regress saving inc size age`, then plot the residuals against fitted values with `rvfplot, yline(0)`. A funnel that widens with fitted saving is the visual signature of heteroskedasticity. Because OLS is still unbiased, the point estimates are usable, but you should not report the default standard errors until you have either tested the variance or switched to robust SEs.

Note: `rvfplot` is a diagnostic, not a decision rule. Confirm with a formal test (Breusch-Pagan or White) before drawing conclusions.
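The idea behind a Breusch-Pagan style test can be sketched in standard-library Python for the one-regressor case. This is an illustration under stated assumptions rather than a reproduction of any Stata command's output: it uses the LM form (sample size times the $R^2$ from regressing the squared residuals on $x$), compares it with a chi-square distribution with one degree of freedom, and runs on hypothetical toy data. The function name is also hypothetical.

```python
import math


def breusch_pagan_lm(x, y):
    """LM statistic (n * R^2 of u_hat^2 on x) and its chi-square(1) p-value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dx)

    # Step 1: OLS of y on x, keep the squared residuals.
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]

    # Step 2: auxiliary regression of the squared residuals on x.
    u2_bar = sum(u2) / n
    g1 = sum(d * (v - u2_bar) for d, v in zip(dx, u2)) / sst_x
    explained = g1 * g1 * sst_x
    total = sum((v - u2_bar) ** 2 for v in u2)
    r_squared = explained / total

    # Step 3: LM = n * R^2; for one degree of freedom,
    # P(chi2 > s) = erfc(sqrt(s / 2)).
    lm = n * r_squared
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical toy data; far too small for the asymptotics to apply.
lm, p_value = breusch_pagan_lm([0, 1, 2, 3], [0, 1, 1, 4])
print(lm, p_value)
```

On the toy data the statistic is about 1.28 with a p-value of roughly 0.26, so the null of constant variance would not be rejected. A small p-value would instead point toward heteroskedasticity related to $x$.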

Common mistakes

✗ Heteroskedasticity biases the OLS coefficients. It does not. Unbiasedness (MLR.1 to MLR.4) and consistency never use the constant-variance assumption, so $\hat{\beta}_j$ is still unbiased and consistent. Only the standard errors, and the $t$, $F$, and confidence intervals built from them, are wrong.
✗ Heteroskedasticity lowers the $R^2$ or distorts goodness of fit. The population $R^2 = 1 - \sigma_u^2/\sigma_y^2$ depends only on the unconditional variances of $u$ and $y$, not on whether $\mathrm{Var}(u\mid x)$ is constant, and the sample $R^2$ still estimates it consistently, so model fit is unaffected.
✗ A big spread in the residuals always means heteroskedasticity. Large but constant scatter is homoskedastic. Heteroskedasticity is specifically a spread that changes with the regressors, which is why you test for it by relating the squared residuals to the explanatory variables.
✗ If errors are heteroskedastic, OLS is useless and you must abandon it. Modern practice keeps OLS and simply replaces the standard errors with robust ones, because the estimator itself is still unbiased and consistent.

Revision bullets

• Definition: $\mathrm{Var}(u\mid x)$ depends on the regressors, violating MLR.5
• OLS stays unbiased and consistent, and $R^2$ is unaffected
• Gauss-Markov fails, so OLS is no longer BLUE (no longer minimum variance)
• The usual SE formula is wrong, so $t$, $F$, and CIs are invalid
• Fix the inference, not the estimator: robust SEs, or WLS and FGLS
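For the WLS remedy in the last bullet, a minimal standard-library Python sketch of the slope calculation follows. It assumes the variance function is known up to scale, $\mathrm{Var}(u_i \mid x_i) = \sigma^2 h_i$, so each observation gets weight $1/h_i$. The function name and the data are hypothetical, and FGLS (where $h_i$ must be estimated) is not covered.

```python
def wls_slope(x, y, h):
    """Weighted least squares slope of y on x, with weights 1 / h[i].

    h[i] is assumed proportional to Var(u_i | x_i); noisier observations
    (larger h) therefore count for less.
    """
    w = [1.0 / hi for hi in h]
    sum_w = sum(w)
    # Weighted means replace the ordinary sample means.
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    num = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    return num / den


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.4]       # hypothetical data
h = [xi ** 2 for xi in x]            # assumed: error variance proportional to x^2

print("WLS slope:", wls_slope(x, y, h))
print("OLS slope:", wls_slope(x, y, [1.0] * len(x)))  # equal weights reduce to OLS
```

Setting every $h_i$ to the same value reproduces OLS, which is a quick sanity check that the weighting is the only thing that differs between the two estimators.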

Quick check

Under heteroskedasticity, which property of OLS is lost?

A funnel-shaped plot of residuals against fitted values most directly suggests:

Why is heteroskedasticity described as breaking the Gauss-Markov theorem?

Connected topics

OLS Variance, MLR assumptions, Gauss-Markov, Robust SE, Hetero Tests, WLS / FGLS

Sources

Wooldridge (2019), Ch. 8
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. ISBN 978-1-337-55886-0.
Chapter 8 opens by showing that heteroskedasticity leaves OLS unbiased and consistent while invalidating the usual variance estimator and the tests built on it.
Wooldridge (2019), §8.1
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 7th ed., Section 8.1, Consequences of Heteroskedasticity for OLS. Cengage, 2019.
States the precise consequences: estimators stay unbiased and consistent, but standard errors and test statistics are no longer valid.

How to cite this page

Dr. Phil's Quant Lab. (2026). Heteroskedasticity. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-heteroskedasticity
