Omitted Variable Bias
Omitted variable bias (OVB) arises when you leave out a variable that belongs in the model and that variable correlates with an included regressor. The included slope then absorbs part of the omitted effect, so OLS is biased and stays biased as the sample grows, which means it is inconsistent for the true coefficient. The direction of the bias follows the sign of the omitted variable's effect times the sign of the correlation between the omitted and included variables.
Try it yourself
True model y = β₀ + β₁x₁ + β₂x₂ + u. Drop the relevant x₂ and the short regression hands x₂'s credit to x₁. The bias is β₂·δ, where δ = ρ·(σ₂/σ₁) is the auxiliary slope of x₂ on x₁.
Assumptions (held fixed): β₁ = 1.00 and σ₁ = σ₂ = 1, so δ = ρ. This is a population (plim) result, so there is no sampling noise.
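To experiment with the setup above without any simulation, the population slope of the short regression can be computed directly from β₁ + β₂·δ. The following is a minimal sketch; the function name `short_slope_plim` and the particular β₂ and ρ values in the loop are illustrative assumptions, not values from the text.

```python
def short_slope_plim(beta1, beta2, rho, sigma1=1.0, sigma2=1.0):
    """Population (plim) slope on x1 when x2 is omitted, plus the bias term."""
    delta = rho * sigma2 / sigma1  # auxiliary slope of x2 on x1
    bias = beta2 * delta           # omitted variable bias
    return beta1 + bias, bias


if __name__ == "__main__":
    beta1 = 1.0  # held fixed, as in the setup above
    # beta2 and rho values below are arbitrary choices for illustration
    for beta2 in (0.5, -0.5):
        for rho in (-0.6, 0.0, 0.6):
            slope, bias = short_slope_plim(beta1, beta2, rho)
            print(f"beta2={beta2:+.1f}  rho={rho:+.1f}  "
                  f"plim slope={slope:.2f}  bias={bias:+.2f}")
```

Setting ρ = 0 or β₂ = 0 makes the bias term vanish, which is the case where leaving x₂ out does no harm to the slope on x₁.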
Why it matters
If ability raises wages and also rises with schooling, a wage regression that omits ability hands ability’s credit to educ. The schooling coefficient then overstates the true return. Because the problem is built into the population relationship, more data does not fix it.
Formulas
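A sketch of the standard derivation, written with the symbols from the setup above (β, δ, ρ, σ). The tilde marking the short-regression slope is notation assumed here, and the zero conditional mean condition on u is the usual assumption for the true model.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,
  \qquad \mathrm{E}[u \mid x_1, x_2] = 0 \\
\delta &= \frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}
  = \rho\,\frac{\sigma_2}{\sigma_1} \\
\operatorname{plim}\tilde{\beta}_1
  &= \frac{\mathrm{Cov}(x_1, y)}{\mathrm{Var}(x_1)}
  = \beta_1 + \beta_2\,\delta \\
\text{bias} &= \operatorname{plim}\tilde{\beta}_1 - \beta_1 = \beta_2\,\delta
\end{align*}
```

The third line follows by substituting the true model into Cov(x₁, y): the u term drops out because u is uncorrelated with x₁, leaving β₁·Var(x₁) + β₂·Cov(x₁, x₂).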
Worked examples
You regress lwage on educ but omit ability, which you cannot observe.
Since ability plausibly raises wages (β₂ > 0) and correlates positively with schooling (δ > 0), the bias is positive, so `regress lwage educ` overstates the return to schooling. Adding a proxy such as IQ via `regress lwage educ IQ` shrinks the educ coefficient.
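The wage example can be mimicked on synthetic data. The sketch below is only an illustration: every number in `simulate` (a 0.08 return to schooling, a 0.10 ability effect, the noise levels, the IQ scale) is a made-up assumption, not an estimate from real data, and the helper names are hypothetical. `slope_simple` plays the role of `regress lwage educ`, and `slope_with_control` the role of `regress lwage educ IQ`.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def slope_simple(y, x):
    """OLS slope from regressing y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def slope_with_control(y, x, z):
    """OLS coefficient on x from regressing y on x and z (with intercept)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


def simulate(n=20000, seed=0):
    """Generate synthetic data; all parameter values are assumptions."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # schooling rises with ability, so the two are positively correlated
    educ = [12 + 1.5 * a + rng.gauss(0, 2) for a in ability]
    # IQ is a noisy proxy for ability
    iq = [100 + 15 * a + rng.gauss(0, 5) for a in ability]
    # assumed true return to schooling: 0.08; assumed ability effect: 0.10
    lwage = [1.0 + 0.08 * e + 0.10 * a + rng.gauss(0, 0.3)
             for e, a in zip(educ, ability)]
    return lwage, educ, iq, ability


if __name__ == "__main__":
    lwage, educ, iq, ability = simulate()
    print("educ slope, ability omitted:   ", round(slope_simple(lwage, educ), 4))
    print("educ slope, IQ proxy added:    ", round(slope_with_control(lwage, educ, iq), 4))
    print("educ slope, ability controlled:", round(slope_with_control(lwage, educ, ability), 4))
```

Under these assumptions the proxy moves the educ coefficient toward the assumed 0.08 but does not remove the bias entirely, because IQ measures ability with noise.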
Common mistakes
- ✗ Thinking a bigger sample cures OVB. The bias survives in the limit, so OLS is inconsistent, not merely imprecise.
- ✗ Believing that correlation with the outcome is enough to cause bias. The omitted variable must also correlate with the included regressor, not just with y.
- ✗ Assuming the bias is always upward. Its sign depends on the product of the omitted effect and the correlation, so it can pull the estimate either way.
- ✗ Treating OVB as a small-sample or measurement quirk. It is a population specification problem that violates the zero conditional mean assumption.
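The first three mistakes can be checked with a small simulation. This is a sketch under stated assumptions: β₁ = 1 and σ₁ = σ₂ = 1 follow the setup earlier in the section, while β₂ = 0.5, the ρ values, the unit-variance error, and the function name `short_slope` are illustrative choices.

```python
import random


def short_slope(n, rho, beta1=1.0, beta2=0.5, seed=0):
    """Estimate the slope on x1 from a regression of y on x1 alone."""
    rng = random.Random(seed)
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        # x2 has unit variance and correlation rho with x1
        x2 = rho * x1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        y = beta1 * x1 + beta2 * x2 + rng.gauss(0, 1)
        sum_x += x1
        sum_y += y
        sum_xx += x1 * x1
        sum_xy += x1 * y
    # OLS slope = sample Cov(x1, y) / sample Var(x1)
    return (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x * sum_x / n)


if __name__ == "__main__":
    # Population values are beta1 + beta2 * rho: 1.3, 1.0 and 0.7
    for rho in (0.6, 0.0, -0.6):
        estimates = [round(short_slope(n, rho), 3) for n in (100, 1000, 10000, 100000)]
        print(f"rho={rho:+.1f}: {estimates}")
```

As n grows, the estimates tighten around β₁ + β₂·ρ rather than around β₁: the gap stays put when ρ ≠ 0, disappears when ρ = 0, and flips sign when ρ is negative.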
Revision bullets
- OVB needs a relevant omitted variable that correlates with an included regressor.
- It biases the coefficient and persists in large samples, so OLS is inconsistent.
- Sign of bias = sign(β₂) times sign(corr between omitted and included).
- A proxy or control for the omitted factor reduces the bias.
- OVB is a failure of the zero conditional mean assumption (MLR.4).
Quick check
Omitting a relevant variable biases OLS only when that variable is correlated with what?
Ability raises wages and is positively correlated with schooling. Omitting ability makes the educ coefficient:
Why is omitted variable bias described as a threat to consistency?
Sources
- Wooldridge (2019), Introductory Econometrics: A Modern Approach, 7th ed., Sec. 3.3
- Angrist & Pischke (2009), Mostly Harmless Econometrics, Ch. 3 (omitted variables bias formula)