The Zero Conditional Mean Assumption
The zero conditional mean assumption states that $E(u \mid x) = 0$: the average value of the unobserved factors $u$ is the same (zero) at every value of $x$. It is the key identifying assumption that lets OLS recover the causal effect of $x$ on $y$ rather than a mere correlation, and it implies the population regression function $E(y \mid x) = \beta_0 + \beta_1 x$. It fails whenever an omitted factor in $u$ is correlated with $x$, for example unobserved ability in a regression of wage on education. When it fails in this way, OLS is biased and inconsistent, and the slope no longer carries a ceteris paribus interpretation.
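A small Monte Carlo sketch makes the unbiasedness claim concrete. It assumes a hypothetical data-generating process $y = \beta_0 + \beta_1 x + u$ with normally distributed $x$. The function names `ols_slope` and `mean_slope` and all numeric values are illustrative choices, not taken from Wooldridge. When $u$ is drawn independently of $x$, the average OLS slope across many samples sits close to $\beta_1$. When $u$ contains a component that moves with $x$, the average settles near $\beta_1 + 0.5$ instead.

```python
import numpy as np


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Simple-regression OLS slope: sum of (x - xbar)(y - ybar) over sum of (x - xbar)^2."""
    xc = x - x.mean()
    return float(np.sum(xc * (y - y.mean())) / np.sum(xc ** 2))


def mean_slope(beta0: float, beta1: float, n: int, reps: int,
               confounded: bool, seed: int = 0) -> float:
    """Average OLS slope over `reps` simulated samples of size n.

    Hypothetical DGP: x ~ N(12, 2^2) and y = beta0 + beta1 * x + u.
    confounded=False: u ~ N(0, 1) independent of x, so E(u | x) = 0.
    confounded=True:  u = 0.5 * (x - 12) + N(0, 1), so E(u) = 0 but E(u | x) != 0.
    """
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(12.0, 2.0, size=n)
        noise = rng.normal(0.0, 1.0, size=n)
        u = 0.5 * (x - 12.0) + noise if confounded else noise
        y = beta0 + beta1 * x + u
        slopes[r] = ols_slope(x, y)
    return float(slopes.mean())
```

For example, `mean_slope(1.0, 0.8, n=200, reps=500, confounded=False)` should land close to 0.8. The same call with `confounded=True` should land close to 1.3, because the part of $u$ that moves with $x$ is absorbed into the estimated slope.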
Why it matters
Imagine sorting people into groups by their value of $x$, say years of schooling. The assumption says that within every schooling group, the average of all the other factors in $u$ is the same, namely zero. If people with more schooling also tend to have higher ability, then $u$ is systematically larger in the high-$x$ groups, the assumption breaks, and OLS credits ability’s effect to schooling. Causal interpretation hinges on this single condition, which is why it links the population model both to the unbiasedness of OLS and to omitted variable bias.
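The sorting thought experiment can be mimicked in a simulation: generate schooling that depends on ability, then average $u$ within each schooling group. Everything here is a hypothetical setup chosen for illustration (four schooling levels, standard normal ability, the cutoffs passed to `np.digitize`, the helper name `mean_u_by_group`). With real data $u$ is never observed, so this check is possible only in simulation.

```python
import numpy as np


def mean_u_by_group(educ: np.ndarray, u: np.ndarray) -> dict:
    """Average of u within each distinct schooling level."""
    return {int(level): float(u[educ == level].mean()) for level in np.unique(educ)}


rng = np.random.default_rng(1)
n = 60_000
ability = rng.normal(size=n)

# Hypothetical schooling rule: a noisy ability signal is cut into four bands,
# giving 10, 12, 14 or 16 years of schooling. More able people get more schooling.
signal = ability + rng.normal(size=n)
educ = 10 + 2 * np.digitize(signal, [-1.0, 0.0, 1.0])

# Case 1: u is unrelated to schooling, so E(u | educ) = 0 holds.
u_holds = rng.normal(size=n)
# Case 2: u contains ability, which also drives schooling, so the assumption fails.
u_fails = ability + rng.normal(size=n)

holds = mean_u_by_group(educ, u_holds)   # every group average near 0
fails = mean_u_by_group(educ, u_fails)   # group averages rise with schooling
```

In the first case each group average hovers around zero. In the second, the averages climb from the 10-year group to the 16-year group, which is exactly the pattern that breaks the assumption.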
Formulas
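The formulas this note relies on, written for the simple regression model in Wooldridge's notation ($y$ the outcome, $x$ the regressor, $u$ the error, $\beta_0$ and $\beta_1$ the population parameters). The regression-function and covariance lines are standard consequences of the assumption, shown step by step via the law of iterated expectations.

```latex
\begin{aligned}
&\text{Population model:} && y = \beta_0 + \beta_1 x + u \\
&\text{Zero conditional mean:} && E(u \mid x) = E(u) = 0 \\
&\text{Population regression function:} && E(y \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x \\
&\text{Implied zero covariance:} && \operatorname{Cov}(x, u) = E(xu) = E\big[x\,E(u \mid x)\big] = 0 \\
&\text{Ceteris paribus effect:} && \Delta y = \beta_1 \Delta x \quad \text{when } \Delta u = 0
\end{aligned}
```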
Worked examples
A researcher runs `regress wage educ` and wants to interpret the coefficient on `educ` as the causal return to schooling.
That causal reading is valid only if $E(u \mid \mathit{educ}) = 0$, meaning unobserved factors such as innate ability, motivation, and family background do not vary systematically with education. In practice more able people tend to acquire more schooling, and ability also raises wages, so $u$ and `educ` are positively correlated, the assumption fails, and the OLS coefficient overstates the true return. The estimate then mixes the schooling effect with the ability effect.
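A simulated version of this example shows both the overstatement and its source. The sketch assumes a hypothetical wage equation in which ability is observed (something real data never allows), with illustrative coefficients of 0.5 per year of schooling and 1.0 per unit of ability. It compares the short regression of `wage` on `educ`, the long regression that also controls for ability, and the omitted variable bias decomposition discussed in Wooldridge §3.3: the short slope equals the long slope on `educ` plus the ability coefficient times the slope from regressing ability on education. The helper names `ols` and `wage_regressions` are illustrative.

```python
import numpy as np


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS coefficients via least squares; X must already include a constant column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def wage_regressions(n: int = 50_000, seed: int = 2):
    """Return (short slope, long slope on educ, OVB reconstruction of the short slope)."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(0.0, 1.0, size=n)
    # Hypothetical: more able people acquire more schooling.
    educ = 12 + 1.5 * ability + rng.normal(0.0, 2.0, size=n)
    # Hypothetical true model: 0.5 per year of schooling, 1.0 per unit of ability.
    wage = 2.0 + 0.5 * educ + 1.0 * ability + rng.normal(0.0, 1.0, size=n)

    const = np.ones(n)
    short_b = ols(np.column_stack([const, educ]), wage)           # like `regress wage educ`
    long_b = ols(np.column_stack([const, educ, ability]), wage)   # ability controlled for
    delta1 = ols(np.column_stack([const, educ]), ability)[1]      # ability regressed on educ

    # Short slope = long educ slope + (ability coefficient) * delta1, exactly in-sample.
    ovb_reconstruction = long_b[1] + long_b[2] * delta1
    return short_b[1], long_b[1], ovb_reconstruction


short_slope, long_slope, rebuilt = wage_regressions()
```

Under these assumed numbers the short slope comes out near 0.74 rather than 0.5: the extra 0.24 is the ability coefficient (1.0) times the ability-on-education slope ($1.5 / 6.25$ in the population), which is the upward bias described above.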
Common mistakes
- ✗ $E(u \mid x) = 0$ is the same as $E(u) = 0$. The unconditional mean $E(u) = 0$ is a harmless normalization absorbed by the intercept. The conditional version is far stronger and is the assumption that does the identifying work.
- ✗ Zero correlation between $u$ and $x$ is enough for causal interpretation. $\operatorname{Corr}(x, u) = 0$ is weaker than $E(u \mid x) = 0$: the conditional mean assumption implies zero correlation, but not the other way round. It rules out all forms of mean dependence, not just linear correlation, and is the assumption Wooldridge invokes for unbiasedness.
- ✗ You can test the assumption directly using the residuals. OLS forces $\sum_{i=1}^{n} x_i \hat{u}_i = 0$ (and $\sum_{i=1}^{n} \hat{u}_i = 0$) by construction, so the sample residuals are mechanically uncorrelated with $x$. This tells you nothing about whether the population assumption holds.
- ✗ If the assumption fails, the slope is meaningless. The population OLS slope $\operatorname{Cov}(x, y) / \operatorname{Var}(x)$ is still well defined, as the slope of the best linear predictor of $y$ given $x$, but it no longer equals the causal effect $\beta_1$. The failure changes the interpretation, not the existence, of that slope.
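Two of these mistakes can be checked numerically. The sketch below uses a hypothetical wage setup in which the assumption fails by design and shows that the OLS residuals still satisfy $\sum_i \hat{u}_i = 0$ and $\sum_i x_i \hat{u}_i = 0$ up to rounding error, so they cannot reveal the problem. It then builds an error $u = x^2 - 1$ with standard normal $x$, which has zero mean and zero correlation with $x$ yet a conditional mean that changes with $x$. The data-generating processes and the helper name `residual_moments` are illustrative assumptions.

```python
import numpy as np


def residual_moments(x: np.ndarray, y: np.ndarray):
    """Fit y = b0 + b1 * x by OLS; return (sum of residuals, sum of x * residuals)."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return float(resid.sum()), float((x * resid).sum())


rng = np.random.default_rng(3)

# Mistake: "test the assumption with the residuals".
# Hypothetical DGP where u contains ability, so E(u | educ) != 0 by design.
n = 10_000
ability = rng.normal(size=n)
educ = 12 + 1.5 * ability + rng.normal(0.0, 2.0, size=n)
wage = 2.0 + 0.5 * educ + 1.0 * ability + rng.normal(size=n)
sum_resid, sum_x_resid = residual_moments(educ, wage)  # both zero up to rounding

# Mistake: "zero correlation is enough".
# u = x^2 - 1 has E(u) = 0 and Cov(x, u) = E(x^3) = 0 for standard normal x,
# but E(u | x) = x^2 - 1, which is negative near x = 0 and positive in the tails.
m = 200_000
x = rng.normal(size=m)
u = x ** 2 - 1
corr_xu = float(np.corrcoef(x, u)[0, 1])          # close to 0
mean_u_center = float(u[np.abs(x) < 0.5].mean())  # clearly negative
mean_u_tails = float(u[np.abs(x) > 2.0].mean())   # clearly positive
```

The residual sums come out at numerical zero even though the simulated error is correlated with `educ`, and the second part shows a near-zero correlation coexisting with strong mean dependence.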
Revision bullets
- Assumption is $E(u \mid x) = 0$, the key identifying condition
- Stronger than zero correlation
- Implies $E(y \mid x) = \beta_0 + \beta_1 x$, so $\beta_1$ is causal
- Fails when an omitted factor in $u$ is correlated with $x$ (e.g. ability)
- Cannot be checked with OLS residuals; $\sum_i x_i \hat{u}_i = 0$ always holds
Quick check
The zero conditional mean assumption requires that:
In `regress wage educ`, the assumption would most plausibly fail because:
Sources
- Wooldridge (2019), §2.5. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. ISBN 978-1-337-55886-0. Section 2.5 introduces SLR.4, the zero conditional mean assumption, contrasts it with zero correlation, and explains its role in identifying the causal slope.
- Wooldridge (2019), §3.3 (omitted variables). Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. Links the failure of the zero conditional mean assumption to omitted variable bias and the direction of that bias.