Correlation Versus Causality
Correlation says two variables move together. Causality says changing one actually changes the other. A regression slope is causal only if the regressor is unrelated to the omitted factors in the error term, which is the ceteris paribus condition. When an omitted variable drives both $x$ and $y$, the slope is biased and conflates association with cause. This distinction is the central difficulty of applied econometrics and motivates everything from controls to natural experiments.
Why it matters
Ice cream sales and drowning deaths rise together, but neither causes the other; hot weather drives both. A raw slope only tells you they travel together. To call it causal you must be able to say that nudging $x$ while everything else stays fixed would move $y$. The hard part is defending "everything else stays fixed" when much of it is unobserved.
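The ice cream example can be reproduced on simulated data. The sketch below assumes a hypothetical data-generating process in which temperature drives both series and ice cream has no direct effect on drownings; all coefficients, the sample size, and the helper names `ols_slope` and `residuals` are illustrative choices, not estimates from real data.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from a simple regression of y on x (with intercept)."""
    n = len(x)
    slope = ols_slope(x, y)
    intercept = sum(y) / n - slope * sum(x) / n
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


rng = random.Random(0)
n = 5000

# Assumed process: temperature is the common cause of both series.
temp = [rng.gauss(20, 8) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temp]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temp]  # no ice cream term

# Raw slope: picks up the shared dependence on temperature.
raw_slope = ols_slope(ice_cream, drownings)

# Hold temperature fixed: strip temperature out of both series,
# then regress residual on residual (the Frisch-Waugh partialling-out step).
partial_slope = ols_slope(residuals(temp, ice_cream), residuals(temp, drownings))

print(f"raw slope:                       {raw_slope:.3f}")
print(f"slope holding temperature fixed: {partial_slope:.3f}")
```

The raw slope is clearly positive even though the simulated causal effect is zero, while the slope that holds temperature fixed sits close to zero. This works here only because the confounder is observed; with an unobserved confounder the partialling-out step is not available.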
Formulas
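A sketch of the standard setup behind these statements, assuming the simple regression notation of Wooldridge's Chapter 2. The symbols $z$, $\gamma$ and $v$ for the omitted variable, its effect and the remaining error are labels introduced here for illustration.

```latex
% Simple regression model: u collects all omitted factors
y = \beta_0 + \beta_1 x + u

% Zero conditional mean: the condition under which \beta_1 is causal
E(u \mid x) = 0

% If an omitted variable z sits in the error, u = \gamma z + v with Cov(x, v) = 0,
% the OLS slope converges to the causal effect plus a bias term
\operatorname{plim} \hat{\beta}_1
  = \beta_1 + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)}
  = \beta_1 + \gamma \, \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
```

The bias term vanishes only if $z$ has no effect on $y$ ($\gamma = 0$) or is uncorrelated with $x$.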
Worked examples
A regression of wages on years of schooling gives a positive slope. Can you call it the causal return to education?
Not automatically. Ability and family background raise both schooling and wages and sit in the error term. If they correlate with schooling, then $E(u \mid x) \neq 0$ and the slope overstates the causal effect. You would need controls, a proxy for ability, or a natural experiment before claiming causation.
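The size of the overstatement can be illustrated with a small simulation. This is a sketch under assumed values: `TRUE_RETURN`, `ABILITY_EFFECT`, and the way ability feeds into schooling are hypothetical numbers chosen for the example, not estimates from any study.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
n = 20000

TRUE_RETURN = 0.08     # assumed causal effect of one year of schooling on log wage
ABILITY_EFFECT = 0.10  # assumed effect of (unobserved) ability on log wage

# Assumed process: ability raises both schooling and wages.
ability = [rng.gauss(0, 1) for _ in range(n)]
schooling = [12 + 1.5 * a + rng.gauss(0, 2) for a in ability]
log_wage = [
    1.0 + TRUE_RETURN * s + ABILITY_EFFECT * a + rng.gauss(0, 0.3)
    for s, a in zip(schooling, ability)
]

# Regression that omits ability, as in the worked example.
naive_slope = ols_slope(schooling, log_wage)

# Omitted-variable bias: effect of ability times the slope of ability on schooling.
bias = ABILITY_EFFECT * ols_slope(schooling, ability)

print(f"true return:        {TRUE_RETURN:.3f}")
print(f"naive slope:        {naive_slope:.3f}")
print(f"true return + bias: {TRUE_RETURN + bias:.3f}")
```

The naive slope lands above the assumed true return by roughly the bias term, which is positive here because ability raises both schooling and wages. In real data ability is unobserved, so the bias term cannot be computed this way.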
Common mistakes
- ✗ A statistically significant slope proves causation. Significance only says that an association this strong would be unlikely if the true slope were zero; it says nothing about whether the relationship is causal.
- ✗ A high correlation implies one variable causes the other. Strong correlation can arise from a common cause or pure coincidence, with no causal link in either direction.
- ✗ Adding more controls always turns a slope causal. Controls help only with the factors you can measure; an unobserved confounder still biases the estimate.
- ✗ Causation always runs from the regressor to the dependent variable. Reverse causality, where $y$ affects $x$, can produce a misleading slope just as omitted variables can.
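Reverse causality can be illustrated the same way as omitted variables. In the sketch below the data-generating process is an assumption made for the example: $y$ is generated first and $x$ responds to it, so the causal effect of $x$ on $y$ is zero by construction.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2)
n = 20000

# Assumed process: y is determined first, then x reacts to y.
y = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * yi + rng.gauss(0, 1) for yi in y]

# Regressing y on x still yields a clearly nonzero slope,
# even though changing x would not move y in this setup.
slope_y_on_x = ols_slope(x, y)

print(f"slope of y on x: {slope_y_on_x:.3f}")
```

The regression cannot tell which way the arrow points; that has to come from knowledge of how the data were generated.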
Revision bullets
- Correlation is co-movement; causality is a true effect of changing $x$
- A slope is causal only if $E(u \mid x) = 0$
- Omitted common causes bias the slope and mimic causation
- Significance is not the same as causation
- Reverse causality is a second threat alongside omitted variables
Quick check
A regression slope can be read as a causal, ceteris paribus effect only when
Ice cream sales and drownings are strongly positively correlated. The most likely reason is
Sources
- Wooldridge (2019), Ch. 1-2. Wooldridge, J. M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. ISBN 978-1-337-55886-0. Section 1.4 and Chapter 2 contrast correlation with the ceteris paribus causal effect and the zero-conditional-mean condition.