Correlation Versus Causality

Correlation says two variables move together. Causality says changing one actually changes the other. A regression slope is causal only if the regressor is unrelated to the omitted factors in the error, which is what the ceteris paribus condition requires. When an omitted variable drives both $x$ and $y$, the slope is biased and confuses association with cause. This distinction is the central difficulty of applied econometrics and motivates everything from controls to natural experiments.

Why it matters

Ice cream sales and drowning deaths rise together, but neither causes the other; hot weather drives both. A raw slope only tells you they travel together. To call it causal you must be able to say that nudging $x$ while everything else stays fixed would move $y$. The hard part is defending "everything else stays fixed" when much of it is unobserved.
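
The ice cream example can be sketched as a small simulation. Everything here is an assumption made for illustration: the variable names, the coefficients, and the data-generating process in which temperature is the only common cause and ice cream has no effect on drownings. The raw slope comes out positive, while the slope after partialling temperature out of both variables should sit close to zero.

```python
import random


def ols_slope(x, y):
    """Simple-regression slope of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def residualize(target, control):
    """Residuals from regressing target on control (with an intercept)."""
    slope = ols_slope(control, target)
    mean_c = sum(control) / len(control)
    mean_t = sum(target) / len(target)
    return [t - mean_t - slope * (c - mean_c) for t, c in zip(target, control)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical data: temperature drives both series; ice cream never
# enters the drownings equation, so its true causal effect is zero.
temperature = [rng.gauss(20, 8) for _ in range(n)]
ice_cream = [50 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [2 + 0.2 * t + rng.gauss(0, 2) for t in temperature]

# Raw association: regress drownings on ice cream alone.
raw_slope = ols_slope(ice_cream, drownings)

# Hold temperature fixed by removing its linear influence from both series,
# then regress residual on residual (the partialling-out approach).
ice_resid = residualize(ice_cream, temperature)
drown_resid = residualize(drownings, temperature)
controlled_slope = ols_slope(ice_resid, drown_resid)

print(f"raw slope:        {raw_slope:.4f}")
print(f"controlled slope: {controlled_slope:.4f}")
```

The control works here only because the simulated confounder is observed; with an unobserved common cause there would be nothing to partial out.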

Formulas

Condition for a causal slope
$$E(u \mid x) = 0$$
The slope $\beta_1$ carries a ceteris paribus, causal reading only when the error has zero mean given $x$, so $x$ is unrelated to the omitted factors in $u$.
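
To see what goes wrong when the condition fails, here is the standard omitted-variable algebra as a sketch. The symbols $z$, $\beta_2$, $e$, and $\tilde{\beta}_1$ are introduced here for illustration and are not part of the formula above: $z$ is a single omitted factor, $e$ is a remaining error unrelated to $x$ and $z$, and $\tilde{\beta}_1$ is the slope from regressing $y$ on $x$ alone.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \beta_2 z + e, \qquad E(e \mid x, z) = 0, \\
u &= \beta_2 z + e \quad \text{(the error once $z$ is omitted)}, \\
\operatorname{plim} \tilde{\beta}_1
  &= \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
   = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}.
\end{align*}
```

Under these assumptions the extra term vanishes only if the omitted factor has no effect on $y$ ($\beta_2 = 0$) or is uncorrelated with $x$; otherwise the slope mixes the causal effect with the influence of $z$.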

Worked examples

Scenario

A regression of wages on years of schooling gives a positive slope. Can you call it the causal return to education?

Solution

Not automatically. Ability and family background raise both schooling and wages and sit in the error term. If they correlate with schooling, then $E(u \mid educ) \ne 0$ and the slope overstates the causal effect. You would need controls, a proxy for ability, or a natural experiment before claiming causation.
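
A minimal simulated version of this scenario, under loudly hypothetical numbers: a true return to schooling of 0.08 in log wages, an ability effect of 0.10, and ability that is fully observed (which it rarely is in real data). The short regression omits ability; the long regression includes it, using the closed-form two-regressor OLS formula.

```python
import random


def cross_dev(a, b):
    """Sum of cross deviations from the means: sum (a_i - mean_a)(b_i - mean_b)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical data-generating process: ability raises both schooling and wages.
ability = [rng.gauss(0, 1) for _ in range(n)]
educ = [12 + 1.0 * a + rng.gauss(0, 2) for a in ability]
log_wage = [
    1.5 + 0.08 * s + 0.10 * a + rng.gauss(0, 0.3)
    for s, a in zip(educ, ability)
]

s_ee = cross_dev(educ, educ)
s_aa = cross_dev(ability, ability)
s_ea = cross_dev(educ, ability)
s_ew = cross_dev(educ, log_wage)
s_aw = cross_dev(ability, log_wage)

# Short regression: log wage on schooling only (ability left in the error).
short_slope = s_ew / s_ee

# Long regression: schooling coefficient when ability is also included.
long_slope = (s_ew * s_aa - s_aw * s_ea) / (s_ee * s_aa - s_ea ** 2)

print(f"short regression slope: {short_slope:.4f}")
print(f"long regression slope:  {long_slope:.4f}")
```

With these assumed parameters the short slope should land near 0.10 and the long slope near the true 0.08, so the gap is the upward bias from omitted ability.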

Common mistakes

  • A statistically significant slope proves causation. Significance only says that an association this strong would be unlikely if the true slope were zero; it says nothing about whether the relationship is causal.
  • A high correlation implies one variable causes the other. Strong correlation can arise from a common cause or pure coincidence, with no causal link in either direction.
  • Adding more controls always turns a slope causal. Controls help only for the factors you can measure; an unobserved confounder still biases the estimate.
  • Causation always runs from the regressor to the dependent variable. Reverse causality, where $y$ affects $x$, can produce a misleading slope just as omitted variables can.
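
The reverse-causality point in the last bullet can be illustrated with a short sketch. The setup is an assumption chosen for clarity: $y$ is generated with no input from $x$ at all, while $x$ responds to $y$ with a hypothetical coefficient of 0.5. Regressing $y$ on $x$ still returns a clearly nonzero slope.

```python
import random


def ols_slope(x, y):
    """Simple-regression slope of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


rng = random.Random(2)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical process: y is determined first, with no effect from x.
y = [rng.gauss(0, 1) for _ in range(n)]

# x responds to y (reverse causality) plus its own independent noise.
x = [0.5 * yi + rng.gauss(0, 1) for yi in y]

# The regression is run in the "wrong" causal direction: y on x.
slope_y_on_x = ols_slope(x, y)

print(f"slope of y on x: {slope_y_on_x:.4f}")
```

In this setup the population slope is Cov(x, y) / Var(x) = 0.5 / 1.25 = 0.4, so the estimate should sit near 0.4 even though changing $x$ would not move $y$.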

Revision bullets

  • Correlation is co-movement; causality is a true effect of changing $x$
  • A slope is causal only if $E(u \mid x) = 0$
  • Omitted common causes bias the slope and mimic causation
  • Significance is not the same as causation
  • Reverse causality is a second threat alongside omitted variables

Quick check

A regression slope can be read as a causal, ceteris paribus effect only when

Ice cream sales and drownings are strongly positively correlated. The most likely reason is

Sources

  1. Wooldridge (2019), Ch. 1-2
    Wooldridge, J. M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. ISBN 978-1-337-55886-0.
    Section 1.4 and Chapter 2 contrast correlation with the ceteris paribus causal effect and the zero-conditional-mean condition.
How to cite this page
Dr. Phil's Quant Lab. (2026). Correlation Versus Causality. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-causality-correlation