Spurious Regression and Cointegration
Regressing one I(1) series on another, unrelated I(1) series produces a spurious regression: a high R-squared and large t-statistics that are entirely misleading because the variables share no real link. This is the defining hazard of unit-root data and the reason a high R-squared between trending series proves nothing. The important exception is cointegration, where a linear combination of two I(1) series is itself stationary (I(0)), so the variables move together in the long run. Cointegrated variables admit an error-correction representation in which short-run changes adjust to close the gap from their long-run equilibrium.
Why it matters
Two random walks can drift in the same direction by chance for a long stretch, and OLS mistakes that coincidence for a strong relationship. Cointegration is the genuine version. Picture two drunks leaving a bar tied by a short rope. Each wanders unpredictably (I(1)), yet the distance between them stays bounded (I(0)). That bounded gap is the equilibrium relationship, and error correction is the tug on the rope that pulls them back whenever they drift too far apart.
Formulas
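The core relationships can be written compactly in standard notation. The symbols below (the shocks a_t and e_t, the errors u_t and v_t, and the coefficients) are assumptions chosen for illustration rather than notation taken from a particular source.

```latex
\begin{align}
% Spurious case: two independent random walks
y_t &= y_{t-1} + a_t, \qquad x_t = x_{t-1} + e_t
  && \text{($a_t$, $e_t$ independent shocks)} \\
% Levels regression: the true slope is zero, but u_t is I(1)
y_t &= \beta_0 + \beta_1 x_t + u_t
  && \text{(true $\beta_1 = 0$, $u_t \sim I(1)$)} \\
% Cointegration: some linear combination of the I(1) series is stationary
y_t &\sim I(1), \quad x_t \sim I(1), \quad y_t - \beta x_t \sim I(0) \\
% Error-correction model: last period's gap feeds into this period's change
\Delta y_t &= \alpha_0 + \gamma_0 \Delta x_t
  + \delta \, (y_{t-1} - \beta x_{t-1}) + v_t, \qquad \delta < 0
\end{align}
```

In the last line, the term in parentheses is the previous period's deviation from the long-run relationship, and a negative delta means part of that deviation is undone each period.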
Worked examples
Two unrelated I(1) series give a regression with a very high R-squared and a huge t-statistic, and you must judge whether the link is real.
Be skeptical. With both series I(1), this is the classic spurious-regression pattern. Test for cointegration by saving the residuals, `predict uhat, resid`, and applying a unit-root test to them with `dfuller uhat`. Because the residuals come from an estimated regression, compare the test statistic with Engle-Granger critical values rather than the standard Dickey-Fuller ones that `dfuller` prints. Only if the residuals are stationary is the relationship genuine cointegration rather than spurious.
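Put together, the residual-based check might look like the following do-file sketch. The variable names `y`, `x`, and `t` are hypothetical, and the `noconstant` option is one common choice here, not the only one.

```stata
* Hypothetical variables: y and x are the two series, t is the time index
tsset t

* Step 1: estimate the candidate long-run relationship in levels
regress y x

* Step 2: save the residuals, the estimated gap from equilibrium
predict uhat, resid

* Step 3: unit-root test on the residuals
* noconstant is used because OLS residuals from a regression with an
* intercept already average zero
dfuller uhat, noconstant
```

A test statistic that is sufficiently negative points toward stationary residuals and hence cointegration. The cutoff should come from Engle-Granger tables, which are more demanding than the Dickey-Fuller values shown in the `dfuller` output.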
Granger and Newbold (1974) ran the founding demonstration of this hazard. They simulated pairs of completely independent random walks and regressed one on the other, knowing the true relationship was zero. What did the regressions report?
Despite a zero true relationship, the regressions routinely produced a high R-squared and large, apparently significant t-statistics, purely because both series had non-stationary trends that drifted together by chance. Their practical warning sign was simple: be suspicious whenever R-squared exceeds the Durbin-Watson statistic. The takeaway is that with non-stationary series standard OLS inference is invalid, so you should difference the data or test for genuine cointegration before trusting the regression.
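A single draw of this experiment is easy to reproduce. The do-file below is a minimal sketch, not Granger and Newbold's original design: the seed, the sample length of 200, and the variable names are arbitrary assumptions.

```stata
* Two independent random walks, so the true slope is zero by construction
clear
set seed 12345                  // arbitrary seed, for reproducibility only
set obs 200                     // arbitrary sample length
generate t = _n
tsset t

generate y = sum(rnormal())     // running sum of i.i.d. shocks: a random walk
generate x = sum(rnormal())     // generated independently of y

regress y x                     // levels regression: inspect R-squared and t
estat dwatson                   // Durbin-Watson statistic for the same fit

regress D.y D.x                 // same regression in first differences
```

Results vary with the seed, but the levels regression will often report a sizeable R-squared alongside a Durbin-Watson statistic close to zero, while the regression in differences typically shows no relationship.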
Common mistakes
- ✗ A high R-squared between two time series confirms a strong relationship. With I(1) variables this is often spurious and tells you nothing about a true link.
- ✗ Any two trending variables are cointegrated. Cointegration requires a stationary linear combination, which is special, not automatic.
- ✗ Differencing cointegrated variables is the right way to model them. Differencing discards the long-run equilibrium; an error-correction model keeps both the short-run and long-run information.
- ✗ The error-correction term can have a positive coefficient. The adjustment coefficient must be negative so the system is pulled back toward equilibrium.
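The last two points can be made concrete with a two-step error-correction regression in the spirit of Engle and Granger (1987). This is a minimal sketch that assumes `y` and `x` have already been found to be cointegrated; the variable names `y`, `x`, `t`, and `ec` are hypothetical.

```stata
* Hypothetical variables: y and x are cointegrated I(1) series, t is time
tsset t

* Step 1: long-run relationship in levels; residuals measure the gap
regress y x
predict ec, resid

* Step 2: short-run dynamics plus last period's gap
* The coefficient on L.ec is the adjustment coefficient
regress D.y D.x L.ec
```

A negative coefficient on `L.ec` means that when `y` was above its long-run level last period, it tends to fall back this period. Dropping `L.ec` would leave a pure differences regression, which is exactly the model that throws away the long-run information.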
Revision bullets
- Regressing independent I(1) series gives a spurious regression
- Spurious results show a high R-squared and large t-statistics but no real link
- Cointegration: a linear combination of I(1) series is I(0)
- Cointegrated variables share a long-run equilibrium
- Error correction restores equilibrium with a negative adjustment coefficient
Quick check
Regressing one I(1) series on an independent I(1) series typically produces:
Two I(1) variables are cointegrated when:
Sources
- Wooldridge (2019), §18.4: Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Covers spurious regression with I(1) variables, cointegration, and error-correction models.
- Engle & Granger (1987): Engle, R.F., and C.W.J. Granger. Co-integration and Error Correction: Representation, Estimation, and Testing. Econometrica 55 (1987): 251-276. Foundational paper defining cointegration and the two-step error-correction estimation procedure.
- Granger & Newbold (1974): Granger, C.W.J., and P. Newbold. Spurious Regressions in Econometrics. Journal of Econometrics 2, no. 2 (1974): 111-120. The original Monte Carlo demonstration that regressing independent random walks yields high R-squared and significant t-statistics, with the R-squared greater than Durbin-Watson rule of thumb.