Multicollinearity
Multicollinearity is high (but not perfect) correlation among the explanatory variables. It inflates the variance of the affected slope estimators through the variance inflation factor, VIFⱼ = 1 / (1 − R²ⱼ), where R²ⱼ is the R-squared from regressing xⱼ on the other regressors. Crucially, multicollinearity does not bias OLS, which stays unbiased and consistent. It is a precision problem that bites hardest in small samples: standard errors and confidence intervals widen, so estimates become imprecise rather than wrong.
Try it yourself
When two regressors are correlated, OLS still hits the true value on average, but each coefficient is harder to pin down. With two regressors the auxiliary fit gives R²ⱼ = ρ², so the variance inflation factor is VIF = 1 / (1 − ρ²) and the standard error widens by √VIF versus no collinearity, holding σ² and SSTⱼ fixed.
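The √VIF widening can be checked numerically. The sketch below is illustrative only: the helper names `vif_two_regressors` and `slope_se` are hypothetical, the design is simulated, and σ is treated as known. It builds two regressors with sample correlation exactly ρ and SSTⱼ = 1, then reads the slope's standard error off σ²(X′X)⁻¹.

```python
import numpy as np

def vif_two_regressors(rho):
    """VIF when there are exactly two regressors with sample correlation rho."""
    return 1.0 / (1.0 - rho ** 2)

def slope_se(rho, sigma=1.0, n=200, seed=0):
    """Standard error of the slope on x2 for a design with corr(x1, x2) = rho.

    Both regressors are demeaned with unit sum of squares, so SST_j = 1 and
    only rho changes between calls (sigma is assumed known here).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 2))
    z -= z.mean(axis=0)                 # demeaned columns are orthogonal to the constant
    q, _ = np.linalg.qr(z)              # orthonormal columns: SST = 1, correlation 0
    x1 = q[:, 0]
    x2 = rho * q[:, 0] + np.sqrt(1.0 - rho ** 2) * q[:, 1]
    X = np.column_stack([np.ones(n), x1, x2])
    cov = sigma ** 2 * np.linalg.inv(X.T @ X)   # Var(beta_hat | X) under homoskedasticity
    return np.sqrt(cov[2, 2])

base = slope_se(0.0)
for rho in (0.0, 0.5, 0.9, 0.99):
    ratio = slope_se(rho) / base
    print(f"rho={rho:4.2f}  VIF={vif_two_regressors(rho):7.2f}  SE ratio={ratio:6.2f}")
```

For each ρ, the printed SE ratio should match the square root of the printed VIF, because nothing else in the design changes.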
In Stata, run `estat vif` after `regress`.
Why it matters
If two regressors move together almost in lockstep, the data struggle to tell their separate effects apart, like asking which of two people pushing a cart did the work when they always push together. The coefficients can swing around and the standard errors balloon. But nothing is being distorted on average. With more data, or with the two variables varying more independently, the haze clears. That is why this is about precision, not bias.
Formulas
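As a sketch of the standard results this page relies on (Wooldridge §3.4, under the Gauss-Markov assumptions including homoskedasticity), the sampling variance of a slope and its VIF can be written as follows. Here $SST_j$ is the total sample variation in $x_j$, $R_j^2$ comes from regressing $x_j$ on the other regressors, and $\rho$ is the sample correlation in the two-regressor case.

```latex
\begin{align}
\operatorname{Var}(\hat{\beta}_j \mid X) &= \frac{\sigma^2}{SST_j\,(1 - R_j^2)},
  \qquad SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \\
\mathrm{VIF}_j &= \frac{1}{1 - R_j^2},
  \qquad \operatorname{Var}(\hat{\beta}_j \mid X) = \frac{\sigma^2}{SST_j}\,\mathrm{VIF}_j \\
\text{two regressors:}\quad R_j^2 &= \rho^2
  \;\Longrightarrow\; \mathrm{VIF} = \frac{1}{1 - \rho^2}
\end{align}
```

Holding $\sigma^2$ and $SST_j$ fixed, the standard error therefore scales with $\sqrt{\mathrm{VIF}_j}$.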
Worked examples
A wage regression includes both total experience and tenure at the current firm, which are strongly correlated.
After `regress lwage educ exper tenure`, run `estat vif`. If `tenure` has a VIF of about 12, its standard error is roughly √12 ≈ 3.5 times what it would be with no collinearity (holding σ² and SSTⱼ fixed), and its t statistic may be small even though experience and tenure together explain wages well. The point estimates remain unbiased; only their precision suffers.
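The calculation behind a VIF table can be reproduced with auxiliary regressions. The sketch below is an illustration on simulated data, not the actual wage sample: the `vif` helper is hypothetical, and the `educ`, `exper`, and `tenure` values are invented so that tenure tracks experience closely.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (regressors only, no constant column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Auxiliary regression: x_j on a constant and all other regressors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        sst = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / sst
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated, hypothetical data: tenure is built to move closely with exper.
rng = np.random.default_rng(42)
n = 500
educ = rng.normal(12, 2, n)
exper = rng.uniform(0, 30, n)
tenure = 0.8 * exper + rng.normal(0, 2, n)

X = np.column_stack([educ, exper, tenure])
vifs = vif(X)
for name, v in zip(["educ", "exper", "tenure"], vifs):
    print(f"{name:7s} VIF={v:6.2f}  SE inflation={np.sqrt(v):5.2f}")
```

In this setup `educ` should show a VIF near 1, while `exper` and `tenure` share a large one, mirroring the pattern described above.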
A demand model includes price and price-squared, which are mechanically correlated over the sample.
Centering the variable before squaring, with `gen pc = price - r(mean)` after `summarize price` then using `c.pc##c.pc`, reduces the correlation between the level and the square and lowers the VIFs. The fitted curve and predictions are unchanged, which confirms collinearity was a precision issue, not a specification error.
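A small simulation can illustrate the centering claim. Everything here is an assumption made for illustration: the price range, the demand equation, and the `fit` helper are hypothetical and not taken from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
price = rng.uniform(5, 15, n)   # hypothetical prices
# Hypothetical quadratic demand curve with noise.
quantity = 100 - 6 * price + 0.2 * price ** 2 + rng.normal(0, 2, n)

def fit(x, y):
    """OLS of y on a constant, x, and x squared; returns coefficients and fitted values."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

pc = price - price.mean()       # centered price
beta_raw, fitted_raw = fit(price, quantity)
beta_ctr, fitted_ctr = fit(pc, quantity)

corr_raw = np.corrcoef(price, price ** 2)[0, 1]
corr_ctr = np.corrcoef(pc, pc ** 2)[0, 1]

print(f"corr(level, square): raw={corr_raw:.3f}, centered={corr_ctr:.3f}")
print("same fitted values:", np.allclose(fitted_raw, fitted_ctr))
print("same squared-term coefficient:", np.isclose(beta_raw[2], beta_ctr[2]))
```

Centering is only a reparameterization. The coefficient on the squared term is unchanged, while the coefficient on the linear term now measures the slope at the mean price rather than at a price of zero.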
Common mistakes
- ✗ Believing multicollinearity biases the coefficients. OLS stays unbiased and consistent; only the variances rise. It is a precision problem, not a bias.
- ✗ Dropping a correlated regressor to "fix" it. Removing a relevant variable can introduce omitted variable bias, trading a precision problem for a far worse consistency problem.
- ✗ Treating a VIF above 10 as a hard failure. The cutoff is a convention. What matters is whether your standard errors are small enough to answer the question.
- ✗ Confusing perfect collinearity with high collinearity. Perfect collinearity (an exact linear relationship) breaks OLS entirely; high but imperfect collinearity merely inflates variances.
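The trade-off behind dropping a correlated regressor can be sketched with a Monte Carlo experiment. The data-generating process below is hypothetical (true slopes of 1 on both regressors, correlation 0.9): the full model is noisy but centered on the truth, while the short model is tighter but centered on the wrong value.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho = 200, 2000, 0.9
beta1, beta2 = 1.0, 1.0   # hypothetical true slopes

full = np.empty(reps)     # estimate of beta1 with x2 included
short = np.empty(reps)    # estimate of beta1 with x2 dropped

for r in range(reps):
    x1 = rng.normal(size=n)
    # x2 has population correlation rho with x1.
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X_full = np.column_stack([np.ones(n), x1, x2])
    X_short = X_full[:, :2]   # drops the relevant regressor x2
    full[r] = np.linalg.lstsq(X_full, y, rcond=None)[0][1]
    short[r] = np.linalg.lstsq(X_short, y, rcond=None)[0][1]

print(f"full model : mean={full.mean():.3f}, sd={full.std():.3f}")
print(f"short model: mean={short.mean():.3f}, sd={short.std():.3f}")
```

Under these assumptions the short-model estimate centers near β₁ + β₂ρ = 1.9 rather than 1, so the smaller spread is bought with bias that does not shrink as the sample grows.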
Revision bullets
- High correlation among regressors inflates Var(β̂ⱼ) via the VIF.
- It does not bias OLS; estimates stay unbiased and consistent.
- It is a precision problem, worst in small samples.
- Symptom: high joint F statistic but insignificant individual t statistics.
- Do not drop a relevant variable just to cut collinearity; that risks OVB.
Quick check
Multicollinearity primarily affects OLS by:
A regressor has a VIF of 15. The most defensible response is:
Sources
- Wooldridge (2019), §3.4. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Derives Var(β̂ⱼ), the role of R²ⱼ, the VIF, and stresses that collinearity affects variance, not bias.
- Wooldridge (2019), §3.4a. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Discusses why dropping correlated but relevant variables can cause omitted variable bias.