Prediction and Prediction Intervals
Regression supports two distinct forecasts: the mean outcome at given regressor values, and the individual outcome for a new unit. The prediction interval for a single new outcome is wider than the confidence interval for the mean, because it adds the irreducible error variance on top of the uncertainty in the estimated coefficients. Predictions far outside the sample range of the regressors are unreliable.
Why it matters
Estimating where the regression line sits at a point is easier than guessing one person’s actual value, since individuals scatter around the line. So a prediction interval for a new observation has to be padded for that personal randomness. And the model has no warrant once you extrapolate beyond the data it learned from.
Formulas
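The standard textbook expressions can be sketched as follows. The notation is an assumption made for this summary: $x^0$ marks the chosen regressor values, $\hat{e}^0$ the prediction error for a new outcome $y^0$, and $c$ the $t$ critical value with $n - k - 1$ degrees of freedom. The prediction interval relies on the classical assumptions, including homoskedastic and normally distributed errors.

```latex
% Model (classical assumptions, homoskedastic errors)
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u,
\qquad \operatorname{Var}(u \mid x) = \sigma^2

% Mean prediction at x^0 and its confidence interval
\hat{y}^0 = \hat\beta_0 + \hat\beta_1 x_1^0 + \dots + \hat\beta_k x_k^0,
\qquad \hat{y}^0 \pm c \cdot \operatorname{se}(\hat{y}^0)

% Prediction interval for a single new outcome y^0
\operatorname{se}(\hat{e}^0) = \sqrt{\operatorname{se}(\hat{y}^0)^2 + \hat\sigma^2},
\qquad \hat{y}^0 \pm c \cdot \operatorname{se}(\hat{e}^0)
```

Both intervals share the center $\hat{y}^0$. The only difference is the extra $\hat\sigma^2$ under the square root, which does not vanish as the sample grows.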
Worked examples
Using `regress lwage educ exper`, you predict log wage for a worker with 16 years of school and 5 of experience.
In Stata, `margins, at(educ=16 exper=5)` gives the mean prediction with its confidence interval, while a prediction interval for one such worker is wider because it also carries the error variance $\sigma^2$. The point estimate is the same; only the band differs.
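To see the two bands side by side, here is a minimal Python sketch of the same calculation done by hand. It is not the Stata workflow: the data are simulated, the coefficients and error standard deviation are invented for illustration, and 1.96 stands in for the $t$ critical value, which is a reasonable approximation only because the sample is large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical values, NOT the wage data behind the Stata example)
n = 500
educ = rng.integers(8, 19, size=n).astype(float)    # 8 to 18 years of schooling
exper = rng.integers(0, 31, size=n).astype(float)   # 0 to 30 years of experience
sigma = 0.4                                         # assumed error standard deviation
lwage = 0.3 + 0.09 * educ + 0.01 * exper + rng.normal(0.0, sigma, size=n)

# OLS by hand: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), educ, exper])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ lwage

# Unbiased estimate of the error variance
resid = lwage - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])

# Prediction point: 16 years of school, 5 years of experience
x0 = np.array([1.0, 16.0, 5.0])
y0_hat = x0 @ beta_hat

# Standard error of the mean prediction vs. of a single new outcome
se_mean = np.sqrt(sigma2_hat * (x0 @ XtX_inv @ x0))
se_pred = np.sqrt(se_mean**2 + sigma2_hat)

crit = 1.96  # normal approximation to the t critical value
ci = (y0_hat - crit * se_mean, y0_hat + crit * se_mean)
pi = (y0_hat - crit * se_pred, y0_hat + crit * se_pred)

print(f"point prediction:    {y0_hat:.3f}")
print(f"95% CI for the mean: [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"95% PI, one worker:  [{pi[0]:.3f}, {pi[1]:.3f}]")
```

Both intervals are centered on the same point prediction; the prediction interval is wider because `se_pred` includes `sigma2_hat`.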
Common mistakes
- ✗ Using the confidence interval for the mean as the interval for a single new outcome. The individual interval is wider because its variance adds the error variance $\sigma^2$ to the variance of the estimated mean.
- ✗ Believing prediction intervals shrink toward zero width as $n \to \infty$. They converge to a positive width set by the irreducible error, not to zero.
- ✗ Trusting predictions far outside the sample range. Extrapolation assumes the fitted relationship holds where there are no data, which is rarely safe.
- ✗ Forgetting the retransformation issue when the model is in logs. Exponentiating the log prediction on its own systematically underestimates the mean level; a correction factor is needed.
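The second mistake in the list can be checked numerically. The sketch below uses simulated data with an assumed error standard deviation of 0.4 and the normal critical value 1.96 (only an approximation at the smallest sample size), and compares interval widths as the sample grows. The helper `interval_widths` is a hypothetical name introduced for this illustration.

```python
import numpy as np

def interval_widths(n, sigma=0.4, crit=1.96, seed=0):
    """Return (CI width, PI width) at x = 5 for a simulated simple regression."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)
    y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, size=n)  # invented coefficients

    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - 2)

    x0 = np.array([1.0, 5.0])                            # inside the sample range
    se_mean = np.sqrt(sigma2_hat * (x0 @ XtX_inv @ x0))  # shrinks with n
    se_pred = np.sqrt(se_mean**2 + sigma2_hat)           # floors near sigma
    return 2 * crit * se_mean, 2 * crit * se_pred

widths = {n: interval_widths(n) for n in (50, 500, 5000, 50000)}
for n, (ci_w, pi_w) in widths.items():
    print(f"n={n:>6}  CI width={ci_w:.4f}  PI width={pi_w:.4f}")
```

Under these assumptions the confidence-interval width keeps falling toward zero, while the prediction-interval width settles near 2 × 1.96 × 0.4 ≈ 1.57.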
Revision bullets
- Distinguish predicting the mean from predicting a single new outcome.
- The prediction interval for a new outcome is wider, adding the error variance $\sigma^2$.
- Individual prediction intervals do not shrink to zero as the sample size $n$ grows.
- Extrapolation beyond the sample range of the regressors is unreliable.
- Log-model predictions need a retransformation adjustment for the level mean.
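The retransformation point can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated with normal errors, all parameter values are invented, and two common adjustments are shown (the normal-theory factor `exp(sigma2_hat / 2)` and the smearing factor, the sample average of the exponentiated residuals). Neither is claimed to be the adjustment used elsewhere in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log-linear model (hypothetical parameter values)
n = 20000
sigma = 0.6
x = rng.uniform(0.0, 2.0, size=n)
log_y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, size=n)

# OLS of log(y) on a constant and x
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, log_y, rcond=None)
resid = log_y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)

# Predict the LEVEL of y at x = 1
x0 = np.array([1.0, 1.0])
log_pred = x0 @ beta_hat

naive = np.exp(log_pred)                        # ignores the error term entirely
normal_adj = np.exp(sigma2_hat / 2) * naive     # valid if errors are normal
smearing = np.mean(np.exp(resid)) * naive       # does not require normality

# Known only because the data are simulated: E[y | x = 1]
true_mean = np.exp(1.0 + 0.5 * 1.0 + sigma**2 / 2)

print(f"true mean:        {true_mean:.3f}")
print(f"naive exp():      {naive:.3f}")
print(f"normal-adjusted:  {normal_adj:.3f}")
print(f"smearing:         {smearing:.3f}")
```

In this setup the naive prediction falls clearly below the true conditional mean, while both adjusted predictions land close to it.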
Quick check
Why is the prediction interval for a single new outcome wider than the confidence interval for the mean at the same regressor values?
As the sample size grows very large, the width of a prediction interval for a single new outcome:
Sources
- Wooldridge (2019), Introductory Econometrics: A Modern Approach, 7th ed., Ch. 6, Sec. 6.4 (prediction and prediction intervals)
- Hill, Griffiths & Lim (2018), Principles of Econometrics, 5th ed., Ch. 4