Unbiasedness of OLS
Under the four assumptions SLR.1 (linear in parameters), SLR.2 (random sampling), SLR.3 (sample variation in x), and SLR.4 (zero conditional mean, E(u|x) = 0), the OLS estimators are unbiased, so E(β̂₀) = β₀ and E(β̂₁) = β₁. Unbiasedness means that on average across many random samples the estimator hits the true value, not that any single estimate is correct. Of the four, SLR.4 is the assumption most likely to fail in practice, and when it fails OLS is generally biased.
Try it yourself
Unbiased means the estimator centres on the truth across MANY samples, not that one sample is right.
Fix a known process y = β₀ + β₁·x + u with u ~ Normal(0, σ). We draw M fresh samples of size n, and for each one compute the OLS slope β̂₁. The blue histogram is the sampling distribution of those estimates; the gold line marks the true β₁. The mean of the estimates lands on it even though no single sample does.
Setup: x is a fixed evenly-spaced grid on [1, 11] (so Sxx = Σ(xᵢ−x̄)² is known), errors are i.i.d. Normal(0, σ) with σ the standard deviation, and the estimator is β̂₁ = Σ(xᵢ−x̄)(yᵢ−ȳ)/Σ(xᵢ−x̄)². Larger n increases Sxx and so shrinks the standard error σ/√Sxx; larger σ increases it.
Set n small and watch the histogram spread out while its centre stays on the true β₁. If you only ever collected ONE sample, your single estimate could land far out in that spread. So how can an estimator be "unbiased" and yet be wrong in the one sample you actually have? What does unbiasedness promise you, and what does it not?
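The experiment above can also be run offline. What follows is a minimal sketch in Python with NumPy, not the code behind the interactive display: the function name `simulate_slopes` and the values β₀ = 1, β₁ = 2, σ = 3, n = 20 and M = 5000 are illustrative assumptions. It draws M samples on the fixed grid, computes the OLS slope for each, and compares the mean and spread of the estimates with β₁ and σ/√Sxx.

```python
import numpy as np

def simulate_slopes(n=20, M=5000, beta0=1.0, beta1=2.0, sigma=3.0, seed=0):
    """Return M OLS slope estimates and Sxx for a fixed x grid on [1, 11].

    All default values are assumed for illustration; they are not from the source.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(1.0, 11.0, n)             # fixed, evenly spaced design
    dx = x - x.mean()                         # deviations x_i - x_bar
    sxx = np.sum(dx ** 2)                     # Sxx = sum (x_i - x_bar)^2
    u = rng.normal(0.0, sigma, size=(M, n))   # sigma is the standard deviation
    y = beta0 + beta1 * x + u                 # one row per simulated sample
    y_dev = y - y.mean(axis=1, keepdims=True) # y_i - y_bar within each sample
    slopes = y_dev @ dx / sxx                 # OLS slope for every sample
    return slopes, sxx

slopes, sxx = simulate_slopes()
print("mean of the estimates:", slopes.mean())
print("sd of the estimates:  ", slopes.std(ddof=1))
print("sigma / sqrt(Sxx):    ", 3.0 / np.sqrt(sxx))
print("share of samples off by more than 0.2:", np.mean(np.abs(slopes - 2.0) > 0.2))
```

The last line is the point of the exercise: the centre of the estimates sits near the true slope, yet a sizeable share of individual samples miss it by a visible margin.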
Why it matters
Picture drawing thousands of independent samples and running the same regression on each, producing thousands of slope estimates. Unbiasedness says the average of all those estimates equals the true β₁. Your one estimate from your one dataset will almost surely differ from β₁; that gap is sampling error, not bias. Bias is a property of the procedure across repeated samples, not a feature you can diagnose from a single regression output.
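The distinction between sampling error and bias can be made concrete with a small simulation. The sketch below is an illustration under assumed values (β₀ = 1, β₁ = 2, n = 50, 4000 replications; the helper `ols_slope` is hypothetical). In one design the error satisfies E(u|x) = 0; in the other the error contains a term 0.5·x, so SLR.4 fails. Sampling error averages out across replications, while the bias from the SLR.4 violation does not.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0      # assumed true parameters (illustrative only)
n, M = 50, 4000              # assumed sample size and number of replications

def ols_slope(x, y):
    """OLS slope: sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2."""
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

slopes_ok, slopes_bad = [], []
for _ in range(M):
    x = rng.normal(0.0, 1.0, n)
    e = rng.normal(0.0, 1.0, n)   # noise drawn independently of x
    u_ok = e                      # E(u|x) = 0, so SLR.4 holds
    u_bad = 0.5 * x + e           # E(u|x) = 0.5x, so SLR.4 fails
    slopes_ok.append(ols_slope(x, beta0 + beta1 * x + u_ok))
    slopes_bad.append(ols_slope(x, beta0 + beta1 * x + u_bad))

mean_ok = np.mean(slopes_ok)
mean_bad = np.mean(slopes_bad)
print("SLR.4 holds: mean slope =", mean_ok)
print("SLR.4 fails: mean slope =", mean_bad)
```

In the second design the estimates centre on roughly β₁ + 0.5 rather than β₁, and drawing more replications does not move that centre back.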
Formulas
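The key results can be written out as a short derivation. This is a sketch of the standard argument in the notation used on this page (Sxx = Σ(xᵢ−x̄)²), not a reproduction of the original formula display. SLR.1 allows the substitution for yᵢ, SLR.3 guarantees Sxx > 0, and SLR.2 together with SLR.4 gives E(uᵢ | x₁, …, xₙ) = 0.

```latex
\begin{align*}
\hat\beta_1
  &= \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2}
   = \frac{\sum_{i=1}^{n}(x_i-\bar x)\,y_i}{S_{xx}} \\
  &= \beta_1 + \frac{\sum_{i=1}^{n}(x_i-\bar x)\,u_i}{S_{xx}}
   \qquad\text{using } y_i = \beta_0 + \beta_1 x_i + u_i \\[4pt]
E(\hat\beta_1 \mid x_1,\dots,x_n)
  &= \beta_1 + \frac{\sum_{i=1}^{n}(x_i-\bar x)\,E(u_i \mid x_1,\dots,x_n)}{S_{xx}}
   = \beta_1 \\[4pt]
\hat\beta_0
  &= \bar y - \hat\beta_1 \bar x
   = \beta_0 + (\beta_1 - \hat\beta_1)\,\bar x + \bar u \\
E(\hat\beta_0 \mid x_1,\dots,x_n)
  &= \beta_0
\end{align*}
```

Taking expectations over the xᵢ then gives E(β̂₁) = β₁ and E(β̂₀) = β₀.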
Worked examples
An instructor simulates the experiment by drawing 1,000 samples from a known data-generating process and running `regress y x` on each, storing the slope each time.
Plotting a histogram of the 1,000 stored slopes shows them centred on the true β₁ used to generate the data, even though no single sample returns exactly β₁. The spread around the centre is sampling variability. This Monte Carlo display makes concrete that unbiasedness is about the centre of the sampling distribution, not the accuracy of one draw.
Common mistakes
- ✗Unbiased means the estimate from my sample equals the true value. It means the estimator is correct *on average* over repeated samples. Any single estimate generally differs from β₁ because of sampling error.
- ✗Unbiasedness requires a large sample. Unbiasedness holds for any sample size once SLR.1 through SLR.4 are satisfied. It is a finite-sample property, unlike consistency, which is asymptotic.
- ✗If the estimate is far from what theory predicts, OLS is biased. A surprising estimate may simply reflect sampling variability or a wrong prior. Bias is a property of the estimation procedure under the assumptions, not something read off one number.
- ✗Unbiasedness also guarantees a small variance. Unbiasedness concerns only the centre of the sampling distribution. An estimator can be unbiased yet highly variable, which is why the variance of OLS is studied separately.
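Two of the points above, that unbiasedness holds at any sample size and that it says nothing about variance, can be checked together. The following is a minimal sketch under assumed values (β₀ = 1, β₁ = 2, σ = 3, a fixed x grid on [1, 11], 4000 replications; `slope_draws` is a hypothetical helper). It repeats the simulation for several sample sizes and records the mean and standard deviation of the slope estimates.

```python
import numpy as np

def slope_draws(n, M=4000, beta0=1.0, beta1=2.0, sigma=3.0, seed=0):
    """M OLS slope estimates for sample size n (all defaults are assumed values)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(1.0, 11.0, n)    # fixed, evenly spaced design
    dx = x - x.mean()
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(M, n))
    # Because sum(dx) = 0, sum dx * (y - y_bar) equals sum dx * y.
    return (y @ dx) / np.sum(dx ** 2)

results = {}
for n in (5, 20, 200):
    b = slope_draws(n)
    results[n] = (b.mean(), b.std(ddof=1))
    print(f"n={n:>3}  mean={results[n][0]:.3f}  sd={results[n][1]:.3f}")
```

The mean stays close to the true slope even at n = 5, while the standard deviation changes by a large factor across the three sample sizes. The centre and the spread of the sampling distribution are separate questions.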
Revision bullets
- •Assumptions SLR.1 to SLR.4 give E(β̂₀) = β₀ and E(β̂₁) = β₁
- •Unbiased = correct on average across repeated samples, not in one sample
- •It is a finite-sample property, holding for any n
- •SLR.4 (E(u|x) = 0) is the assumption most likely to fail
- •Unbiasedness says nothing about the variance of the estimator
Quick check
Saying the OLS slope estimator is unbiased means that:
Which assumption is required for OLS to be unbiased and is also the one most likely to fail in applied work?
Sources
- Wooldridge (2019), Ch. 2.5. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. ISBN 978-1-337-55886-0. Section 2.5 states assumptions SLR.1 to SLR.4 and proves that OLS is unbiased, emphasizing that unbiasedness is a repeated-sampling property.
- Wooldridge (2019), §2.5 (Theorem 2.1). Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. Theorem 2.1 formally establishes E(β̂₀) = β₀ and E(β̂₁) = β₁ under the four assumptions.