Sampling Distribution and the Classical Linear Model — interactive tool

Sampling distribution of the OLS slope

Unbiased means the estimator centres on the truth across MANY samples, not that any one sample gets it right.

Fix a known process y = β₀ + β₁·x + u with u ~ Normal(0, σ²). We draw M fresh samples of size n and, for each one, compute the OLS slope β̂₁. The blue histogram is the sampling distribution of those estimates; the gold line marks the true β₁. The mean of the estimates lands very close to that line, even though individual estimates scatter on both sides of it.
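A minimal sketch of this experiment in plain Python (standard library only), under stated assumptions: the helper names `ols_slope` and `simulate_slopes` are hypothetical, the intercept β₀ = 1.0 is an arbitrary choice (the tool does not state it, and it does not affect the slope), x is taken as an evenly spaced grid on [1, 11] as the setup below describes, and the seed is fixed only for reproducibility, so the numbers will be close to, but not identical with, the tool's readout.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope: Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(beta0, beta1, sigma, n, m, seed=0):
    """Draw m fresh samples of size n from y = beta0 + beta1*x + u,
    u ~ Normal(0, sigma²), and return the m OLS slope estimates."""
    rng = random.Random(seed)
    # Fixed design (assumed): n evenly spaced points on [1, 11], reused in every sample.
    x = [1 + 10 * i / (n - 1) for i in range(n)]
    slopes = []
    for _ in range(m):
        # Only the errors are redrawn; rng.gauss takes the SD, not the variance.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

slopes = simulate_slopes(beta0=1.0, beta1=2.0, sigma=3.0, n=30, m=400)
print(f"mean of slope estimates: {statistics.fmean(slopes):.3f}")
print(f"SD of slope estimates:   {statistics.stdev(slopes):.3f}")
```

The mean should sit near 2.00 and the SD near the theoretical 0.184; rerunning with other seeds moves individual estimates a lot but the mean only a little.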

Settings: true slope β₁ = 2.00, error SD σ = 3.00, sample size n = 30, repeated samples M = 400.

Readout: mean β̂₁ = 1.992 vs true β₁ = 2.00; SD of β̂₁ (empirical SE) = 0.184; theoretical SE = 0.184.

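Across 400 samples the estimates average 1.992, a gap of just -0.008 from the true 2.00 (0.04 sampling SEs). That is what unbiasedness looks like: E[β̂₁] = β₁, so the estimates centre on the truth across samples even though individual estimates scatter around it. The spread is the sampling SE: the empirical 0.184 tracks the theoretical σ/√(Σ(xᵢ−x̄)²) = 0.184. Because the errors are normal, β̂₁ is itself exactly normal for any n, so the histogram approaches a normal curve centred on β₁ as M grows; with non-normal errors, the CLT would still make it approximately normal for large n.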
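The arithmetic in this readout can be checked directly. This sketch (hypothetical helpers `grid` and `theoretical_se`, assuming the evenly spaced [1, 11] grid with n = 30) recomputes the theoretical SE and expresses the -0.008 gap in two units: SEs of a single estimate, and Monte Carlo SEs of the average of M = 400 estimates, which is SE/√M. The gap is under one Monte Carlo SE, i.e. about the size expected from simulation noise alone.

```python
import math

def grid(n, lo=1.0, hi=11.0):
    """Evenly spaced fixed design on [lo, hi] (assumed from the setup text)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def theoretical_se(sigma, x):
    """SE(β̂₁) = σ / √Sxx with Sxx = Σ(xᵢ − x̄)²."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma / math.sqrt(sxx)

se = theoretical_se(sigma=3.0, x=grid(30))
gap = 1.992 - 2.00                 # mean of estimates minus true slope (tool readout)
mc_se = se / math.sqrt(400)        # SE of the average of M = 400 independent estimates

print(f"theoretical SE:         {se:.3f}")           # 0.184
print(f"gap in sampling SEs:    {gap / se:.2f}")     # -0.04
print(f"gap in Monte Carlo SEs: {gap / mc_se:.2f}")  # -0.87
```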

Setup: x is a fixed, evenly spaced grid on [1, 11] (so Sxx = Σ(xᵢ−x̄)² is known), the errors are i.i.d. Normal(0, σ²), and the estimator is β̂₁ = Σ(xᵢ−x̄)(yᵢ−ȳ)/Σ(xᵢ−x̄)². Larger n packs more points into the same range, which increases Sxx and shrinks the SE; larger σ inflates the SE in direct proportion.
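To see how n and σ move the SE on this fixed design, the sketch below (hypothetical helpers `sxx_direct`, `sxx_closed_form`, `slope_se`) computes Sxx both directly and from a closed form for n evenly spaced points, then tabulates the theoretical SE. The closed form is our own algebra (spacing h = 10/(n−1) and Σ(i − mean)² = n(n²−1)/12 over i = 0, …, n−1), not something the tool reports.

```python
import math

def sxx_direct(n, lo=1.0, hi=11.0):
    """Sxx = Σ(xᵢ − x̄)² for n evenly spaced points on [lo, hi]."""
    x = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** 2 for xi in x)

def sxx_closed_form(n, width=10.0):
    """Same quantity in closed form: with spacing h = width/(n − 1),
    Sxx = h² · n(n² − 1)/12 = width² · n(n + 1) / (12(n − 1))."""
    return width ** 2 * n * (n + 1) / (12 * (n - 1))

def slope_se(sigma, n):
    """Theoretical SE of the OLS slope on the fixed [1, 11] grid."""
    return sigma / math.sqrt(sxx_direct(n))

# Sxx grows roughly like width² · n / 12, so the SE shrinks roughly like 1/√n;
# doubling sigma doubles the SE exactly.
for n in (10, 30, 120):
    for sigma in (1.5, 3.0):
        print(f"n={n:4d}  sigma={sigma:.1f}  Sxx={sxx_direct(n):8.2f}  SE={slope_se(sigma, n):.3f}")
```

Because the range stays at [1, 11], adding points raises Sxx only about linearly, so quadrupling n from 30 to 120 roughly halves the SE.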

Discuss

Set n small and watch the histogram spread out while its centre stays on the true β₁. If you only ever collected ONE sample, your single estimate could land far out in that spread. So how can an estimator be "unbiased" and yet be wrong in the one sample you actually have? What does unbiasedness promise you, and what does it not?

Dr Phil's Quant Lab