
Probability and Statistics Review

OLS rests on a few ideas from probability. The expected value $E(X)$ is the population mean, the variance $\mathrm{Var}(X)$ measures spread, and the covariance $\mathrm{Cov}(X,Y)$ measures how two variables move together. We never see the population, so we estimate its parameters from a sample, and a different sample would give different estimates. This sampling variation is exactly what standard errors quantify, and the Central Limit Theorem explains why estimators such as the sample mean become approximately normal in large samples.

Why it matters

There is a true mean out in the world (the population) and an estimate we compute from our data (the sample). The two are not the same, and the typical gap between them shrinks as the sample grows. The Central Limit Theorem is the quiet hero of the course. It says that averages of many independent draws pile up into a bell curve even when the raw data do not, which is why large-sample t-tests and confidence intervals work at all.

Formulas

Variance and covariance
$$\mathrm{Var}(X) = E\left[(X - \mu_X)^2\right], \quad \mathrm{Cov}(X,Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$$
Variance is the expected squared distance of $X$ from its mean $\mu_X = E(X)$. Covariance is positive when $X$ and $Y$ tend to lie on the same side of their means, and negative when they tend to lie on opposite sides.
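The population formulas above have sample analogs that replace $E[\cdot]$ with an average over the data. The sketch below is a minimal Python illustration, not a prescribed method. The helper names `sample_variance` and `sample_covariance` and the five-person schooling and wage data are hypothetical. The sketch divides by $n-1$, which is the usual convention that makes the sample variance unbiased.

```python
from statistics import fmean


def sample_variance(xs):
    """Sample analog of Var(X) = E[(X - mu_X)^2], dividing by n - 1."""
    n = len(xs)
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def sample_covariance(xs, ys):
    """Sample analog of Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], dividing by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: years of schooling and hourly wage for five people
schooling = [10, 12, 12, 14, 16]
wage = [12.0, 15.0, 14.0, 18.0, 22.0]

print(sample_variance(schooling))
print(sample_covariance(schooling, wage))
```

Run as written, it prints values of approximately 5.2 and 8.8. The covariance is positive because, in this made-up data, people with more schooling tend to have above-average wages.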
Central Limit Theorem (sample mean)
$$\bar{X} \;\overset{a}{\sim}\; \mathrm{Normal}\left(\mu, \frac{\sigma^2}{n}\right)$$
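For a large random sample from a population with mean $\mu = E(X)$ and finite variance $\sigma^2 = \mathrm{Var}(X)$, the sample mean $\bar{X}$ is approximately normal regardless of the shape of the distribution of $X$. Its variance $\sigma^2/n$, and hence its standard deviation $\sigma/\sqrt{n}$, falls as $n$ grows.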
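The Central Limit Theorem can be checked by simulation. This sketch assumes a hypothetical population that follows an Exponential(1) distribution, which is strongly right-skewed with $\mu = 1$ and $\sigma^2 = 1$. The sample size, number of replications, and random seed are arbitrary choices. The sketch draws many samples of size $n = 50$ and compares the simulated sample means with the $\mathrm{Normal}(\mu, \sigma^2/n)$ approximation.

```python
import math
import random
from statistics import fmean, pvariance


def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an Exponential(1) population
    (mean 1, variance 1, strongly right-skewed) and return their means."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


# Assumed setup: Exponential(1) population, n = 50, arbitrary seed.
rng = random.Random(42)
mu, sigma2 = 1.0, 1.0          # population mean and variance of Exponential(1)
n, reps = 50, 10_000
means = simulate_sample_means(n, reps, rng)

se = math.sqrt(sigma2 / n)     # theoretical standard deviation of the sample mean
share_within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(f"average of sample means:     {fmean(means):.3f}  (theory: {mu})")
print(f"variance of sample means:    {pvariance(means):.4f}  (theory: {sigma2 / n})")
print(f"share within mu +/- 1.96 SE: {share_within:.3f}  (normal approximation: 0.95)")
```

The three printed figures should land near 1, 0.02, and 0.95, even though the individual draws are far from normal. Raising `n` tends to push the share within $\pm 1.96$ standard errors closer to 0.95, because the skew of the sample mean fades as $n$ grows.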

Worked examples

Scenario

You draw 50 workers and compute a mean wage. A classmate draws a different 50 and gets a different mean. Which value is "right"?

Solution

Neither is the true population mean; both are estimates that vary by sample. The standard error reports the typical size of that variation. By the Central Limit Theorem, across many such samples the means would cluster in an approximately normal bell around the true mean, which is what makes interval estimates meaningful.
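The worked example can be mimicked with a short simulation. Everything in it is hypothetical: the wage population is drawn from a log-normal distribution with made-up parameters, and the seed is arbitrary. Each of two random samples of 50 workers gets three quantities:

- its own mean
- an estimated standard error $s/\sqrt{n}$, where $s$ is the sample standard deviation
- a 95% interval based on the normal approximation that the Central Limit Theorem justifies

The helper `mean_and_se` is an illustrative name.

```python
import math
import random
from statistics import fmean, stdev


def mean_and_se(sample):
    """Return the sample mean and its estimated standard error s / sqrt(n)."""
    return fmean(sample), stdev(sample) / math.sqrt(len(sample))


# Hypothetical population: 100,000 log-normally distributed hourly wages
rng = random.Random(7)
population = [rng.lognormvariate(3.0, 0.5) for _ in range(100_000)]
true_mean = fmean(population)

# Two independent random samples of 50 workers each
you = rng.sample(population, 50)
classmate = rng.sample(population, 50)

for label, sample in [("you", you), ("classmate", classmate)]:
    m, se = mean_and_se(sample)
    print(f"{label:>9}: mean = {m:6.2f}, SE = {se:4.2f}, "
          f"95% CI = [{m - 1.96 * se:6.2f}, {m + 1.96 * se:6.2f}]")
print(f"true population mean: {true_mean:.2f}")
```

The two means will differ from each other and from the population mean. Each interval is constructed so that, across repeated samples, it covers the true mean roughly 95% of the time.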

Common mistakes

  • The sample mean equals the population mean. The sample mean estimates the population mean but almost always differs from it; the difference is sampling error.
  • Zero covariance means two variables are unrelated. Zero covariance rules out a linear association but not a nonlinear one, since variables can be dependent yet have zero covariance.
  • The Central Limit Theorem requires the data themselves to be normal. The theorem delivers approximate normality of the average even when the underlying variable is far from normal, provided the sample is large.
  • A larger sample makes each observation more precise. A larger sample leaves each individual data point exactly as precise as before. What it sharpens are the estimates of population parameters, by shrinking the variance of the estimator.
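The zero-covariance mistake is easy to see with a small discrete example. This sketch uses a hypothetical distribution chosen for illustration. $X$ is equally likely to be $-2, -1, 0, 1, 2$, and $Y = X^2$, so $Y$ is completely determined by $X$. The sketch uses exact fractions to avoid rounding, so the covariance comes out as exactly zero. The helper `expect` is an illustrative name.

```python
from fractions import Fraction

# Hypothetical discrete distribution: X is equally likely to be -2, -1, 0, 1, 2,
# and Y = X**2 is completely determined by X (strong nonlinear dependence).
support = [-2, -1, 0, 1, 2]
p = Fraction(1, len(support))  # each value has probability 1/5


def expect(f):
    """E[f(X)] for X uniform on `support`, computed exactly."""
    return sum(p * f(x) for x in support)


mu_x = expect(lambda x: x)                                 # E(X)
mu_y = expect(lambda x: x ** 2)                            # E(Y) = E(X^2)
cov_xy = expect(lambda x: (x - mu_x) * (x ** 2 - mu_y))    # Cov(X, Y)
print(mu_x, mu_y, cov_xy)  # prints: 0 2 0

# Yet X and Y are dependent: P(Y = 4) = 2/5, while P(Y = 4 | X = 2) = 1.
p_y4 = expect(lambda x: 1 if x ** 2 == 4 else 0)
print(p_y4)  # prints: 2/5
```

Here $\mathrm{Cov}(X,Y) = 0$ even though knowing $X$ tells you $Y$ exactly. The symmetric, U-shaped pattern of $Y$ means its positive and negative co-movements with $X$ cancel out.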

Revision bullets

  • Expected value is the population mean; variance measures spread
  • Covariance measures linear co-movement of two variables
  • Sample statistics estimate population parameters with error
  • Sampling variation is what standard errors quantify
  • CLT: sample averages are approximately normal in large samples

Quick check

Two different random samples from the same population usually give different means because of ______.

The Central Limit Theorem implies that, in large samples, the distribution of the sample mean is approximately ______.


How to cite this page
Dr. Phil's Quant Lab. (2026). Probability and Statistics Review. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-probability-review