Foundations & Data · beginner

Probability and Statistics Review

OLS rests on a few ideas from probability. The expected value $E(X)$ is the population mean, the variance $\mathrm{Var}(X)$ measures spread, and the covariance $\mathrm{Cov}(X,Y)$ measures how two variables move together. We never see the population, so we estimate its parameters from a sample, and a different sample would give different estimates. This sampling variation is exactly what standard errors quantify, and the Central Limit Theorem explains why estimators become approximately normal in large samples.

Why it matters

There is a true mean out in the world (the population) and an estimate we compute from our data (the sample). The two are not the same, and the gap tends to shrink as the sample grows. The Central Limit Theorem is the quiet hero of the course. It says averages pile up into a bell curve even when the raw data do not, which is why t-tests and confidence intervals work in large samples even when the data are not normal.

Formulas

Variance and covariance

\mathrm{Var}(X) = E\!\left[(X - \mu_X)^2\right], \quad \mathrm{Cov}(X,Y) = E\!\left[(X-\mu_X)(Y-\mu_Y)\right]

Variance is the average squared distance from the mean $\mu_X$. Covariance is positive when $X$ and $Y$ tend to lie on the same side of their means.
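To see the formulas in action, here is a minimal Python sketch that replaces each expectation with a sample average. The page itself has no code, so the language and the helper names `mean`, `sample_covariance`, and `sample_variance` are our own. It assumes the usual $n-1$ divisor for sample statistics, and the five data pairs are made-up numbers chosen for easy arithmetic.

```python
def mean(values):
    """Sample mean: the sample analogue of E(X)."""
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sample analogue of Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)].

    Population means are replaced by sample means, and the sum of
    cross-products is divided by n - 1 (the usual unbiased convention).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def sample_variance(x):
    """Var(X) is Cov(X, X), so reuse the covariance formula with y = x."""
    return sample_covariance(x, x)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_variance(x))       # 2.5
print(sample_covariance(x, y))  # 1.5: x and y tend to sit on the same side of their means
```

Writing variance as the covariance of $X$ with itself mirrors the formulas above: set $Y = X$ in the covariance formula and you recover the variance formula.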

Central Limit Theorem (sample mean)

\bar{X} \;\overset{a}{\sim}\; \mathrm{Normal}\!\left(\mu, \tfrac{\sigma^2}{n}\right)

For a large random sample, the sample mean $\bar{X}$ is approximately normal regardless of the distribution of $X$, provided $X$ has a finite variance $\sigma^2$. Its variance $\sigma^2/n$ falls as $n$ grows.
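The $\sigma^2/n$ term takes one line to derive, assuming the draws $X_1, \dots, X_n$ are independent with common mean $\mu$ and variance $\sigma^2$ (a random sample). Independence sets every cross-covariance $\mathrm{Cov}(X_i, X_j)$ with $i \neq j$ to zero, so only the $n$ variance terms survive.

```latex
\mathrm{Var}(\bar{X})
  = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i)
  = \frac{1}{n^2}\, n\sigma^2
  = \frac{\sigma^2}{n}
```

The $1/n^2$ comes from pulling the constant $1/n$ out of a variance, which squares it.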

Worked examples

Scenario

You draw 50 workers and compute a mean wage. A classmate draws a different 50 and gets a different mean. Which value is "right"?

Solution

Neither is the true population mean; both are estimates that vary by sample. The standard error reports the typical size of that variation. By the Central Limit Theorem, across many such samples the means would cluster in an approximately normal bell around the true mean, which is what makes interval estimates meaningful.
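A small simulation can make the scenario concrete. The Python sketch below assumes a hypothetical, deliberately skewed wage population (exponential with mean 1 and standard deviation 1, in illustrative units, not real wage data); `draw_sample`, `standard_error`, and `simulate_means` are illustrative helpers. It draws your sample and a classmate's, reports each mean with the usual estimated standard error $s/\sqrt{n}$, then repeats the draw 2,000 times to check that the means center on the true mean, spread by about $\sigma/\sqrt{n}$, and land inside $\mu \pm 1.96\,\sigma/\sqrt{n}$ roughly 95% of the time, as the Central Limit Theorem predicts.

```python
import math
import random
import statistics

# Hypothetical, deliberately skewed wage population: exponential with
# mean 1 and standard deviation 1 (illustrative units, not real data).
MU, SIGMA = 1.0, 1.0
N = 50  # workers per sample, as in the scenario


def draw_sample(rng, n=N):
    """One random sample of n wages from the hypothetical population."""
    return [rng.expovariate(1.0 / MU) for _ in range(n)]


def standard_error(sample):
    """Estimated standard error of the sample mean, s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def simulate_means(n=N, reps=2000, seed=0):
    """Sample means from `reps` independent samples of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw_sample(rng, n)) for _ in range(reps)]


# You and a classmate each draw 50 workers: two different estimates.
rng = random.Random(1)
you, classmate = draw_sample(rng), draw_sample(rng)
for who, sample in (("you", you), ("classmate", classmate)):
    print(f"{who}: mean {statistics.fmean(sample):.3f}, se {standard_error(sample):.3f}")

# Repeat the draw many times and look at how the means are distributed.
means = simulate_means()
half_width = 1.96 * SIGMA / math.sqrt(N)
share = sum(abs(m - MU) <= half_width for m in means) / len(means)
print(f"average of means {statistics.fmean(means):.3f} (true mean {MU})")
print(f"sd of means {statistics.stdev(means):.3f} vs sigma/sqrt(n) {SIGMA / math.sqrt(N):.3f}")
print(f"share within 1.96 sd of the true mean {share:.3f} (a normal curve gives about 0.95)")
```

Fixing the seeds makes the run reproducible; changing them changes the individual means but not the overall pattern.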

Common mistakes

✗The sample mean equals the population mean. The sample mean estimates the population mean but almost always differs from it; the difference is sampling error.
✗Zero covariance means two variables are unrelated. Zero covariance rules out a linear association but not a nonlinear one, since variables can be dependent yet have zero covariance.
✗The Central Limit Theorem requires the data themselves to be normal. The theorem delivers approximate normality of the average even when the underlying variable is far from normal, provided the sample is large.
✗A larger sample makes each observation more precise. A larger sample leaves the individual data points unchanged; what it sharpens is the estimate of population parameters, by shrinking the variance of the estimator.
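The zero-covariance trap is easy to reproduce. This Python sketch (the `covariance` helper is hypothetical, written for illustration) treats five points as an equally likely population, so it divides by $n$ to match the expectation in the covariance formula. Here $Y = X^2$ is completely determined by $X$, yet the covariance is exactly zero because $X$ is symmetric around zero.

```python
def covariance(x, y):
    """Covariance of an equally likely population: average of (x - mean_x)(y - mean_y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n


# X is symmetric around zero and Y is an exact function of X, so they are dependent.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # Y = X^2: a perfect nonlinear relationship

print(covariance(x, y))  # 0.0: no linear association despite full dependence
```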

Revision bullets

•Expected value is the population mean; variance measures spread
•Covariance measures linear co-movement of two variables
•Sample statistics estimate population parameters with error
•Sampling variation is what standard errors quantify
•CLT: sample averages are approximately normal in large samples

Quick check

Two different random samples from the same population usually give different means because of

The Central Limit Theorem implies that, in large samples, the distribution of the sample mean is approximately

Connected topics

Econometrics Population Model Cause vs Corr Research Process

Sources

Wooldridge (2019), App. B-C
Wooldridge, J. M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. ISBN 978-1-337-55886-0.
Appendices B and C review probability, expectation, and the Central Limit Theorem assumed throughout Chapters 2 to 5.

How to cite this page

Dr. Phil's Quant Lab. (2026). Probability and Statistics Review. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-probability-review
