Deriving the OLS Estimates
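Ordinary least squares (OLS) chooses $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals $\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$. Setting the two partial derivatives to zero gives the first-order conditions, also called the OLS normal equations, whose solution is the slope $\hat\beta_1 = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) \big/ \sum_{i=1}^{n} (x_i - \bar x)^2$ and the intercept $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The slope is the sample covariance of $x$ and $y$ divided by the sample variance of $x$, so OLS requires $x$ to vary in the sample.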
Try it yourself
OLS picks the line that minimizes the sum of squared residuals, SSR = Σ(yᵢ − ŷᵢ)². Residuals are the vertical gaps from each point to the line. Pick any line of your own and try to beat the OLS line on SSR; no other line achieves a smaller value.
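The same experiment can be sketched in a few lines of Python. This is a minimal illustration, not part of the original page: the data are made up, and the helper names `ssr` and `ols_line` are hypothetical. It computes SSR for a handful of hand-picked lines and for the closed-form OLS line.

```python
def ssr(x, y, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def ols_line(x, y):
    """Return (intercept, slope) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative data (an assumption, not from any real study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_ols, b1_ols = ols_line(x, y)
ssr_ols = ssr(x, y, b0_ols, b1_ols)

# Hand-picked guesses standing in for a line you might draw yourself.
guesses = [(0.0, 2.0), (0.5, 1.9), (-0.2, 2.1), (1.0, 1.5)]
for b0, b1 in guesses:
    print(f"guess b0={b0:+.2f}, b1={b1:.2f}: SSR={ssr(x, y, b0, b1):.4f}")
print(f"OLS   b0={b0_ols:+.2f}, b1={b1_ols:.2f}: SSR={ssr_ols:.4f}")
```

Every guess should print an SSR at least as large as the OLS line's. Changing the entries of `guesses` is the code equivalent of redrawing the line.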
Why it matters
OLS draws the line that makes the vertical misses as small as possible, after squaring them so that positive and negative gaps both count and large gaps are penalized heavily. The slope formula is the covariance of $x$ and $y$ scaled by how spread out $x$ is. If $x$ barely moves, the denominator collapses and the slope is unstable or undefined, which is why some variation in $x$ is essential. The intercept then simply pins the line so it passes through the average point $(\bar x, \bar y)$.
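To see what "unstable" means in practice, the sketch below nudges a single $y$ value by 0.5 and compares how far the slope moves when $x$ is well spread out versus tightly clustered. The data and the helper name `ols_slope` are assumptions made for illustration only.

```python
def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        # No variation in x: the denominator is zero, so no slope exists.
        raise ValueError("x does not vary in the sample; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up data: same y values, two different spreads of x.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
y_nudged = y[:-1] + [y[-1] + 0.5]          # perturb only the last observation

x_wide = [1.0, 2.0, 3.0, 4.0, 5.0]         # x varies a lot
x_narrow = [3.00, 3.01, 3.02, 3.03, 3.04]  # x barely varies

shift_wide = ols_slope(x_wide, y_nudged) - ols_slope(x_wide, y)
shift_narrow = ols_slope(x_narrow, y_nudged) - ols_slope(x_narrow, y)

print(f"slope change with wide x:   {shift_wide:.3f}")
print(f"slope change with narrow x: {shift_narrow:.3f}")
```

The same small change in one outcome moves the slope far more when the $x$ values are bunched together, because the sum of squared deviations in the denominator is close to zero.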
Formulas
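The derivation described above can be written out as follows. This is a reconstruction that assumes the usual textbook notation ($\hat\beta_0$, $\hat\beta_1$ for the estimates, $\hat u_i$ for the residuals, $\bar x$ and $\bar y$ for the sample means); the symbols are not taken verbatim from the page.

```latex
\begin{align*}
\text{Objective:}\quad
  & \min_{\hat\beta_0,\,\hat\beta_1} \sum_{i=1}^{n} \hat u_i^2
    = \sum_{i=1}^{n} \bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr)^2 \\[4pt]
\text{First-order conditions:}\quad
  & \sum_{i=1}^{n} \bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0, \qquad
    \sum_{i=1}^{n} x_i \bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0 \\[4pt]
\text{Intercept (from the first condition):}\quad
  & \hat\beta_0 = \bar y - \hat\beta_1 \bar x \\[4pt]
\text{Substitute into the second condition:}\quad
  & \sum_{i=1}^{n} x_i (y_i - \bar y)
    = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar x) \\[4pt]
\text{Slope:}\quad
  & \hat\beta_1
    = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}
           {\sum_{i=1}^{n} (x_i - \bar x)^2},
    \qquad \text{provided } \sum_{i=1}^{n} (x_i - \bar x)^2 > 0
\end{align*}
```

The last step uses the identities $\sum x_i (y_i - \bar y) = \sum (x_i - \bar x)(y_i - \bar y)$ and $\sum x_i (x_i - \bar x) = \sum (x_i - \bar x)^2$, which hold because deviations from a sample mean sum to zero.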
Worked examples
A researcher regresses annual CEO salary on return on equity with `regress salary roe` and wants to know where the slope number comes from.
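The slope Stata reports, $\hat\beta_1$, equals the sample covariance between salary and roe divided by the sample variance of roe, and the intercept is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, where $\bar y$ is mean salary and $\bar x$ is mean roe. The reported coefficients are exactly the values that minimize the sum of squared residuals across all firms in the sample. No other intercept-slope pair produces a smaller $\sum_{i=1}^{n} \hat u_i^2$.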
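The same arithmetic can be mirrored by hand in Python. The sketch below is only an illustration: the `roe` and `salary` numbers are hypothetical and are not the data behind any real `regress salary roe` output. It also checks that the $n - 1$ divisors in the sample covariance and variance cancel, so the ratio of raw sums gives the same slope.

```python
from statistics import mean

# Hypothetical numbers for illustration only (an assumption, not real data).
roe = [10.0, 12.5, 15.0, 18.0, 22.0, 25.5]                  # return on equity, %
salary = [900.0, 1050.0, 980.0, 1200.0, 1350.0, 1300.0]     # salary, $1000s


def sample_cov(a, b):
    """Sample covariance with the usual n - 1 divisor."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def sample_var(a):
    """Sample variance, i.e. the covariance of a variable with itself."""
    return sample_cov(a, a)


# Slope: sample covariance over sample variance of the regressor.
b1 = sample_cov(roe, salary) / sample_var(roe)
# Intercept: forces the line through the point of means.
b0 = mean(salary) - b1 * mean(roe)

# Same slope from raw sums of deviations, since the n - 1 terms cancel.
roe_bar, sal_bar = mean(roe), mean(salary)
b1_sums = (
    sum((r - roe_bar) * (s - sal_bar) for r, s in zip(roe, salary))
    / sum((r - roe_bar) ** 2 for r in roe)
)

fitted_at_mean = b0 + b1 * roe_bar  # should equal mean salary

print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"slope from raw sums = {b1_sums:.4f}")
print(f"fitted value at mean roe = {fitted_at_mean:.4f}, mean salary = {sal_bar:.4f}")
```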
Common mistakes
- ✗ OLS minimizes the sum of the residuals. It minimizes the sum of the *squared* residuals. The plain sum of residuals is zero for every line that passes through $(\bar x, \bar y)$, whatever its slope, so a criterion based on it would not identify a unique line.
- ✗ OLS minimizes perpendicular distances to the line. OLS minimizes *vertical* distances, the gaps in the $y$ direction. Minimizing perpendicular distance is a different method (total least squares) and gives different estimates.
- ✗ The slope formula works even when $x$ is constant. If $x$ takes the same value for everyone, $\sum_{i=1}^{n} (x_i - \bar x)^2 = 0$ and the slope is undefined. Variation in $x$ is a requirement for OLS, not an optional nicety.
- ✗ The first-order conditions are assumptions about the population. They are algebraic consequences of minimizing the squared residuals in the sample. They hold by construction for any OLS fit, regardless of whether the model assumptions are true.
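Two of the points above can be checked numerically. The sketch below, using made-up data and hypothetical variable names, shows that the residuals sum to zero for any slope once the line passes through the point of means, while the second first-order condition (residuals uncorrelated with $x$ in the sample) holds only at the OLS slope.

```python
# Made-up illustrative data (an assumption, not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.5, 2.0, 5.0, 4.5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n


def residuals(b0, b1):
    """Vertical gaps y_i - (b0 + b1 * x_i)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Arbitrary slopes, each paired with the intercept that puts the line
# through (x_bar, y_bar). The plain residual sum is zero every time.
residual_sums = {}
cross_sums = {}
for slope in [-1.0, 0.0, 0.5, 3.0]:
    intercept = y_bar - slope * x_bar
    u = residuals(intercept, slope)
    residual_sums[slope] = sum(u)
    cross_sums[slope] = sum(xi * ui for xi, ui in zip(x, u))

# OLS estimates from the closed-form formulas.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
u_hat = residuals(b0, b1)

foc_intercept = sum(u_hat)                               # sum of residuals
foc_slope = sum(xi * ui for xi, ui in zip(x, u_hat))     # sum of x_i * residual

for slope in residual_sums:
    print(f"slope {slope:+.1f}: sum(u) = {residual_sums[slope]:+.6f}, "
          f"sum(x*u) = {cross_sums[slope]:+.6f}")
print(f"OLS slope {b1:.2f}: sum(u) = {foc_intercept:+.6f}, sum(x*u) = {foc_slope:+.6f}")
```

Only the OLS row should show both sums at zero (up to floating-point rounding), which is why the squared criterion, not the plain sum, pins down a unique line.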
Revision bullets
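- OLS minimizes $\sum_{i=1}^{n} \hat u_i^2$, the sum of squared residuals
- Slope $\hat\beta_1$ = sample cov of $x$ and $y$ over sample var of $x$
- Intercept $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, line passes through $(\bar x, \bar y)$
- First-order conditions: $\sum_{i=1}^{n} \hat u_i = 0$ and $\sum_{i=1}^{n} x_i \hat u_i = 0$
- Requires sample variation in $x$ ($\sum_{i=1}^{n} (x_i - \bar x)^2 > 0$)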
Quick check
What objective function does ordinary least squares minimize?
The OLS slope can be written as which of the following?
Sources
- Wooldridge (2019), Ch. 2.2. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. ISBN 978-1-337-55886-0. Section 2.2 derives the OLS estimates from the minimization problem and the first-order conditions, and presents the slope and intercept formulas.
- Wooldridge (2019), Appendix A (algebra). Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. Reviews summation algebra and the sample covariance and variance used to derive the OLS slope.