Simple Regression · intermediate

Deriving the OLS Estimates

Ordinary least squares (OLS) chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the sum of squared residuals $\sum_{i=1}^{n} \hat{u}_i^2 = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$. Setting the two partial derivatives to zero gives the first-order conditions, also called the OLS normal equations, whose solution is the slope $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and the intercept $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The slope is the sample covariance of $x$ and $y$ divided by the sample variance of $x$, so OLS requires $x$ to vary in the sample.
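
The step from the minimization problem to the two formulas can be sketched as follows. The symbols $Q$, $b_0$ and $b_1$ (the objective and a candidate intercept and slope) are notation introduced here for illustration only.

```latex
\begin{aligned}
Q(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\[4pt]
\left.\frac{\partial Q}{\partial b_0}\right|_{(\hat{\beta}_0,\hat{\beta}_1)}
  &= -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
  \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\[4pt]
\left.\frac{\partial Q}{\partial b_1}\right|_{(\hat{\beta}_0,\hat{\beta}_1)}
  &= -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\[4pt]
\text{substituting } \hat{\beta}_0:\quad
\sum_{i=1}^{n} x_i (y_i - \bar{y}) &= \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) \\[4pt]
\text{using } \sum x_i (y_i - \bar{y}) = \sum (x_i - \bar{x})(y_i - \bar{y})
  &\text{ and } \sum x_i (x_i - \bar{x}) = \sum (x_i - \bar{x})^2: \\[4pt]
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

The two summation identities hold because $\sum (x_i - \bar{x}) = 0$ and $\sum (y_i - \bar{y}) = 0$, so subtracting $\bar{x}$ inside either sum changes nothing.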

Try it yourself

Least squares — minimising SSR

OLS picks the line that minimises the sum of squared residuals, SSR = Σ(yᵢ − ŷᵢ)². Residuals are the vertical gaps from each point to the line. In the interactive version of this page, you drag a blue line and try to beat the gold OLS line on SSR.

[Interactive figure: scatter of y against x with the OLS best-fit line and a user-adjustable line. OLS line: ŷ = 2.1 + 1.55x; SSR (OLS, minimum) = 26; R² (OLS) = 93%. When the adjustable line has intercept b₀ = 2.1 and slope b₁ = 1.55 it coincides with the OLS line, so both SSRs equal the minimum of 26, and changing either value can only raise the SSR.]
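
The idea that no other line beats OLS on SSR can also be checked numerically. The sketch below is a rough illustration, not the page's own exercise: the data are made up, and the helper `ssr` and the list of nudges are hypothetical names chosen for this example.

```python
import numpy as np

# Made-up illustrative data (NOT the data behind the page's plot).
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
y = np.array([4.0, 6.5, 10.0, 12.0, 17.5, 18.0])


def ssr(b0, b1, x, y):
    """Sum of squared vertical gaps between the points and the line b0 + b1*x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))


# OLS estimates from the closed-form slope and intercept formulas.
b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_ols = y.mean() - b1_ols * x.mean()
ssr_ols = ssr(b0_ols, b1_ols, x, y)
print(f"OLS line: intercept={b0_ols:.3f}, slope={b1_ols:.3f}, SSR={ssr_ols:.3f}")

# Nudge the intercept and/or slope away from OLS and recompute the SSR.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1), (0.3, -0.05)]
nudged_ssrs = [ssr(b0_ols + d0, b1_ols + d1, x, y) for d0, d1 in nudges]
for (d0, d1), value in zip(nudges, nudged_ssrs):
    print(f"nudge intercept {d0:+.2f}, slope {d1:+.2f}: SSR={value:.3f}")
```

Every nudged line should report a larger SSR than the OLS line, which is the numerical counterpart of dragging the line away from the minimum.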

Why it matters

OLS draws the line that makes the vertical misses as small as possible, after squaring them so that positive and negative gaps both count and large gaps are penalized heavily. The slope formula is the covariance of $x$ and $y$ scaled by how spread out $x$ is. If $x$ barely moves, the denominator collapses and the slope is unstable or undefined, which is why some variation in $x$ is essential. The intercept then simply pins the line so it passes through the average point $(\bar{x}, \bar{y})$.

Formulas

OLS slope estimator
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Sample covariance of $x$ and $y$ over the sample variance of $x$. Defined only when $\sum (x_i - \bar{x})^2 > 0$, that is, when $x$ varies.
OLS intercept estimator
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
Forces the fitted line through the sample means, so $(\bar{x}, \bar{y})$ always lies on the regression line.
First-order conditions
$$\sum \hat{u}_i = 0, \qquad \sum x_i \hat{u}_i = 0$$
These two normal equations are the sample analogues of $E(u) = 0$ and $E(xu) = 0$, the method-of-moments conditions OLS solves.
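
A minimal sketch of the three formulas in code, assuming a small made-up sample. The variable names (`sxx`, `u_hat`, and so on) are illustrative choices, not notation from the source.

```python
import numpy as np

# Made-up sample, for illustration only.
x = np.array([2.0, 4.0, 5.0, 7.0, 10.0])
y = np.array([3.0, 7.0, 8.0, 12.0, 15.0])

x_bar, y_bar = x.mean(), y.mean()

# Denominator of the slope formula: must be positive, i.e. x must vary.
sxx = np.sum((x - x_bar) ** 2)
if sxx == 0:
    raise ValueError("x has no sample variation; the OLS slope is undefined")

beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx  # slope estimator
beta0_hat = y_bar - beta1_hat * x_bar                # intercept estimator
u_hat = y - beta0_hat - beta1_hat * x                # OLS residuals

# First-order conditions: both sums are zero up to floating-point rounding.
foc_intercept = float(np.sum(u_hat))
foc_slope = float(np.sum(x * u_hat))
print("sum of residuals:    ", foc_intercept)
print("sum of x * residuals:", foc_slope)

# The fitted line passes through the point of sample means.
passes_through_means = bool(np.isclose(beta0_hat + beta1_hat * x_bar, y_bar))
print("line passes through (x_bar, y_bar):", passes_through_means)
```

Both sums print as zero or as a value on the order of machine precision, because the conditions hold by construction for any OLS fit.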

Worked examples

Scenario

A researcher regresses annual CEO salary on return on equity with `regress salary roe` and wants to know where the slope number comes from.

Solution

Stata computes $\hat{\beta}_1$ as the sample covariance between salary and roe divided by the sample variance of roe, then sets $\hat{\beta}_0 = \overline{\text{salary}} - \hat{\beta}_1\,\overline{\text{roe}}$. The reported coefficients are exactly the values that minimize the sum of squared residuals across all firms in the sample. No other intercept-slope pair produces a smaller $\sum \hat{u}_i^2$.

Note: If `roe` had no variation across firms, the denominator of the slope formula would be zero, so the slope could not be computed and Stata would drop the regressor.
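
The covariance-over-variance reading of the slope can be reproduced outside Stata. The sketch below uses Python with NumPy and a handful of hypothetical salary and roe values (they are not the researcher's data, and the units are assumed for illustration). It cross-checks the result against `np.polyfit`, which fits a degree-1 polynomial by least squares.

```python
import numpy as np

# Hypothetical values, NOT real firm data.
# Assumed units: salary in thousands of dollars, roe in percent.
roe = np.array([10.0, 14.5, 8.2, 20.1, 16.3, 12.7])
salary = np.array([950.0, 1200.0, 880.0, 1500.0, 1100.0, 1010.0])

# Sample covariance and sample variance, both with the n - 1 divisor.
cov_xy = np.cov(roe, salary, ddof=1)[0, 1]
var_x = np.var(roe, ddof=1)

slope = cov_xy / var_x
intercept = salary.mean() - slope * roe.mean()

# Cross-check: np.polyfit returns coefficients from highest degree down,
# so a degree-1 fit gives (slope, intercept).
slope_np, intercept_np = np.polyfit(roe, salary, deg=1)

print(f"cov/var slope: {slope:.4f}   polyfit slope: {slope_np:.4f}")
print(f"intercept:     {intercept:.4f}   polyfit intercept: {intercept_np:.4f}")
```

The $n - 1$ divisors in the covariance and the variance cancel in the ratio, which is why the slope matches the formula written with plain sums.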

Common mistakes

  • "OLS minimizes the sum of the residuals." It minimizes the sum of the *squared* residuals. The plain sum of residuals is zero for every line through $(\bar{x},\bar{y})$, whatever its slope, so it cannot identify a unique line.
  • "OLS minimizes perpendicular distances to the line." OLS minimizes *vertical* distances, the gaps in the $y$ direction. Minimizing perpendicular distance is a different method (total least squares) and gives different estimates.
  • "The slope formula works even when $x$ is constant." If $x$ takes the same value for everyone, $\sum (x_i - \bar{x})^2 = 0$ and the slope is undefined. Variation in $x$ is a requirement for OLS, not an optional nicety.
  • "The first-order conditions are assumptions about the population." They are algebraic consequences of minimizing the squared residuals in the sample. They hold by construction for any OLS fit, regardless of whether the model assumptions are true.
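
Two of the points above lend themselves to a quick numerical check. This is a sketch on made-up data, and the helper name `residual_sum_through_means` is hypothetical. It shows that the plain residual sum is zero for lines of very different slopes through the means, and that a constant regressor makes the slope denominator exactly zero.

```python
import numpy as np

# Made-up data, for illustration only.
x = np.array([1.0, 2.0, 4.0, 6.0, 7.0])
y = np.array([2.0, 3.5, 5.0, 9.0, 10.5])


def residual_sum_through_means(slope):
    """Plain (unsquared) residual sum for the line with this slope through (x_bar, y_bar)."""
    intercept = y.mean() - slope * x.mean()
    return float(np.sum(y - intercept - slope * x))


# Very different slopes, yet each residual sum is zero up to rounding,
# so the plain sum cannot tell these lines apart.
trial_slopes = (-3.0, 0.0, 1.4, 10.0)
residual_sums = [residual_sum_through_means(s) for s in trial_slopes]
for s, total in zip(trial_slopes, residual_sums):
    print(f"slope {s:+.1f}: sum of residuals = {total:.2e}")

# A regressor that takes the same value for every observation.
x_const = np.full(5, 3.0)
denominator = float(np.sum((x_const - x_const.mean()) ** 2))
print("slope denominator with constant x:", denominator)  # 0.0
```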

Revision bullets

  • OLS minimizes $\sum \hat{u}_i^2$, the sum of squared residuals
  • Slope $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ = sample cov over sample var of $x$
  • Intercept $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, line passes through $(\bar{x}, \bar{y})$
  • First-order conditions: $\sum \hat{u}_i = 0$ and $\sum x_i \hat{u}_i = 0$
  • Requires sample variation in $x$ ($\sum (x_i - \bar{x})^2 > 0$)

Quick check

What objective function does ordinary least squares minimize?

The OLS slope $\hat{\beta}_1$ can be written as which of the following?

Sources

  1. Wooldridge (2019), Ch. 2.2
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. ISBN 978-1-337-55886-0.
    Section 2.2 derives the OLS estimates from the minimization problem and the first-order conditions, and presents the slope and intercept formulas.
  2. Wooldridge (2019), Appendix A (algebra)
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019.
    Reviews summation algebra and the sample covariance and variance used to derive the OLS slope.
How to cite this page
Dr. Phil's Quant Lab. (2026). Deriving the OLS Estimates. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-ols-derivation