Simple Regression (intermediate)

Deriving the OLS Estimates

Ordinary least squares (OLS) chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the sum of squared residuals $\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$. Setting the partial derivatives with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ to zero gives two first-order conditions, also called the OLS normal equations. Their solution is the slope $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and the intercept $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The slope is the sample covariance of $x$ and $y$ divided by the sample variance of $x$, so OLS requires $x$ to vary in the sample.
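Written out, the step from the two partial derivatives to the closed-form estimates runs as follows; this is a sketch of the standard algebra. Dividing each condition by $-2$ gives the normal equations $\sum \hat{u}_i = 0$ and $\sum x_i \hat{u}_i = 0$. The last line uses $\sum x_i (y_i - \bar{y}) = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $\sum x_i (x_i - \bar{x}) = \sum (x_i - \bar{x})^2$, which hold because deviations from a sample mean sum to zero. Dividing by the denominator is only allowed when $x$ varies.

```latex
\begin{aligned}
\frac{\partial}{\partial \hat{\beta}_0}:\quad
  & -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
  \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\frac{\partial}{\partial \hat{\beta}_1}:\quad
  & -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\
\text{substituting } \hat{\beta}_0:\quad
  & \sum_{i=1}^{n} x_i \big[ (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}) \big] = 0 \\
\Longrightarrow\quad
  & \hat{\beta}_1
    = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
    = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```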

Try it yourself

Interactive figure: Least squares, minimizing SSR. OLS picks the line that minimizes the sum of squared residuals, SSR = Σ(yᵢ − ŷᵢ)², where residuals are the vertical gaps from each point to the line. In the demo, the OLS line is ŷ = 2.1 + 1.55x, with the minimum SSR of 26 and an R² of 93%. Moving the intercept or slope away from these values can only raise SSR.
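The comparison above can be reproduced numerically. This is a minimal sketch with assumptions. The six data points are made-up values, not the data behind the demo. `ols_fit` and `ssr` are illustrative helper names, not a library API. The code fits OLS with the closed-form formulas and checks that no nearby intercept-slope pair has a smaller SSR.

```python
# Minimal sketch: fit OLS by the closed-form formulas, then check that nudging
# the intercept or slope never lowers SSR. Data values are made up.

def ols_fit(x, y):
    """Return (b0, b1) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ssr(x, y, b0, b1):
    """Sum of squared vertical residuals for the line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 6.2, 6.9, 8.8, 10.1, 11.8]

b0, b1 = ols_fit(x, y)
best = ssr(x, y, b0, b1)

# Try a grid of nearby lines; none should beat the OLS line.
for d0 in (-0.5, -0.1, 0.0, 0.1, 0.5):
    for d1 in (-0.2, -0.05, 0.0, 0.05, 0.2):
        assert ssr(x, y, b0 + d0, b1 + d1) >= best - 1e-12

print(f"OLS line: y_hat = {b0:.3f} + {b1:.3f}x, SSR = {best:.3f}")
```

The grid check cannot fail. Because the first-order conditions hold at the OLS solution, shifting the line by $(d_0, d_1)$ raises SSR by exactly $\sum (d_0 + d_1 x_i)^2 \ge 0$.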

Why it matters

OLS draws the line that makes the squared vertical misses as small as possible in total. Squaring ensures that positive and negative gaps both count and that large gaps are penalized heavily. The slope formula is the covariance of $x$ and $y$ divided by the variance of $x$, that is, scaled by how spread out $x$ is. If $x$ barely varies, the denominator is close to zero and the slope becomes unstable; if $x$ does not vary at all, the slope is undefined. That is why some variation in $x$ is essential. The intercept then pins the line so it passes through the average point $(\bar{x}, \bar{y})$.
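The role of the denominator can be seen directly in a minimal sketch. `ols_slope` is an illustrative helper, and all data values are made up. The code does two things:

* It shows that constant $x$ makes the slope undefined.
* It applies the same 0.05 change in one $y$ value under two $x$ designs, one barely varying and one well spread, and compares how far the slope moves.

```python
# Minimal sketch: why OLS needs sample variation in x.
# `ols_slope` is an illustrative helper and the data are made up.

def ols_slope(x, y):
    """OLS slope; raises ValueError when x has no sample variation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant, so sum((x_i - x_bar)**2) == 0")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Constant x: the denominator is exactly zero and the slope is undefined.
try:
    ols_slope([2.0, 2.0, 2.0, 2.0], [1.0, 3.0, 2.0, 4.0])
except ValueError as err:
    print("constant x:", err)

# The same small change in one y value (third point moved up by 0.05) ...
y_a = [2.0, 2.1, 2.0, 2.1]
y_b = [2.0, 2.1, 2.05, 2.1]

# ... with barely varying x: the slope moves a lot.
x_tight = [1.0, 1.01, 1.0, 1.01]
tight_change = ols_slope(x_tight, y_b) - ols_slope(x_tight, y_a)

# ... with well-spread x: the slope barely moves.
x_wide = [1.0, 2.0, 1.0, 2.0]
wide_change = ols_slope(x_wide, y_b) - ols_slope(x_wide, y_a)

print(f"slope change, tight x: {tight_change:.3f}; wide x: {wide_change:.3f}")
```

With these numbers the slope changes by about −2.5 in the tight design and about −0.025 in the wide one. The factor of 100 matches the 100-fold smaller deviations of $x$ from its mean.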

Formulas

OLS slope estimator

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

Sample covariance of $x$ and $y$ over the sample variance of $x$. Defined only when $\sum (x_i - \bar{x})^2 > 0$, that is, when $x$ varies.

OLS intercept estimator

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

Forces the fitted line through the sample means, so $(\bar{x}, \bar{y})$ always lies on the regression line.

First-order conditions

\sum \hat{u}_i = 0, \qquad \sum x_i \hat{u}_i = 0

These two normal equations are the sample analogues of $E(u) = 0$ and $E(xu) = 0$, the method-of-moments conditions OLS solves.
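To see that the normal equations hold by construction for the OLS fit, but not for an arbitrary line, consider the sketch below. It uses made-up data, and `moment_sums` is an illustrative helper. The comparison line $2.0 + 1.5x$ is an arbitrary choice made only for contrast.

```python
# Sketch: the two first-order conditions hold by construction for the OLS fit,
# but not for an arbitrary line. Data values are made up.

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 6.2, 6.9, 8.8, 10.1, 11.8]

def moment_sums(b0, b1):
    """Return (sum of residuals, sum of x * residuals) for the line b0 + b1 x."""
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return sum(u), sum(xi * ui for xi, ui in zip(x, u))

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

ols_sums = moment_sums(b0, b1)       # both zero up to rounding
other_sums = moment_sums(2.0, 1.5)   # an arbitrary line: generally not zero
print(ols_sums, other_sums)
```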

Worked examples

Scenario

A researcher regresses annual CEO salary on return on equity with `regress salary roe` and wants to know where the slope number comes from.

Solution

The slope Stata reports, $\hat{\beta}_1$, equals the sample covariance between salary and roe divided by the sample variance of roe. The intercept equals $\hat{\beta}_0 = \overline{\text{salary}} - \hat{\beta}_1\,\overline{\text{roe}}$. The reported coefficients are exactly the values that minimize the sum of squared residuals across all firms in the sample: no other intercept-slope pair produces a smaller $\sum \hat{u}_i^2$.

Note: If `roe` had no variation across firms, $\sum (\text{roe}_i - \overline{\text{roe}})^2 = 0$ and the slope would not be identified. Stata would drop the regressor as collinear with the constant.

Common mistakes

✗ OLS minimizes the sum of the residuals. It minimizes the sum of the *squared* residuals. Every line through $(\bar{x}, \bar{y})$, whatever its slope, has residuals that sum to zero, so minimizing the plain sum would not identify a unique line.
✗ OLS minimizes perpendicular distances to the line. OLS minimizes *vertical* distances, the gaps in the $y$ direction. Minimizing perpendicular distances is a different method (total least squares, also called orthogonal regression) and gives different estimates.
✗ The slope formula works even when $x$ is constant. If $x$ takes the same value for every observation, $\sum (x_i - \bar{x})^2 = 0$ and the slope is undefined. Variation in $x$ is a requirement for OLS, not an optional nicety.
✗ The first-order conditions are assumptions about the population. They are algebraic consequences of minimizing the sum of squared residuals in the sample. They hold by construction for any OLS fit, whether or not the model assumptions are true.
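The first two misconceptions can be checked numerically with made-up data. The sketch below does two things:

* It shows that lines through the means with very different slopes all have residuals summing to zero.
* It compares the OLS slope with the orthogonal-regression (total least squares) slope. That slope comes from the standard two-variable closed form, which assumes equal error variance in $x$ and $y$ and requires a nonzero cross-product sum.

```python
# Sketch with made-up data: two common misreadings of what OLS minimizes.
import math
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 6.2, 6.9, 8.8, 10.1, 11.8]
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)

# Mistake 1: every line through (x_bar, y_bar) has residuals summing to zero,
# whatever its slope, so "minimize the sum of residuals" cannot pick one line.
for slope in (-3.0, 0.0, 1.0, 10.0):
    intercept = y_bar - slope * x_bar
    total = sum(yi - intercept - slope * xi for xi, yi in zip(x, y))
    assert abs(total) < 1e-9

# Mistake 2: minimizing perpendicular distances (orthogonal regression /
# total least squares) gives a different slope than OLS (vertical distances).
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ols_slope = sxy / sxx
# Closed form for the orthogonal-regression slope (equal error variances).
tls_slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
print(f"OLS slope {ols_slope:.4f}, orthogonal-regression slope {tls_slope:.4f}")
```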

Revision bullets

• OLS minimizes $\sum \hat{u}_i^2$, the sum of squared residuals
• Slope $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ = sample covariance of $x$ and $y$ over sample variance of $x$
• Intercept $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$; the line passes through $(\bar{x}, \bar{y})$
• First-order conditions: $\sum \hat{u}_i = 0$ and $\sum x_i \hat{u}_i = 0$
• Requires sample variation in $x$: $\sum (x_i - \bar{x})^2 > 0$

Quick check

What objective function does ordinary least squares minimize?

How can the OLS slope $\hat{\beta}_1$ be written in terms of sample moments of $x$ and $y$?

Connected topics

SLR Model · Fitted / Resid · R-squared · Unbiasedness · Partialling out

Sources

Wooldridge (2019), Ch. 2.2
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. ISBN 978-1-337-55886-0.
Section 2.2 derives the OLS estimates from the minimization problem and the first-order conditions, and presents the slope and intercept formulas.
Wooldridge (2019), Appendix A (algebra)
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019.
Reviews summation algebra and the sample covariance and variance used to derive the OLS slope.

How to cite this page

Dr. Phil's Quant Lab. (2026). Deriving the OLS Estimates. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-ols-derivation
