
Fitted Values and Residuals

For each observation the fitted value is $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ and the residual is $\hat{u}_i = y_i - \hat{y}_i$, the vertical gap between the actual point and the line. OLS gives three algebraic properties that always hold by construction when the regression includes an intercept: the residuals sum to zero, $\sum \hat{u}_i = 0$; they are uncorrelated with the regressor in the sample, $\sum x_i \hat{u}_i = 0$; and the line passes through the sample means, so $(\bar{x}, \bar{y})$ lies on it. Crucially, the residual $\hat{u}_i$ is not the error $u_i$: it is the estimated, observable counterpart of the unobservable population error.
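These definitions and the three properties can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set; the helper `ols_fit` and all variable names are illustrative and do not come from any library.

```python
# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def ols_fit(x, y):
    """Return (b0, b1) for a simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_fit(x, y)

# Fitted values lie on the line; residuals are the signed vertical gaps.
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - yhat for yi, yhat in zip(y, fitted)]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

sum_resid = sum(resid)                                   # property 1
sum_x_resid = sum(xi * ui for xi, ui in zip(x, resid))   # property 2
line_at_x_bar = b0 + b1 * x_bar                          # property 3

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of residuals     = {sum_resid:.2e}")
print(f"sum of x * residuals = {sum_x_resid:.2e}")
print(f"line at x_bar = {line_at_x_bar:.3f}, y_bar = {y_bar:.3f}")
```

For this data the slope is 0.6 and the intercept is 2.2. Both sums are zero up to floating-point rounding, and the line evaluated at the mean of `x` returns the mean of `y`.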

Try it yourself

Least squares — minimising SSR

OLS picks the line that minimises the sum of squared residuals, $\text{SSR} = \sum_i (y_i - \hat{y}_i)^2$. Residuals are the vertical gaps from each point to the line. In the interactive plot, any line you choose (blue) can at best match the OLS line (gold) on SSR; it can never beat it.

[Interactive plot: scatter of y against x with the OLS best-fit line and a user-adjustable line. OLS line: ŷ = 2.1 + 1.55x; minimum SSR = 26; R² = 93%. With intercept 2.1 and slope 1.55 the adjustable line sits exactly on the OLS line and both SSRs equal 26; any other intercept or slope gives a larger SSR.]
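The claim that no other line beats OLS on SSR can be probed with a small grid search. This is a sketch under stated assumptions: the data are made up (they are not the data behind the plot), and `ssr` is a helper defined here for illustration.

```python
# Made-up data, for illustration only (not the data behind the plot).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def ssr(b0, b1, x, y):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Closed-form OLS estimates for a simple regression with an intercept.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1_ols = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_ols = y_bar - b1_ols * x_bar
ssr_ols = ssr(b0_ols, b1_ols, x, y)

# Shift the intercept and slope away from OLS and record each SSR.
steps = [-0.5, -0.1, 0.0, 0.1, 0.5]
never_better = []
for d0 in steps:
    for d1 in steps:
        candidate = ssr(b0_ols + d0, b1_ols + d1, x, y)
        # Small tolerance so the unshifted line is not flagged by rounding.
        never_better.append(candidate >= ssr_ols - 1e-12)
        print(f"intercept {d0:+.1f}, slope {d1:+.1f}: SSR = {candidate:.3f}")

print(f"OLS SSR = {ssr_ols:.3f}")
print("No candidate beat OLS:", all(never_better))
```

A grid search only samples a few alternatives, so it illustrates the minimum rather than proving it. The proof comes from the first-order conditions of the minimisation problem.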

Why it matters

After fitting the line you can read off, for each person, what the line predicts ($\hat{y}$) and how far the truth sat above or below it ($\hat{u}$). Because of how OLS is built, those residuals balance out to zero and carry no leftover linear relationship with $x$. The error $u$, by contrast, is the true unobserved gap in the population that you never actually see. The residual is your best in-sample estimate of it, much as a sample mean estimates a population mean.
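A simulation makes the distinction concrete, because in simulated data the errors are known. The sketch below assumes hypothetical "true" parameters (`beta0 = 1`, `beta1 = 2`), a fixed random seed, and standard normal errors; none of these values come from the text.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed "true" population parameters (known only because we simulate).
beta0, beta1 = 1.0, 2.0

n = 50
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]            # unobservable errors
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# OLS estimates computed from the sample alone.
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observable residuals

# Residuals sum to zero by construction; the true errors generally do not.
sum_resid = sum(resid)
sum_errors = sum(u)

# The gap between residual and error comes entirely from estimation error
# in the coefficients: u_hat_i - u_i = (beta0 - b0) + (beta1 - b1) * x_i.
max_gap = max(abs(r - e) for r, e in zip(resid, u))
identity_error = max(
    abs((r - e) - ((beta0 - b0) + (beta1 - b1) * xi))
    for r, e, xi in zip(resid, u, x)
)

print(f"estimates: b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of residuals = {sum_resid:.2e}, sum of errors = {sum_errors:.3f}")
print(f"largest |residual - error| = {max_gap:.3f}")
```

With real data only `resid` could be computed. The comparison with `u` is possible here solely because the data-generating process was chosen by hand.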

Formulas

Fitted value and residual
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \qquad \hat{u}_i = y_i - \hat{y}_i$$
The fitted value lies on the regression line; the residual is the signed vertical distance from the actual point to that line.
Algebraic properties of OLS residuals
$$\sum_{i=1}^{n} \hat{u}_i = 0, \qquad \sum_{i=1}^{n} x_i \hat{u}_i = 0$$
These hold for every OLS regression with an intercept. They also imply the point $(\bar{x}, \bar{y})$ is always on the fitted line.
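Where the two conditions come from
A short derivation sketch, writing $b_0$ and $b_1$ for candidate coefficients (notation introduced here for illustration). OLS chooses them to minimise the sum of squared residuals, and setting the two partial derivatives to zero at the minimiser $(\hat{\beta}_0, \hat{\beta}_1)$ gives the two summation conditions:

$$
\begin{aligned}
\min_{b_0,\, b_1} \quad & \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial}{\partial b_0}: \quad & -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} \hat{u}_i = 0 \\
\frac{\partial}{\partial b_1}: \quad & -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} x_i \hat{u}_i = 0
\end{aligned}
$$

Dividing the first condition by $n$ gives $\bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0$, that is, $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$, which is the statement that $(\bar{x}, \bar{y})$ lies on the fitted line. Without an intercept the first condition is not imposed, so the residuals need not sum to zero.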

Worked examples

Scenario

After `regress wage educ`, a student types `predict wagehat` and `predict uhat, residuals` in Stata and then `summarize uhat`.

Solution

`wagehat` holds the fitted values $\hat{y}_i$ and `uhat` holds the residuals $\hat{u}_i$. The mean of `uhat` reported by `summarize` is zero (up to rounding), illustrating $\sum \hat{u}_i = 0$. A scatter of `uhat` against `educ` shows no linear trend, reflecting $\sum x_i \hat{u}_i = 0$. These residuals are estimates of the unobserved errors, not the errors themselves.

Note: `predict` without the `residuals` option returns fitted values by default, a common point of confusion.
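For readers without Stata, the same checks can be sketched in plain Python. The `educ` and `wage` values below are made up for illustration, the helper `slope_intercept` is defined here, and `wagehat` and `uhat` simply mirror the Stata variable names in the scenario.

```python
# Made-up data, for illustration only.
educ = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0]
wage = [5.5, 6.0, 8.0, 7.5, 10.0, 12.5, 11.0, 15.0]


def slope_intercept(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


b0, b1 = slope_intercept(educ, wage)

wagehat = [b0 + b1 * e for e in educ]          # analogue of `predict wagehat`
uhat = [w - wh for w, wh in zip(wage, wagehat)]  # analogue of `predict uhat, residuals`

# The mean that `summarize uhat` would report.
mean_uhat = sum(uhat) / len(uhat)

# "No linear trend": regressing the residuals on educ gives a zero slope.
g0, g1 = slope_intercept(educ, uhat)

print(f"mean of uhat          = {mean_uhat:.2e}")
print(f"slope of uhat on educ = {g1:.2e}")
```

Both printed values are zero up to floating-point rounding, whatever data are used, as long as the first regression includes an intercept.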

Common mistakes

  • Mistake: the residual $\hat{u}_i$ equals the error term $u_i$. In fact, the error $u_i = y_i - \beta_0 - \beta_1 x_i$ uses the true unknown parameters and is unobservable. The residual $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$ uses the estimates and is observable. They coincide only if the estimates equal the true parameters, which essentially never happens.
  • Mistake: residuals summing to zero means the model fits well. In fact, $\sum \hat{u}_i = 0$ holds for any OLS fit with an intercept, even a terrible one. It is a mechanical property, not evidence of good fit.
  • Mistake: zero correlation between residuals and $x$ confirms the zero conditional mean assumption. In fact, $\sum x_i \hat{u}_i = 0$ is built into OLS by construction and tells you nothing about whether $E(u \mid x) = 0$ holds in the population.
  • Mistake: the regression line need not pass through the average point. In fact, with an intercept, OLS always makes the line go through $(\bar{x}, \bar{y})$. This follows directly from $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
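The second and third mistakes can be illustrated with a deliberately bad fit. The sketch below assumes made-up data in which $y$ is exactly $x^2$ on a grid symmetric around zero, so a straight line is the wrong model; all names are illustrative.

```python
# Made-up data: y is exactly quadratic in x, so a straight line is misspecified.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Goodness of fit: R-squared = 1 - SSR / SST.
ssr = sum(r ** 2 for r in resid)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1.0 - ssr / sst

# The two mechanical OLS properties.
sum_resid = sum(resid)
sum_x_resid = sum(xi * ri for xi, ri in zip(x, resid))

# A nonlinear pattern that the two properties do not rule out.
sum_x2_resid = sum(xi ** 2 * ri for xi, ri in zip(x, resid))

print(f"R-squared          = {r_squared:.3f}")
print(f"sum of residuals   = {sum_resid:.2e}")
print(f"sum of x * resid   = {sum_x_resid:.2e}")
print(f"sum of x^2 * resid = {sum_x2_resid:.1f}")
```

Here the fitted line is flat and $R^2$ is zero, yet both summation conditions hold exactly. The residuals remain systematically related to $x$ through $x^2$ (the last sum is 84, not zero), which is the kind of pattern the two OLS identities cannot detect.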

Revision bullets

  • Fitted value $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ lies on the line
  • Residual $\hat{u}_i = y_i - \hat{y}_i$ is the vertical miss
  • $\sum \hat{u}_i = 0$ and $\sum x_i \hat{u}_i = 0$ by construction
  • The fitted line always passes through $(\bar{x}, \bar{y})$
  • Residual $\hat{u}_i$ estimates, but is not equal to, the error $u_i$

Quick check

What is the key difference between the residual $\hat{u}_i$ and the error $u_i$?

In an OLS regression with an intercept, which statement is always true by construction?


Sources

  1. Wooldridge (2019), Ch. 2.3
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. ISBN 978-1-337-55886-0.
    Section 2.3 defines fitted values and residuals and lists the algebraic properties of OLS, including the two summation conditions and passage through the sample means.
  2. Wooldridge (2019), §2.5 (errors vs residuals)
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019.
    Clarifies the distinction between the unobservable error and the computed residual.
How to cite this page
Dr. Phil's Quant Lab. (2026). Fitted Values and Residuals. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-fitted-residuals