The Simple Linear Regression Model
The simple linear regression (SLR) model writes the dependent variable as y = β₀ + β₁x + u, where β₁ is the slope and β₀ the intercept. The slope gives the change in y for a one-unit change in x, holding the unobserved factors u fixed, so Δy = β₁Δx when Δu = 0. The error term u collects every other influence on y not captured by x. The model is linear in the parameters, which still permits curved relationships in the variables themselves through transformations such as logs.
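The "holding u fixed" condition can be made concrete with a small sketch. The parameter values (β₀ = 1.0, β₁ = 0.5), the value of u, and the helper name `slr` are illustrative assumptions, not taken from the text.

```python
def slr(x, u, beta0=1.0, beta1=0.5):
    """Return y from the SLR model y = beta0 + beta1 * x + u.

    beta0 and beta1 are made-up values chosen only for illustration.
    """
    return beta0 + beta1 * x + u


# Hold the unobserved factors fixed (same u) and raise x by one unit.
u_fixed = 0.3
y_before = slr(x=12, u=u_fixed)
y_after = slr(x=13, u=u_fixed)
delta_y = y_after - y_before  # equals beta1 because delta u = 0

# If u moves at the same time, the change in y no longer isolates beta1.
y_after_shifted_u = slr(x=13, u=u_fixed + 0.2)
delta_y_confounded = y_after_shifted_u - y_before  # beta1 plus the change in u

print(round(delta_y, 6), round(delta_y_confounded, 6))
```

The first difference recovers the slope; the second mixes the slope with the shift in u, which is why the ceteris paribus reading needs Δu = 0.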
Try it yourself
OLS picks the line that minimises the sum of squared residuals, SSR = Σ(yᵢ − ŷᵢ)². Residuals are the vertical gaps from each point to the line. Try any other line on the same data and its SSR will be at least as large as that of the OLS line.
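A minimal sketch of the same exercise in code, assuming a tiny made-up data set and the textbook closed-form OLS formulas. The helper names `ols_fit` and `ssr` and the hand-picked competing line are hypothetical.

```python
def ols_fit(x, y):
    """Closed-form OLS for one regressor: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def ssr(x, y, intercept, slope):
    """Sum of squared residuals for the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data, used only to illustrate the comparison.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
ssr_ols = ssr(x, y, b0, b1)

# A hand-picked competing line; change these numbers and compare again.
ssr_guess = ssr(x, y, intercept=0.5, slope=1.8)

print(f"OLS line: intercept={b0:.2f}, slope={b1:.2f}, SSR={ssr_ols:.4f}")
print(f"Guess:    SSR={ssr_guess:.4f}")
```

Changing the intercept and slope passed to the second `ssr` call plays the role of moving the line by hand: no choice should produce an SSR below `ssr_ols`.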
Why it matters
Think of y as the outcome you care about, say wage, and x as the one factor you put on the right-hand side, say years of schooling. The line β₀ + β₁x is the systematic part you can explain with x, and u is everything else (ability, family background, luck) rolled into a single bucket. The slope β₁ answers the practical question of how much y moves when x moves by one unit. The intercept β₀ is just where the line crosses the vertical axis at x = 0, which is only meaningful when x = 0 is itself sensible.
Formulas
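Collected from the definitions used in this section, the key relations can be written as follows. The hat notation for fitted values and estimates is the usual convention and is assumed here.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + u \\
\Delta y &= \beta_1 \,\Delta x \quad \text{if } \Delta u = 0 \\
\hat{y}_i &= \hat{\beta}_0 + \hat{\beta}_1 x_i \\
\mathrm{SSR} &= \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\end{align*}
```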
Worked examples
An applied labour economist models hourly wage on years of education using a sample from the U.S. labour force and runs `regress wage educ` in Stata.
Stata reports the fitted line as predicted wage = β̂₀ + β̂₁·educ. If β̂₁ = 0.54, each additional year of schooling is associated with an hourly wage that is higher by about 0.54 dollars. The intercept β̂₀ is the predicted wage at zero years of education, which here is an out-of-sample extrapolation and should not be over-interpreted.
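The slope interpretation can be checked with a short calculation. The slope 0.54 comes from the example; the intercept values below are hypothetical placeholders, since the comparison between two education levels does not depend on the intercept.

```python
BETA1_HAT = 0.54  # slope from the worked example (dollars per hour per year)


def predicted_wage(educ, beta0_hat, beta1_hat=BETA1_HAT):
    """Fitted value from the estimated line beta0_hat + beta1_hat * educ."""
    return beta0_hat + beta1_hat * educ


# Hypothetical intercepts: the predicted gap is the same for each of them.
for beta0_hat in (0.0, 1.0, -2.0):
    gap_one_year = predicted_wage(13, beta0_hat) - predicted_wage(12, beta0_hat)
    gap_four_years = predicted_wage(16, beta0_hat) - predicted_wage(12, beta0_hat)
    print(beta0_hat, round(gap_one_year, 2), round(gap_four_years, 2))
```

One more year of schooling moves the prediction by the slope, and four more years by four times the slope, whatever the intercept happens to be.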
Common mistakes
- ✗ Linear regression requires a straight-line relationship between the raw variables. The model only needs to be linear in the parameters β₀ and β₁. Using log(x) or adding a squared term such as x² still fits inside the linear-in-parameters framework while producing a curved fit in the original units.
- ✗ The intercept is always economically meaningful. β₀ is the value of y on the line when x = 0. If x = 0 never occurs in the data (for example zero years of education), the intercept is a mathematical anchor for the line, not a quantity to interpret on its own.
- ✗ The error term means the model is wrong or poorly specified. Every regression has an error term by construction. u represents the many factors other than x that influence y. Its presence is normal and expected, not a sign of failure.
- ✗ x must cause y for the slope to be defined. The slope is a feature of the joint distribution of x and y and is defined whenever x varies. Whether it carries a causal meaning is a separate question that depends on assumptions about u.
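The first point in the list above can be illustrated with a sketch that fits y on log(x) using ordinary OLS. The data are made up and noise-free (generated from y = 3 + 2·log(x)), and the helper names are hypothetical.

```python
import math


def ols_fit(z, y):
    """Closed-form OLS of y on a single regressor z: returns (intercept, slope)."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = szy / szz  # sample covariance over sample variance; needs szz > 0
    return y_bar - slope * z_bar, slope


# Made-up, noise-free data that are curved in x but linear in log(x).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 + 2.0 * math.log(xi) for xi in x]

# Transform the variable, then run the same linear-in-parameters regression.
z = [math.log(xi) for xi in x]
b0, b1 = ols_fit(z, y)


def fitted(x_value):
    """Fitted value in the original units of x."""
    return b0 + b1 * math.log(x_value)


# Equal one-unit steps in x give unequal changes in the fitted y.
step_low = fitted(2.0) - fitted(1.0)
step_high = fitted(16.0) - fitted(15.0)
print(round(b0, 6), round(b1, 6), round(step_low, 3), round(step_high, 3))
```

The regression is still linear in b0 and b1, yet the fitted relationship between y and x flattens as x grows. The slope formula also shows why the slope exists whenever the regressor varies: it only requires a non-zero sum of squared deviations in the denominator.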
Revision bullets
- Model is y = β₀ + β₁x + u, linear in the parameters
- β₁ = change in y per one-unit change in x, holding u fixed
- β₀ = value of y when x = 0 (interpret only if x = 0 is sensible)
- u collects all unobserved factors affecting y
- Linearity restricts the parameters, not the functional form of the variables
Quick check
In the model y = β₀ + β₁x + u, what does each of β₀, β₁ and u represent?
Which statement about the linearity of the simple regression model is correct?
Sources
- Wooldridge (2019), Ch. 2.1. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. ISBN 978-1-337-55886-0. Section 2.1 defines the simple regression model, the interpretation of the slope and intercept, and the role of the error term.
- Wooldridge (2019), §2.1 (linearity). Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage Learning, 2019. Discusses why the model is linear in parameters and how transformations expand the class of relationships it can capture.