Specification & Data Problems (intermediate)

The Linear Probability Model

When the dependent variable is binary, OLS on that 0/1 outcome is the linear probability model (LPM). Because $E[y\mid \mathbf{x}]=P(y=1\mid \mathbf{x})$, each slope is the change in the probability of success for a one-unit change in a regressor. The LPM is easy to estimate and read, but it has two flaws: fitted values can fall outside $[0,1]$, and the error is inherently heteroskedastic with variance $p(\mathbf{x})\,[1-p(\mathbf{x})]$. It is the entry point to limited dependent variable models such as logit and probit.

Why it matters

Sometimes the outcome is a yes or a no: a person is in the labor force or not, a loan is approved or not. Coding it 0/1 and running OLS gives coefficients you can read straight off as "this raises the chance of a yes by so many percentage points," which is wonderfully concrete. The catch is that a straight line does not respect the 0-to-1 fence, so for extreme inputs it can predict a probability above one or below zero. Such a prediction is nonsense, and it warns you that the line is only a local approximation.

Formulas

Probability interpretation

$$P(y=1\mid \mathbf{x})=\beta_0+\beta_1 x_1+\dots+\beta_k x_k$$

Each $\beta_j$ is the change in the probability of success per one-unit rise in $x_j$, holding the others fixed.

Heteroskedastic error variance

$$\operatorname{Var}(u\mid \mathbf{x})=p(\mathbf{x})\,[1-p(\mathbf{x})]$$

The variance depends on $\mathbf{x}$ by construction, so heteroskedasticity is built in. Always report robust standard errors.
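
A short derivation shows where the variance formula comes from. It is the standard Bernoulli argument, sketched here under the assumption that the linear model for $p(\mathbf{x})=P(y=1\mid \mathbf{x})$ is correctly specified, with the error defined as $u=y-p(\mathbf{x})$ and $p$ used as shorthand for $p(\mathbf{x})$.

$$
\begin{aligned}
u &= \begin{cases} 1-p & \text{with probability } p \\ -p & \text{with probability } 1-p \end{cases} \\
E[u\mid \mathbf{x}] &= p\,(1-p) + (1-p)\,(-p) = 0 \\
\operatorname{Var}(u\mid \mathbf{x}) &= p\,(1-p)^2 + (1-p)\,p^2 = p\,(1-p)\,[(1-p)+p] = p\,(1-p)
\end{aligned}
$$

The product $p\,(1-p)$ peaks at 0.25 when $p=0.5$ and shrinks toward zero as $p$ approaches 0 or 1, so the error variance changes with $\mathbf{x}$ whenever any slope is nonzero.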

Worked examples

Scenario

Model whether a married woman is in the labor force as a function of education and number of young children.

Solution

Run `regress inlf educ kidslt6, robust`. A coefficient of about 0.038 on `educ` means each extra year of schooling raises the probability of being in the labor force by about 3.8 percentage points. The `robust` option corrects the standard errors for the built-in heteroskedasticity.

Note: Check fitted values with `predict phat` then `summarize phat`; some may lie below 0 or above 1.
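
The fitted-value check can also be illustrated outside Stata. The following is a minimal Python sketch on simulated data, not the data behind the example: the names `educ` and `inlf` are borrowed only for readability, the logistic data-generating process and its parameters are assumptions chosen for illustration, `ols_simple` is a hypothetical helper, and a single regressor is used to keep the arithmetic short.

```python
import math
import random


def ols_simple(x, y):
    """OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


random.seed(0)

# Simulated (hypothetical) data: schooling from 0 to 20 years and a 0/1 outcome.
# The true probability is logistic, so it always lies strictly inside (0, 1).
educ = [random.randint(0, 20) for _ in range(2000)]
inlf = [
    1 if random.random() < 1 / (1 + math.exp(-(-5.0 + 0.5 * e))) else 0
    for e in educ
]

# Linear probability model: plain OLS on the binary outcome.
b0, b1 = ols_simple(educ, inlf)

# Analogue of `predict phat` followed by `summarize phat`.
phat = [b0 + b1 * e for e in educ]
n_below = sum(p < 0 for p in phat)
n_above = sum(p > 1 for p in phat)

print(f"slope on educ: {b1:.4f}")
print(f"min phat: {min(phat):.3f}, max phat: {max(phat):.3f}")
print(f"fitted values below 0: {n_below}, above 1: {n_above}")
```

Although every true probability in this simulation lies between 0 and 1, the straight line overshoots at both ends of the schooling range. That is the sense in which the LPM is only a local approximation.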

Scenario

Estimate the probability that a mortgage application is approved given the applicant’s debt-to-income ratio.

Solution

Run `regress approve dti, robust`. The slope is the change in approval probability per unit of debt-to-income ratio. For very low or very high ratios the predicted probability can exceed one or drop below zero, a reminder that logit or probit may fit the tails better while the LPM still gives a clear average marginal effect.

Common mistakes

✗ Trusting fitted probabilities outside $[0,1]$. The linear form is not bounded, so out-of-range predictions occur and should not be read as real probabilities.
✗ Using default OLS standard errors. The LPM error is heteroskedastic by construction, so robust standard errors are required for valid inference.
✗ Treating the constant marginal effect as accurate everywhere. The LPM imposes the same effect at every value of the regressors, which is exactly where it can mislead near the boundaries; logit and probit let the effect taper.
✗ Thinking logit or probit is always strictly better. The LPM often delivers very similar average partial effects and is simpler to interpret, which is why it remains widely used.
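
To make the standard-error point concrete, here is a minimal Python sketch comparing the conventional OLS standard error of the slope with a heteroskedasticity-robust one. The simulated data, the true probability $0.05+0.9x$, and all variable names are assumptions for illustration. The robust formula is the HC1 variant, which rescales the basic sandwich estimator by $n/(n-k)$ and is, to the best of my knowledge, what Stata's `robust` option reports after `regress`.

```python
import math
import random

random.seed(1)
n = 5000

# Simulated (hypothetical) data in which the LPM holds exactly:
# P(y = 1 | x) = 0.05 + 0.9 * x with x drawn uniformly from [0, 1].
x = [random.random() for _ in range(n)]
y = [1 if random.random() < 0.05 + 0.9 * xi else 0 for xi in x]

# OLS slope and intercept for a single regressor.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Conventional standard error: assumes one common error variance.
s2 = sum(u * u for u in resid) / (n - 2)
se_default = math.sqrt(s2 / sxx)

# Robust (HC1) standard error: weights each squared residual by the
# squared deviation of x, then applies the n / (n - 2) correction.
meat = sum(((xi - x_bar) ** 2) * u * u for xi, u in zip(x, resid))
se_robust = math.sqrt((n / (n - 2)) * meat / sxx ** 2)

print(f"slope: {b1:.4f}")
print(f"conventional SE: {se_default:.4f}")
print(f"robust (HC1) SE: {se_robust:.4f}")
```

In this design the error variance $p(1-p)$ is largest in the middle of the range of $x$ and smallest at the extremes, so the robust standard error comes out smaller than the conventional one. In other designs the gap can go the other way, which is why the robust version is the safe default.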

Revision bullets

• LPM is OLS on a binary $y$; slopes are changes in $P(y=1)$.
• Fitted values can fall outside $[0,1]$.
• Errors are heteroskedastic with variance $p(1-p)$, so use `robust`.
• Coefficients read as changes in probability (percentage points).
• Gateway to limited dependent variable models (logit, probit).

Quick check

In an LPM `regress inlf educ`, a coefficient of 0.04 on educ means an extra year of schooling:

Which is a genuine drawback of the LPM?

Connected topics

MLR model, MLR assumptions, Dummies, Heteroskedastic, Robust SE

Sources

Wooldridge (2019), §7.5
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019.
Defines the linear probability model, the probability interpretation of coefficients, and its limitations.
Wooldridge (2019), §8.5
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019.
Shows the LPM error is heteroskedastic with variance $p(1-p)$ and motivates robust inference.

How to cite this page

Dr. Phil's Quant Lab. (2026). The Linear Probability Model. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-linear-probability-model
