Dummy Variables
A dummy (binary) variable takes the value 1 for one category and 0 otherwise, letting qualitative information enter a regression. In y = β₀ + δ₀D + β₁x + u, the coefficient δ₀ shifts the intercept, measuring the average gap in y between the group with D = 1 and the omitted base group (D = 0), holding x fixed. With g categories you include g − 1 dummies and leave one out as the reference. Including all g dummies plus an intercept causes perfect collinearity, the dummy-variable trap.
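As a minimal sketch of the intercept shift, consider the special case with no other regressor: there the least-squares coefficient on the dummy equals the difference in group means. The data below are hypothetical and chosen only for illustration, and `ols_simple` is an illustrative helper, not a library function.

```python
from statistics import mean

# Hypothetical data: y is an outcome, d is a 0/1 dummy (1 = group of interest).
y = [10.0, 12.0, 11.0, 18.0, 17.0, 19.0]
d = [0, 0, 0, 1, 1, 1]

def ols_simple(x, y):
    """Least-squares intercept and slope of y on a single regressor x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

beta0, delta0 = ols_simple(d, y)

# Group means computed directly, for comparison with the regression output.
mean_base = mean(yi for yi, di in zip(y, d) if di == 0)
mean_group = mean(yi for yi, di in zip(y, d) if di == 1)

print(beta0, mean_base)                # intercept vs. base-group mean
print(delta0, mean_group - mean_base)  # dummy coefficient vs. gap in means
```

Once a control such as x is added, the coefficient becomes a gap conditional on x and no longer equals the raw difference in means.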
Try it yourself
[Interactive plot: fitted lines for the base group and group D, drawn from the same seeded data.]
Example reading with the interaction off: group D has ŷ = 15.0 + 1.05·x. δ₀ shifts the intercept by 7.0, so the base group has ŷ = 8.0 + 1.05·x. The slopes match, so the lines stay parallel.
Discussion. With the interaction off, δ₀ only lifts the line; turn it on and δ₁ tilts the slope. Which question does each parameter answer, and why must the base group and the main effect stay in the model for δ₀ and δ₁ to be readable?
y = β₀ + δ₀D + β₁x + δ₁(D·x) + u. The base group (D = 0) has intercept β₀ and slope β₁; group D has intercept β₀ + δ₀ and slope β₁ + δ₁. With the interaction off, δ₁ = 0 and the lines are parallel.
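The mapping from the four parameters to the two group lines can be sketched in a few lines. The function name `group_lines` is hypothetical. The values β₀ = 8.0, δ₀ = 7.0 and β₁ = 1.05 are the ones implied by the group D line 15.0 + 1.05·x with an intercept shift of 7.0; the value δ₁ = 0.5 in the second call is made up to show a tilt.

```python
def group_lines(beta0, delta0, beta1, delta1=0.0):
    """Return (intercept, slope) for the base group (D = 0) and for group D (D = 1)."""
    base = (beta0, beta1)
    group_d = (beta0 + delta0, beta1 + delta1)
    return base, group_d

# Interaction off: delta1 = 0, so both groups share the slope beta1.
base, group_d = group_lines(beta0=8.0, delta0=7.0, beta1=1.05)
print(base)     # (8.0, 1.05)
print(group_d)  # (15.0, 1.05)

# Interaction on (hypothetical delta1): group D's slope becomes beta1 + delta1.
base_int, group_d_int = group_lines(beta0=8.0, delta0=7.0, beta1=1.05, delta1=0.5)
print(group_d_int)
```

With the interaction on, δ₀ is the gap between the groups only at x = 0, which is one reason both the main effect and the base group have to stay in the model.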
Why it matters
Regression needs numbers, but many things we care about are labels, such as married or single, union or non-union, or one of four regions. A dummy converts a label into a 0/1 switch. The coefficient then reads as "how much higher or lower is y for this group compared with the left-out group, on average." You always need one group to compare against, which is why one category is dropped rather than coded.
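Turning labels into 0/1 switches can be sketched as follows. `make_dummies` is a hypothetical helper written for this illustration, and the region labels are made-up data; the base category gets no column of its own.

```python
def make_dummies(labels, base):
    """Map category labels to 0/1 columns, omitting the base category."""
    categories = sorted(set(labels))
    if base not in categories:
        raise ValueError(f"base {base!r} is not among the categories")
    return {
        cat: [1 if lab == cat else 0 for lab in labels]
        for cat in categories
        if cat != base
    }

# Hypothetical sample of six observations across four regions.
regions = ["north", "south", "east", "west", "south", "north"]
dummies = make_dummies(regions, base="north")

for name, column in dummies.items():
    print(name, column)
```

Observations from the base region show up as zeros in every column, so the intercept carries their average.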
Formulas
Worked examples
Estimate the wage gap associated with being female, controlling for education and experience.
Run `regress lwage female educ exper`. The coefficient on `female`, multiplied by 100, is the approximate average percent wage gap (since the dependent variable `lwage` is the log of the wage) relative to men, the base group, holding education and experience fixed. A value of about -0.18 implies women earn roughly 18 percent less on average for the same measured characteristics.
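The "roughly 18 percent" reading is the log approximation. A short sketch compares it with the exact conversion 100·(exp(coefficient) − 1) for a dummy in a log-level model; `exact_percent_gap` is a hypothetical helper name.

```python
import math

def exact_percent_gap(coef):
    """Exact percent difference implied by a dummy coefficient when the outcome is logged."""
    return 100.0 * (math.exp(coef) - 1.0)

coef_female = -0.18  # the illustrative value from the example above

approx = 100.0 * coef_female            # log approximation: -18 percent
exact = exact_percent_gap(coef_female)  # about -16.5 percent

print(round(approx, 1), round(exact, 1))
```

The two readings diverge as the coefficient grows in absolute value, so the exact formula is the safer one to report for large gaps.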
Region has four categories (north, south, east, west) and you want regional wage differences.
Run `regress lwage i.region educ`. Stata automatically drops one region as the base and reports three coefficients, each the mean log-wage difference from that omitted region, holding education fixed. Forcing all four region dummies in alongside the constant would trigger the dummy-variable trap, and Stata would omit one of them because of collinearity.
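The trap can be checked numerically. The sketch below (in Python rather than Stata, with made-up region data and a hypothetical `rank` helper) builds the regressor columns and compares the rank of the design with the number of columns. A rank below the column count means the coefficients are not separately identified.

```python
from fractions import Fraction

def rank(columns):
    """Rank of the matrix with the given columns, via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    r = 0
    for c in range(len(columns)):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical sample of six observations.
regions = ["north", "south", "east", "west", "south", "east"]
categories = ["north", "south", "east", "west"]
constant = [1] * len(regions)
dummy = {c: [1 if r == c else 0 for r in regions] for c in categories}

# Constant plus all four dummies: five columns, one of them redundant.
full = [constant] + [dummy[c] for c in categories]
# Constant plus three dummies, with "north" omitted as the base (an arbitrary choice).
reduced = [constant] + [dummy[c] for c in categories if c != "north"]

print(len(full), rank(full))
print(len(reduced), rank(reduced))
```

In the full design the four dummies add up to the constant column row by row, which is the exact linear dependence that software resolves by omitting a column.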
Common mistakes
- ✗ Including a dummy for every category plus an intercept. That is the dummy-variable trap. The dummies sum to one and are perfectly collinear with the constant. Drop one category.
- ✗ Reading a dummy coefficient in absolute terms. With log(y) on the left, the coefficient times 100 is approximately a percent difference, and for larger values 100·(exp(δ̂₀) − 1) is the exact percent gap.
- ✗ Thinking the choice of base group changes the substance. It only changes which comparisons the coefficients show; predictions and fit are identical.
- ✗ Treating a dummy coefficient as causal. It is a conditional mean difference and can still reflect omitted variables correlated with group membership.
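The point about the base group can be illustrated with a small sketch. It relies on the fact that a regression on category dummies alone (no other regressors) reproduces the group means, so the helper below computes the coefficients from means rather than running a general OLS routine. The data and the names `dummy_fit` and `predict` are hypothetical.

```python
from statistics import mean

def dummy_fit(y, labels, base):
    """Intercept and dummy coefficients for y regressed on category dummies only."""
    group_mean = {
        c: mean(v for v, lab in zip(y, labels) if lab == c) for c in set(labels)
    }
    intercept = group_mean[base]  # the intercept is the base-group mean
    coefs = {c: m - intercept for c, m in group_mean.items() if c != base}
    return intercept, coefs

def predict(intercept, coefs, label):
    """Fitted value for one observation; the base group has no coefficient."""
    return intercept + coefs.get(label, 0.0)

# Hypothetical log wages for six workers in three regions.
y = [2.0, 2.4, 3.0, 3.2, 2.5, 2.7]
labels = ["north", "north", "south", "south", "east", "east"]

fit_north = dummy_fit(y, labels, base="north")
fit_south = dummy_fit(y, labels, base="south")

preds_north = [predict(*fit_north, lab) for lab in labels]
preds_south = [predict(*fit_south, lab) for lab in labels]

print(fit_north)
print(fit_south)
```

The reported coefficients differ between the two fits because they answer different comparisons, while the fitted values agree observation by observation.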
Revision bullets
- A dummy is 0/1 and shifts the intercept by its coefficient.
- The coefficient is the mean gap from the omitted base group.
- Use g − 1 dummies for g categories.
- All dummies plus a constant cause the dummy-variable trap.
- In Stata, `i.var` factor-variable notation handles the base level automatically.
Quick check
With four regions you should include how many region dummies alongside the intercept?
In `regress lwage female educ`, a coefficient of -0.20 on female means:
Sources
- Wooldridge (2019), §7.1-7.3. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Introduces binary regressors, intercept shifts, multiple categories, and the dummy-variable trap.
- Wooldridge (2019), Ch. 7. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Worked wage examples interpreting dummy coefficients as conditional mean differences from the base group.