Panel Data: Cross-Section meets Time Series
Panel (longitudinal) data follow the same $N$ entities (firms, people, countries) across $T$ time periods, so every observation carries an entity index $i$ and a time index $t$. A panel is balanced when every entity is observed in every period and unbalanced when some entity-periods are missing. The key idea is that total variation splits into between variation (how entities differ from each other on average) and within variation (how one entity changes over time). The workhorse model is the unobserved-effects model $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$, where $a_i$ is a time-invariant entity effect (unmeasured firm quality, manager ability, culture) and $u_{it}$ is the idiosyncratic error. Panel data dominate project econometrics because they can hold $a_i$ fixed, using each entity as its own control, which a single cross-section cannot do.
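The between/within split can be written out as a short identity. This is a sketch for a balanced panel; the symbols $\bar{x}_i$ (the time average for entity $i$) and $\bar{x}$ (the grand mean over all entity-periods) are notation assumed here for illustration.

```latex
% Each deviation from the grand mean splits into a within part and a between part
x_{it} - \bar{x} = \underbrace{(x_{it} - \bar{x}_i)}_{\text{within}}
                 + \underbrace{(\bar{x}_i - \bar{x})}_{\text{between}},
\qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}

% Squaring and summing: the cross term vanishes because
% \sum_t (x_{it} - \bar{x}_i) = 0 for every entity i
\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x})^2
  = \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2
  + T\sum_{i=1}^{N} (\bar{x}_i - \bar{x})^2
```

The first term on the right is the within sum of squares and the second is the between sum of squares.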
Why it matters
A cross-section is one snapshot, so any unmeasured trait that makes a firm both profitable and well-governed gets baked into the error and biases the slope. Watching the same firm across several years lets you ask a sharper question: when this firm changes its leverage, what happens to its own profitability? Because the firm is compared to itself, everything fixed about it (its sector, its founding culture, its location) cancels out. That is why panel data are the practical engine for controlling for the time-invariant confounders that a single survey wave leaves trapped in the error term.
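The cancellation argument can be checked on a toy example. The sketch below uses only the Python standard library and entirely hypothetical numbers: three firms, three years, a true slope of 2, no noise, and a fixed firm effect that is deliberately correlated with the regressor. It is an illustration of the idea, not a substitute for a proper fixed-effects routine.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical toy panel: 3 firms, 3 years, no noise, true slope = 2.
TRUE_BETA = 2.0
firm_effect = {"A": 0.0, "B": 10.0, "C": 20.0}  # fixed over time, correlated with x
x_by_firm = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
y_by_firm = {
    firm: [TRUE_BETA * x + firm_effect[firm] for x in xs]
    for firm, xs in x_by_firm.items()
}

# Pooled OLS: stack all firm-years and ignore the firm index.
pooled_x = [x for xs in x_by_firm.values() for x in xs]
pooled_y = [y for ys in y_by_firm.values() for y in ys]
pooled_slope = ols_slope(pooled_x, pooled_y)


def demean(series_by_firm):
    """Subtract each firm's own time average from its observations."""
    return [v - mean(vs) for vs in series_by_firm.values() for v in vs]


# Within estimator: regress demeaned y on demeaned x, so the firm effect drops out.
within_slope = ols_slope(demean(x_by_firm), demean(y_by_firm))

print(f"pooled slope: {pooled_slope:.2f}")  # contaminated by the firm effect
print(f"within slope: {within_slope:.2f}")  # recovers TRUE_BETA in this toy case
```

In this constructed case the pooled slope comes out at 5 while the within slope is exactly 2, because subtracting each firm's own average removes anything that does not change over time.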
Formulas
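A sketch of the core relationships in Wooldridge-style notation, assuming a single regressor $x_{it}$ for simplicity; the bars denote time averages for entity $i$, a notational choice made here for illustration.

```latex
% Unobserved-effects model: a_i is time-invariant, u_{it} is idiosyncratic
y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T

% Average the model over time for each entity (a_i is its own average)
\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{u}_i

% Within transformation: subtract the averaged equation, so \beta_0 and a_i cancel
y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)
```

The last line is what fixed-effects estimation works with: only deviations from each entity's own average remain.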
Worked examples
Declare a firm-year dataset as a panel and inspect where the variation in return on assets comes from.
Run `xtset id year` to declare the panel, then `xtsum roa`. Stata reports an overall standard deviation plus a between SD (across firms) and a within SD (over time within a firm). If the within SD is small relative to the between SD, most differences in `roa` are permanent firm traits, so there is little within-firm movement in `roa` for a time-varying regressor to explain.
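To see what the three standard deviations measure, the sketch below recomputes them by hand in plain Python on a hypothetical balanced firm-year panel. The numbers are invented, and the exact divisors Stata uses may differ from `statistics.stdev`, so treat this as an illustration of the idea rather than a replica of `xtsum`.

```python
from statistics import mean, stdev

# Hypothetical toy panel: firm id -> return on assets in three consecutive years
roa = {
    1: [0.10, 0.12, 0.11],
    2: [0.30, 0.31, 0.29],
    3: [0.50, 0.52, 0.48],
}

all_obs = [v for series in roa.values() for v in series]
grand_mean = mean(all_obs)

# Overall: spread of every firm-year around the grand mean
overall_sd = stdev(all_obs)

# Between: spread of the firm averages (one number per firm)
firm_means = {firm: mean(series) for firm, series in roa.items()}
between_sd = stdev(list(firm_means.values()))

# Within: deviations from each firm's own average, re-centred on the grand mean
within_vals = [
    v - firm_means[firm] + grand_mean
    for firm, series in roa.items()
    for v in series
]
within_sd = stdev(within_vals)

print(f"overall SD: {overall_sd:.4f}")
print(f"between SD: {between_sd:.4f}")
print(f"within SD:  {within_sd:.4f}")
```

Here the within SD is far smaller than the between SD, which is the pattern described above: firms differ a lot from each other but barely move over time.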
A wage panel where workers enter and leave the sample in different years.
After `xtset pid year`, run `xtdescribe`. The output flags an unbalanced panel because the number of periods per worker varies. This is fine for fixed-effects and random-effects estimation as long as the reason a worker is missing is unrelated to the idiosyncratic error. Otherwise the missingness is itself a source of bias rather than a harmless gap.
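The balance check itself is simple to sketch. The plain-Python example below uses hypothetical `(pid, year)` pairs and builds participation patterns with `1` for observed and `.` for missing, loosely in the spirit of the `xtdescribe` display. It only detects that a panel is unbalanced; it cannot tell whether the gaps are related to the error.

```python
from collections import Counter

# Hypothetical worker-year observations: (pid, year)
obs = [
    (1, 2018), (1, 2019), (1, 2020),
    (2, 2019), (2, 2020),   # worker 2 enters late
    (3, 2018), (3, 2020),   # worker 3 has a gap in 2019
]

all_years = sorted({year for _, year in obs})
periods_per_worker = Counter(pid for pid, _ in obs)

# Balanced means every worker is observed in every period of the sample
is_balanced = all(n == len(all_years) for n in periods_per_worker.values())

# Participation pattern per worker: "1" = observed, "." = missing
seen = set(obs)
patterns = {
    pid: "".join("1" if (pid, year) in seen else "." for year in all_years)
    for pid in sorted(periods_per_worker)
}

print("balanced:", is_balanced)
for pid, pattern in patterns.items():
    print(pid, pattern)
```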
Common mistakes
- ✗ Thinking pooled data and panel data are the same. Pooling stacks observations and ignores the entity index, so it cannot separate within from between variation or sweep out the entity effect $a_i$.
- ✗ Believing fixed effects can estimate the effect of a time-invariant regressor such as gender or country of birth. Those variables have zero within variation, so their effects are absorbed by the entity effect $a_i$.
- ✗ Assuming an unbalanced panel is automatically biased. Unbalancedness is harmless if the missingness is unrelated to the error. The danger is selection, when entities drop out for reasons tied to the outcome.
- ✗ Reading strict exogeneity as the weaker zero-conditional-mean assumption from cross-section analysis. Strict exogeneity requires the idiosyncratic error $u_{it}$ to be uncorrelated with the regressors in all periods, not just the current one, which rules out feedback from past shocks to future values of $x_{it}$.
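Two of the points above can be made concrete with a little notation. This is a sketch assuming a single time-varying regressor $x_{it}$, a time-invariant regressor written here as $z_i$ (a symbol introduced only for illustration), an entity effect $a_i$, and an idiosyncratic error $u_{it}$.

```latex
% A time-invariant regressor has no within variation:
% its time average equals itself, so demeaning wipes it out
\bar{z}_i = \frac{1}{T}\sum_{t=1}^{T} z_i = z_i
\quad\Longrightarrow\quad z_i - \bar{z}_i = 0

% Contemporaneous (cross-section style) exogeneity conditions on period t only
\mathrm{E}(u_{it} \mid x_{it}, a_i) = 0

% Strict exogeneity conditions on the regressor in every period
\mathrm{E}(u_{it} \mid x_{i1}, \dots, x_{iT}, a_i) = 0
\quad \text{for all } t = 1,\dots,T

% which implies no correlation between the error and the regressor at any lead or lag
\mathrm{Cov}(x_{is}, u_{it}) = 0 \quad \text{for all } s, t
```

If a shock $u_{it}$ changes next period's regressor, then $\mathrm{Cov}(x_{i,t+1}, u_{it}) \neq 0$ and the strict version fails even when the contemporaneous version holds.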
Revision bullets
- Panel = same $N$ entities tracked over $T$ periods, each row indexed by $(i, t)$.
- Balanced = every entity in every period; unbalanced = some entity-periods missing.
- Between variation is across entities; within variation is over time for one entity.
- Unobserved-effects model: $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$, with $a_i$ time-invariant.
- Strict exogeneity conditions on regressors in every period, not just period $t$.
- Declare the panel with `xtset id year`; inspect variation with `xtsum` and `xtdescribe`.
Quick check
In the model $y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$, what does $a_i$ represent?
You run `xtsum x` and find the within standard deviation is near zero. What follows?
Sources
- Wooldridge (2019), Ch. 13. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Chapter 13 introduces pooling independent cross-sections and panel data, the two-period unobserved-effects model, and the distinction between idiosyncratic and time-invariant errors.
- Wooldridge (2019), Ch. 14. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Chapter 14 develops the general unobserved-effects model, the strict-exogeneity assumption, and the within transformation that underlies fixed-effects estimation.