
Panel Data: Cross-Section meets Time Series

Panel (longitudinal) data follow the same $N$ entities (firms, people, countries) across $T$ time periods, so every observation carries an entity index $i$ and a time index $t$. A panel is balanced when every entity is observed in every period and unbalanced when some entity-periods are missing. The key idea is that total variation splits into between variation (how entities differ from each other on average) and within variation (how one entity changes over time). The workhorse model is the unobserved-effects model $y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}$, where $a_i$ is a time-invariant entity effect (unmeasured firm quality, manager ability, culture) and $u_{it}$ is the idiosyncratic error. Panel data dominate project econometrics because they can hold $a_i$ fixed, using each entity as its own control, which a single cross-section cannot do.


Why it matters

A cross-section is one snapshot, so any unmeasured trait that makes a firm both profitable and well-governed gets baked into the error and biases the slope. Watching the same firm across several years lets you ask a sharper question: when this firm changes its leverage, what happens to its own profitability? Because the firm is compared to itself, everything fixed about it (its sector, its founding culture, its location) cancels out. That is why panel data are the practical engine for controlling the time-invariant confounders that a single survey wave leaves trapped in the error term.

Formulas

Unobserved-effects model
$$y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}, \quad i=1,\dots,N,\; t=1,\dots,T$$
Identification needs strict exogeneity, $E(u_{it}\mid x_{i1},\dots,x_{iT},a_i)=0$: the idiosyncratic error is mean-independent of the regressors in every period and of $a_i$. The fixed effect $a_i$ is allowed to correlate with $x_{it}$.
Within and between decomposition
$$x_{it}=\bar{x}_i+(x_{it}-\bar{x}_i), \quad \bar{x}_i=\tfrac{1}{T}\sum_{t=1}^{T}x_{it}$$
The entity mean $\bar{x}_i$ carries the between variation (differences across entities) and the deviation $x_{it}-\bar{x}_i$ carries the within variation (movement over time for one entity). A regressor with no within variation cannot be identified by fixed effects.
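To see why each entity can act as its own control, it helps to average the unobserved-effects model over time for one entity and subtract that average from the original equation. The lines below are a sketch of that standard within transformation, assuming a balanced panel and writing $\bar{y}_i$ and $\bar{u}_i$ for time averages defined the same way as $\bar{x}_i$:

```latex
\begin{aligned}
\bar{y}_i &= \bar{x}_i^{\prime}\beta + a_i + \bar{u}_i \\
y_{it}-\bar{y}_i &= (x_{it}-\bar{x}_i)^{\prime}\beta + (u_{it}-\bar{u}_i)
\end{aligned}
```

Because $a_i$ does not vary over time, it equals its own time average and drops out of the second line. The same subtraction sets $x_{it}-\bar{x}_i=0$ for any regressor that never changes within an entity, so the second line contains no information about its coefficient.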

Worked examples

Scenario

Declare a firm-year dataset as a panel and inspect where the variation in return on assets comes from.

Solution

Run `xtset id year` to declare the panel, then `xtsum roa`. Stata reports an overall standard deviation plus a between SD (across firms) and a within SD (over time within a firm). If the within SD is small relative to the between SD, most differences in `roa` are permanent firm traits, so a time-varying regressor will have little within signal to explain.

Note: Use `xtdescribe` to see the time pattern and confirm whether the panel is balanced.
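The between and within split can also be computed by hand. The following is a minimal Python sketch, not Stata's own routine: the `rows` data and the `variation_summary` helper are hypothetical, and it uses plain sample standard deviations, so the figures need not match `xtsum` to the last digit if Stata applies different degrees-of-freedom conventions.

```python
from statistics import mean, stdev

# Hypothetical firm-year panel: (firm id, year, return on assets).
rows = [
    (1, 2020, 0.10), (1, 2021, 0.12), (1, 2022, 0.11),
    (2, 2020, 0.02), (2, 2021, 0.03), (2, 2022, 0.01),
    (3, 2020, 0.06), (3, 2021, 0.05), (3, 2022, 0.07),
]

def variation_summary(rows):
    """Return (overall, between, within) sample standard deviations."""
    # Collect each entity's values so we can take its time average.
    by_entity = {}
    for entity, _, value in rows:
        by_entity.setdefault(entity, []).append(value)
    entity_means = {e: mean(v) for e, v in by_entity.items()}

    # Overall: spread of every observation around the grand mean.
    overall_sd = stdev([value for _, _, value in rows])
    # Between: spread of the entity means across entities.
    between_sd = stdev(list(entity_means.values()))
    # Within: spread of each observation around its own entity mean.
    within_dev = [value - entity_means[entity] for entity, _, value in rows]
    within_sd = stdev(within_dev)
    return overall_sd, between_sd, within_sd

overall_sd, between_sd, within_sd = variation_summary(rows)
print(f"overall={overall_sd:.4f} between={between_sd:.4f} within={within_sd:.4f}")
```

In this made-up dataset the between SD is several times the within SD, which is the pattern the solution above warns about: most of the differences in `roa` are permanent firm traits.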
Scenario

A wage panel where workers enter and leave the sample in different years.

Solution

After `xtset pid year`, run `xtdescribe`. The output flags an unbalanced panel because the number of periods per worker varies. This is fine for fixed-effects and random-effects estimation as long as the reason a worker is missing is unrelated to the error. If it is related, the missingness is itself a source of bias rather than a harmless gap.
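The balance check itself is simple to express in code. Below is a small Python sketch of the idea, assuming observations arrive as (worker id, year) pairs; the `obs` data and the `describe_panel` helper are hypothetical and only imitate the kind of count that `xtdescribe` reports.

```python
from collections import defaultdict

# Hypothetical worker-year observations: (worker id, year).
obs = [
    (1, 2019), (1, 2020), (1, 2021),
    (2, 2020), (2, 2021),
    (3, 2019), (3, 2021),
]

def describe_panel(obs):
    """Return periods observed per entity and whether the panel is balanced."""
    periods = defaultdict(set)
    for entity, period in obs:
        periods[entity].add(period)
    # Every period that appears anywhere in the data.
    all_periods = set().union(*periods.values())
    counts = {entity: len(seen) for entity, seen in periods.items()}
    # Balanced means each entity is observed in every one of those periods.
    balanced = all(seen == all_periods for seen in periods.values())
    return counts, balanced

counts, balanced = describe_panel(obs)
print(counts, "balanced" if balanced else "unbalanced")
```

A check like this only detects that the panel is unbalanced. Whether the gaps are harmless still depends on why the workers are missing, which the data alone cannot reveal.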

Common mistakes

  • Thinking pooled data and panel data are the same. Pooling stacks observations and ignores the entity index, so it cannot separate within from between variation or sweep out $a_i$.
  • Believing fixed effects can estimate the effect of a time-invariant regressor such as gender or country of birth. Those variables have zero within variation, so their effects are absorbed into $a_i$ and cannot be separately identified.
  • Assuming an unbalanced panel is automatically biased. Unbalancedness is harmless if the missingness is unrelated to the error. The danger is selection, when entities drop out for reasons tied to the outcome.
  • Reading strict exogeneity as the weaker zero-conditional-mean assumption from cross-section analysis. Strict exogeneity requires $u_{it}$ to be uncorrelated with regressors in all periods, not just the current one, which rules out feedback from past shocks to future $x$.
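The first mistake can be made concrete with a toy comparison of a pooled slope and a within slope. This is an illustrative Python sketch on a hypothetical noise-free panel where $y_{it}=2x_{it}+a_i$ and $a_i$ rises with the entity's average $x$; the helpers `ols_slope` and `demean_by_entity` are names invented here, and the one-regressor OLS formula stands in for a full regression routine.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a one-regressor OLS fit with an intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def demean_by_entity(ids, values):
    """Subtract each entity's own time average (the within transformation)."""
    groups = {}
    for i, v in zip(ids, values):
        groups.setdefault(i, []).append(v)
    means = {i: mean(v) for i, v in groups.items()}
    return [v - means[i] for i, v in zip(ids, values)]

# Hypothetical data: true slope is 2, entity effect a_i is correlated with x.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
a = {1: 0.0, 2: 10.0, 3: 20.0}
y = [2.0 * xi + a[i] for i, xi in zip(ids, x)]

# Pooled OLS ignores the entity index, so a_i stays in the error term.
pooled = ols_slope(x, y)
# The within estimator demeans by entity first, which removes a_i.
within = ols_slope(demean_by_entity(ids, x), demean_by_entity(ids, y))
print(f"pooled slope={pooled:.2f}, within slope={within:.2f}")
```

In this construction the pooled slope comes out at about 5 while the within slope recovers the true value of 2, because pooling attributes the entity effect to $x$. Real data contain noise, so the gap will not be this clean.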

Revision bullets

  • Panel = same $N$ entities tracked over $T$ periods, each row indexed by $(i,t)$.
  • Balanced = every entity in every period; unbalanced = some entity-periods missing.
  • Between variation is across entities, within variation is over time for one entity.
  • Unobserved-effects model: $y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}$ with $a_i$ time-invariant.
  • Strict exogeneity conditions on regressors in every period, not just period $t$.
  • Declare the panel with `xtset id year`; inspect variation with `xtsum` and `xtdescribe`.

Quick check

In the model $y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}$, what does $a_i$ represent?

You run `xtsum x` and find the within standard deviation is near zero. What follows?

Sources

  1. Wooldridge (2019), Ch. 13
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019.
    Chapter 13 introduces pooling independent cross-sections and panel data, the two-period unobserved-effects model, and the distinction between idiosyncratic and time-invariant errors.
  2. Wooldridge (2019), Ch. 14
    Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019.
    Chapter 14 develops the general unobserved-effects model, the strict-exogeneity assumption, and the within transformation that underlies fixed-effects estimation.
How to cite this page
Dr. Phil's Quant Lab. (2026). Panel Data: Cross-Section meets Time Series. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-panel-data-structure
Built by Dr. Phuc V. Nguyen