Q: In the model $y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}$, what does $a_i$ represent?

A time-invariant unobserved trait of entity $i$. The term $a_i$ carries an $i$ subscript but no $t$ subscript, so it is fixed over time for a given entity and captures unmeasured permanent traits such as firm quality or ability. The period-varying shock is $u_{it}$, which carries both subscripts. Panel methods are valuable precisely because they can control for $a_i$ even when it correlates with the regressors.

Panel Data · intermediate

Panel Data: Cross-Section meets Time Series

Q: You run `xtsum x` and find the within standard deviation is near zero. What follows?

The variable `x` has almost no variation over time within entities. The within standard deviation in `xtsum` measures how much a variable moves over time for a given entity. If it is near zero, `x` is almost constant within each entity and nearly all of its spread is between entities. A fixed-effects estimator relies on within variation, so it will struggle to identify the coefficient on a regressor with little within movement.

Panel (longitudinal) data follow the same $N$ entities (firms, people, countries) across $T$ time periods, so every observation carries an entity index $i$ and a time index $t$. A panel is balanced when every entity is seen in every period and unbalanced when some entity-periods are missing. The key idea is that total variation splits into between variation (how entities differ from each other on average) and within variation (how one entity changes over time). The workhorse model is the unobserved-effects model $y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}$, where $a_i$ is a time-invariant entity effect (unmeasured firm quality, manager ability, culture) and $u_{it}$ is the idiosyncratic error. Panel data dominate project econometrics because they let you hold $a_i$ fixed, using each entity as its own control, which a single cross-section cannot do.

Why it matters

A cross-section is one snapshot, so any unmeasured trait that makes a firm both profitable and well-governed gets baked into the error and biases the slope. Watching the same firm across several years lets you ask a sharper question: when this firm changes its leverage, what happens to its own profitability? Because the firm is compared to itself, everything fixed about it (its sector, its founding culture, its location) cancels out. That is why panel data are the practical engine for controlling the time-invariant confounders that a single survey wave leaves trapped in the error term.
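To make the "firm compared to itself" argument concrete, here is a minimal simulation sketch in plain Python. Everything in it is an assumption chosen for illustration: the sample sizes, the true slope of 2.0, the noise levels, and the helper names `simulate_panel`, `ols_slope` and `demean_by_entity` are all hypothetical. It generates data in which the regressor is correlated with the entity effect, then compares a pooled slope with a slope computed on entity-demeaned data.

```python
import random


def simulate_panel(n_entities=200, n_periods=5, beta=2.0, seed=0):
    """Simulate y_it = beta * x_it + a_i + u_it with x_it correlated with a_i."""
    rng = random.Random(seed)
    rows = []  # each row is (entity, period, x, y)
    for i in range(n_entities):
        a_i = rng.gauss(0.0, 1.0)          # time-invariant unobserved effect
        for t in range(n_periods):
            x = a_i + rng.gauss(0.0, 1.0)  # regressor loads on a_i (confounding)
            u = rng.gauss(0.0, 0.5)        # idiosyncratic shock
            rows.append((i, t, x, beta * x + a_i + u))
    return rows


def ols_slope(xs, ys):
    """Slope of a one-regressor OLS with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def demean_by_entity(rows):
    """Subtract each entity's own time average from x and y."""
    groups = {}
    for i, _, x, y in rows:
        groups.setdefault(i, []).append((x, y))
    xs, ys = [], []
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        xs.extend(x - mx for x, _ in obs)
        ys.extend(y - my for _, y in obs)
    return xs, ys


rows = simulate_panel()
pooled = ols_slope([r[2] for r in rows], [r[3] for r in rows])
within = ols_slope(*demean_by_entity(rows))
print(f"pooled slope: {pooled:.2f}, within slope: {within:.2f}")
```

Under these assumed parameters the pooled slope should land well above the true value of 2.0, because it picks up the correlation between the regressor and the entity effect, while the demeaned slope should sit close to 2.0 because the entity effect has been subtracted out.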

Formulas

Unobserved-effects model

$y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}, \quad i=1,\dots,N,\; t=1,\dots,T$

Identification needs strict exogeneity, $E(u_{it}\mid x_{i1},\dots,x_{iT},a_i)=0$: the idiosyncratic error is mean-independent of the regressors in every period and of $a_i$. The fixed effect $a_i$ is allowed to correlate with $x_{it}$.

Within and between decomposition

$x_{it}=\bar{x}_i+(x_{it}-\bar{x}_i), \quad \bar{x}_i=\tfrac{1}{T}\sum_{t=1}^{T}x_{it}$

The entity mean $\bar{x}_i$ carries the between variation (differences across entities) and the deviation $x_{it}-\bar{x}_i$ carries the within variation (movement over time for one entity). A regressor with no within variation cannot be identified by fixed effects.
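The link between this decomposition and the unobserved-effects model can be written out in three lines. This is the standard within (time-demeaning) argument, sketched here for a balanced panel, with $\bar{y}_i$ and $\bar{u}_i$ defined in the same way as $\bar{x}_i$:

```latex
\begin{align*}
y_{it} &= x_{it}^{\prime}\beta + a_i + u_{it} \\
\bar{y}_i &= \bar{x}_i^{\prime}\beta + a_i + \bar{u}_i
  && \text{(average over } t=1,\dots,T\text{)} \\
y_{it}-\bar{y}_i &= (x_{it}-\bar{x}_i)^{\prime}\beta + (u_{it}-\bar{u}_i)
  && \text{(subtract: } a_i \text{ cancels)}
\end{align*}
```

Because $a_i$ does not vary over $t$, it equals its own time average and drops out of the last line. The same logic shows why a regressor with $x_{it}=\bar{x}_i$ in every period contributes only zeros to the demeaned equation, so its coefficient cannot be recovered.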

Worked examples

Scenario

Declare a firm-year dataset as a panel and inspect where the variation in return on assets comes from.

Solution

Run `xtset id year` to declare the panel, then `xtsum roa`. Stata reports an overall standard deviation plus a between SD (across firms) and a within SD (over time within a firm). If the within SD is small relative to the between SD, most differences in `roa` are permanent firm traits, so a time-varying regressor will have little within signal to explain.

Note: Use `xtdescribe` to see the time pattern and confirm whether the panel is balanced.
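For readers who want to see what such a decomposition computes, the following is a rough Python sketch rather than a reproduction of Stata's output. The `roa` values are made up, the helper name `xtsum_like` is hypothetical, and the degrees-of-freedom choices (sample standard deviations throughout, with the grand mean added back to the within deviations) are an assumption about the convention, so small numerical differences from `xtsum` are possible.

```python
from statistics import mean, stdev


def xtsum_like(rows):
    """rows: list of (entity, period, value). Returns (overall, between, within) SDs."""
    values = [v for _, _, v in rows]
    grand_mean = mean(values)

    # Group the values by entity and take each entity's time average.
    by_entity = {}
    for i, _, v in rows:
        by_entity.setdefault(i, []).append(v)
    entity_means = {i: mean(vs) for i, vs in by_entity.items()}

    overall_sd = stdev(values)
    between_sd = stdev(list(entity_means.values()))  # spread of entity averages
    # Within deviations, re-centred on the grand mean (assumed convention).
    within_vals = [v - entity_means[i] + grand_mean for i, _, v in rows]
    within_sd = stdev(within_vals)
    return overall_sd, between_sd, within_sd


# Hypothetical firm-year data: three firms, three years, roa drifting slowly.
rows = [
    (1, 2019, 0.10), (1, 2020, 0.11), (1, 2021, 0.12),
    (2, 2019, 0.20), (2, 2020, 0.21), (2, 2021, 0.22),
    (3, 2019, 0.30), (3, 2020, 0.31), (3, 2021, 0.32),
]
overall_sd, between_sd, within_sd = xtsum_like(rows)
print(f"overall {overall_sd:.4f}  between {between_sd:.4f}  within {within_sd:.4f}")
```

In this made-up example the firms differ from each other by far more than any one firm moves over time, so the between SD is much larger than the within SD, which is the pattern described in the solution above.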

Scenario

A wage panel where workers enter and leave the sample in different years.

Solution

After `xtset pid year`, run `xtdescribe`. The output flags an unbalanced panel because the number of periods per worker varies. This is fine for fixed-effects and random-effects estimation as long as the reason a worker is missing is unrelated to the error. Otherwise the missingness is itself a source of bias rather than a harmless gap.
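The balance check itself is simple to express. The sketch below is a plain-Python illustration of the idea, not a replica of `xtdescribe`: the worker ids and years are invented, and the helper name `describe_panel` is hypothetical. It counts how many periods each worker appears in and compares that with the number of distinct periods in the data.

```python
from collections import Counter


def describe_panel(rows):
    """rows: iterable of (entity, period) pairs. Summarise the panel pattern."""
    pairs = set(rows)                          # guard against duplicate rows
    periods = sorted({t for _, t in pairs})    # all periods seen in the data
    counts = Counter(i for i, _ in pairs)      # periods observed per entity
    balanced = all(c == len(periods) for c in counts.values())
    return {
        "n_entities": len(counts),
        "n_periods": len(periods),
        "obs_per_entity": dict(counts),
        "balanced": balanced,
    }


# Hypothetical wage panel: worker 2 enters late, worker 3 leaves early.
rows = [
    (1, 2018), (1, 2019), (1, 2020),
    (2, 2019), (2, 2020),
    (3, 2018),
]
info = describe_panel(rows)
print(info)
```

A check like this only reports the pattern of gaps. Whether the gaps are harmless still depends on why workers are missing, which the data pattern alone cannot reveal.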

Common mistakes

✗ Thinking pooled data and panel data are the same. Pooling stacks observations and ignores the entity index, so it cannot separate within from between variation or sweep out $a_i$.
✗ Believing fixed effects can estimate the effect of a time-invariant regressor such as gender or country of birth. Those variables have zero within variation, so they are collinear with the entity effects and their coefficients cannot be separated from $a_i$.
✗ Assuming an unbalanced panel is automatically biased. Unbalancedness is harmless if the missingness is unrelated to the error. The danger is selection, when entities drop out for reasons tied to the outcome.
✗ Treating strict exogeneity as the weaker contemporaneous zero-conditional-mean assumption from cross-section analysis. Strict exogeneity requires $u_{it}$ to be uncorrelated with the regressors in all periods, not just the current one, which rules out feedback from past shocks to future $x$.
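The second mistake can be checked by hand. The sketch below uses invented data for two workers and a hypothetical helper, `within_transform`, to demean a time-invariant dummy and a time-varying variable. It is only an illustration of the arithmetic, not an estimator.

```python
def within_transform(values_by_entity):
    """Subtract each entity's time mean from its own observations."""
    return {
        i: [v - sum(vs) / len(vs) for v in vs]
        for i, vs in values_by_entity.items()
    }


def sum_of_squares(demeaned):
    """Total within variation: sum of squared deviations across all entities."""
    return sum(d ** 2 for ds in demeaned.values() for d in ds)


# Hypothetical three-period panel for two workers.
female = {"w1": [1, 1, 1], "w2": [0, 0, 0]}      # time-invariant dummy
training = {"w1": [0, 5, 10], "w2": [3, 3, 9]}   # time-varying regressor

ss_female = sum_of_squares(within_transform(female))
ss_training = sum_of_squares(within_transform(training))
print(ss_female, ss_training)
```

The demeaned dummy is zero in every row, so a fixed-effects regression has no variation left from which to estimate its coefficient, whereas the time-varying regressor keeps a positive sum of squared deviations.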

Revision bullets

• Panel = same $N$ entities tracked over $T$ periods, each row indexed by $(i,t)$.
• Balanced = every entity in every period; unbalanced = some entity-periods missing.
• Between variation is across entities, within variation is over time for one entity.
• Unobserved-effects model: $y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}$ with $a_i$ time-invariant.
• Strict exogeneity conditions on regressors in every period, not just period $t$.
• Declare the panel with `xtset id year`; inspect variation with `xtsum` and `xtdescribe`.

Quick check

In the model $y_{it}=x_{it}^{\prime}\beta+a_i+u_{it}$, what does $a_i$ represent?

You run `xtsum x` and find the within standard deviation is near zero. What follows?

Connected topics

Data Types · Omitted variable bias · Pooled OLS · Fixed Effects · Random Effects

Sources

Wooldridge (2019), Ch. 13
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019.
Chapter 13 introduces pooling independent cross-sections and panel data, the two-period unobserved-effects model, and the distinction between idiosyncratic and time-invariant errors.
Wooldridge (2019), Ch. 14
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019.
Chapter 14 develops the general unobserved-effects model, the strict-exogeneity assumption, and the within transformation that underlies fixed-effects estimation.

How to cite this page

Dr. Phil's Quant Lab. (2026). Panel Data: Cross-Section meets Time Series. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-panel-data-structure