Clustered Standard Errors
In a panel model, the idiosyncratic errors for a given entity are almost always serially correlated over time, and often heteroskedastic too. That breaks the assumptions behind default OLS standard errors, and even heteroskedasticity-robust (White) standard errors remain invalid because they still treat each observation as independent. Invalid here means the reported standard errors can be wrong in either direction, not merely that they are understated. The fix is to cluster the variance-covariance estimator on the entity, which allows arbitrary correlation of the errors within each entity while assuming independence across entities. In Stata: `xtreg y x, fe vce(cluster id)` or `regress y x, vce(cluster id)`.
Why it matters
Picture each firm observed for ten years. A firm with a good manager tends to do well year after year, so its errors move together across time and the ten observations carry far less independent information than ten unrelated draws would. Default standard errors count all of them as fresh evidence and overstate how much you know. Clustering on the firm says "treat each firm as one correlated block" and lets the within-firm dependence take any shape. Crucially, this changes inference only. The slope estimates do not move, and a wider standard error is honesty about precision, not a different answer about the effect.
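The firm story can be sketched numerically. The block below is a minimal NumPy illustration on simulated data, not a reproduction of what Stata runs internally. The function name `ols_with_se`, the simulated design (200 firms, 10 years, a persistent firm component in both the regressor and the error), and the small-sample scaling factor are all assumptions made for the example.

```python
import numpy as np

def ols_with_se(X, y, groups):
    """Return OLS coefficients, default SEs, and cluster-robust SEs."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y          # computed once, shared by both SE types
    resid = y - X @ beta

    # Default variance: assumes independent, homoskedastic errors.
    sigma2 = resid @ resid / (n - k)
    se_default = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Cluster-robust "meat": sum over clusters of (X_g' u_g)(X_g' u_g)'.
    labels = np.unique(groups)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        mask = groups == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)

    # A commonly used finite-sample scaling (an assumption of this sketch).
    c = G / (G - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    se_cluster = np.sqrt(np.diag(V))
    return beta, se_default, se_cluster

# Simulated panel: each firm has a persistent component in x and in the error.
rng = np.random.default_rng(0)
n_firms, n_years = 200, 10
firm = np.repeat(np.arange(n_firms), n_years)
x = rng.normal(size=n_firms)[firm] + rng.normal(size=n_firms * n_years)
u = rng.normal(size=n_firms)[firm] + rng.normal(size=n_firms * n_years)
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones_like(x), x])
beta, se_default, se_cluster = ols_with_se(X, y, firm)

print("slope:", beta[1])
print("default SE:", se_default[1])
print("clustered SE:", se_cluster[1])
```

Because `beta` is computed once and only the variance matrix differs, the slope is the same under both kinds of standard error by construction. In this design the clustered standard error comes out larger than the default one.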
Formulas
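As a reference sketch in standard textbook notation (the symbols are assumptions introduced here: $G$ clusters indexed by $g$, $X_g$ and $\hat{u}_g$ the regressors and residuals stacked for cluster $g$, and $x_i$, $\hat{u}_i$ a single observation), the White and cluster-robust estimators share the same outer "bread" and differ only in the middle term. Small-sample multipliers are omitted, and for fixed effects $X$ and $y$ are understood as within-transformed.

```latex
\hat{\beta} = (X'X)^{-1} X'y, \qquad \hat{u}_g = y_g - X_g \hat{\beta}

\widehat{V}_{\text{White}} = (X'X)^{-1} \Big( \sum_{i=1}^{N} \hat{u}_i^{2} \, x_i x_i' \Big) (X'X)^{-1}

\widehat{V}_{\text{cluster}} = (X'X)^{-1} \Big( \sum_{g=1}^{G} X_g' \hat{u}_g \hat{u}_g' X_g \Big) (X'X)^{-1}
```

The White version keeps only each observation's own squared residual, while the clustered version also keeps every cross-product of residuals within a cluster. That is where within-entity correlation enters. The estimate $\hat{\beta}$ appears in neither variance formula's definition of the coefficients, so it is unaffected by the choice.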
Worked examples
Run the same fixed-effects wage regression twice, once with default and once with clustered standard errors, on a worker-by-year panel.
First `xtset id year`, then `xtreg lwage union exper, fe` followed by `xtreg lwage union exper, fe vce(cluster id)`. The coefficient on `union` is identical across the two runs, say 0.085. What changes is its standard error, which rises from roughly 0.018 to roughly 0.030, so the t-statistic falls from about 4.7 to about 2.8. The clustered version is the honest one because each worker is followed over several years.
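The arithmetic behind those t-statistics, using the illustrative numbers from the example, is shown below. The confidence intervals use the normal critical value 1.96 as an approximation, which is an assumption of this sketch and not a figure taken from the example.

```latex
t_{\text{default}} = \frac{0.085}{0.018} \approx 4.7, \qquad
t_{\text{clustered}} = \frac{0.085}{0.030} \approx 2.8

\text{CI}_{\text{default}} \approx 0.085 \pm 1.96 \times 0.018 = [0.050,\ 0.120]

\text{CI}_{\text{clustered}} \approx 0.085 \pm 1.96 \times 0.030 = [0.026,\ 0.144]
```

Both intervals are centred on the same 0.085. Only the width changes.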
A pooled regression of firm investment on cash flow across many firms over time.
Run `regress invest cashflow size, vce(cluster firmid)`. The slope on `cashflow` is unchanged from the default `regress invest cashflow size`, but the clustered standard error is usually larger because each firm contributes serially correlated errors. Always confirm the design has many firms: with only a handful of clusters the cluster-robust formula is unreliable and the reported significance cannot be trusted.
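The warning about few clusters can be illustrated with a small Monte Carlo sketch in NumPy. Everything here is an assumption chosen for the illustration: the function name `rejection_rate`, the design (a regressor that is constant within each cluster, errors with a shared cluster component, a true slope of zero), the finite-sample scaling, and the use of the 1.96 critical value. The function counts how often a nominal 5% test rejects a true null hypothesis.

```python
import numpy as np

def rejection_rate(n_clusters, n_per_cluster=10, n_reps=1000, seed=0):
    """Share of simulations where |t| > 1.96 although the true slope is 0."""
    rng = np.random.default_rng(seed)
    G, T = n_clusters, n_per_cluster
    n, k = G * T, 2
    rejections = 0
    for _ in range(n_reps):
        # Regressor varies only across clusters; errors share a cluster shock.
        x = np.repeat(rng.normal(size=G), T)
        u = np.repeat(rng.normal(size=G), T) + rng.normal(size=n)
        y = u  # true slope on x is zero

        X = np.column_stack([np.ones(n), x])
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        resid = y - X @ beta

        # Rows are ordered cluster by cluster, so reshape gives per-cluster sums.
        scores = (X * resid[:, None]).reshape(G, T, k).sum(axis=1)
        meat = scores.T @ scores
        c = G / (G - 1) * (n - 1) / (n - k)
        V = c * XtX_inv @ meat @ XtX_inv

        t_stat = beta[1] / np.sqrt(V[1, 1])
        rejections += int(abs(t_stat) > 1.96)
    return rejections / n_reps

rate_few = rejection_rate(5)    # a handful of clusters
rate_many = rejection_rate(50)  # many clusters

print("rejection rate with 5 clusters:", rate_few)
print("rejection rate with 50 clusters:", rate_many)
```

A well-calibrated test would reject about 5% of the time. In a design like this one, the run with few clusters rejects noticeably more often than the run with many, even though both use the same cluster-robust formula.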
Common mistakes
- ✗ Thinking clustering changes the coefficients. It does not. The point estimates are numerically identical to the non-clustered run, because only the variance-covariance matrix is recomputed.
- ✗ Believing clustered standard errors are always larger. They usually grow when within-entity errors are positively correlated, which is the common case, but this is not guaranteed and they can occasionally shrink.
- ✗ Treating clustering as a fix for bias or endogeneity. It corrects inference only. It does nothing about omitted-variable bias, simultaneity, or measurement error, so a biased estimate stays biased with a clustered standard error attached.
- ✗ Clustering when there are very few entities. The asymptotics run in the number of clusters, not the number of observations, so with only a handful of clusters the standard errors are unreliable no matter how many rows the dataset has.
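The second point in the list, that clustered standard errors can shrink, is easy to see in a contrived NumPy sketch. The design is an assumption made purely for illustration: two observations per cluster whose errors nearly cancel (strong negative within-cluster correlation) and a regressor that is constant within the cluster. The scaling factor is likewise an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(1)
G, T = 500, 2
n, k = G * T, 2

# Regressor is constant within each cluster.
x = np.repeat(rng.normal(size=G), T)

# Errors within a cluster are (e, -e) plus a little noise, so they nearly cancel.
e = rng.normal(size=G)
u = np.column_stack([e, -e]).ravel() + 0.1 * rng.normal(size=n)
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Default standard errors: independent, homoskedastic errors assumed.
se_default = np.sqrt(np.diag(resid @ resid / (n - k) * XtX_inv))

# Cluster-robust standard errors: per-cluster score sums form the "meat".
scores = (X * resid[:, None]).reshape(G, T, k).sum(axis=1)
c = G / (G - 1) * (n - 1) / (n - k)
V = c * XtX_inv @ (scores.T @ scores) @ XtX_inv
se_cluster = np.sqrt(np.diag(V))

print("default SE:", se_default[1])
print("clustered SE:", se_cluster[1])
```

Here the within-cluster errors offset each other, so the summed score for each cluster is small and the clustered standard error falls below the default one. Real panels rarely look like this, which is why growth is the usual outcome.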
Revision bullets
- Panel errors are usually serially correlated within an entity, so default and White standard errors are invalid.
- Clustering on the entity allows arbitrary within-entity correlation while assuming independence across entities.
- Clustering changes inference only; the coefficients are unchanged.
- Cluster-robust inference needs many clusters; the asymptotics are in the number of clusters, not in the number of observations.
- It assumes no cross-cluster correlation. Common shocks across entities in the same year may need time effects or two-way clustering.
- Stata: `xtreg y x, fe vce(cluster id)` or `regress y x, vce(cluster id)`.
Quick check
You re-run a fixed-effects regression with `vce(cluster id)` instead of the default standard errors. What changes?
Why are heteroskedasticity-robust (White) standard errors still invalid for a typical panel?
Sources
- Wooldridge (2019), Ch. 14. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Chapter 14 develops fixed and random effects for panel data and motivates cluster-robust inference that allows arbitrary serial correlation within an entity.
- Wooldridge (2019), Ch. 12. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. Chapter 12 covers serial correlation and why standard errors that ignore it are invalid, the time-series counterpart of the within-entity dependence that clustering addresses.