Doing Econometrics in Stata
Stata is the software backbone of this course. Real work lives in a do-file, a script that makes every step reproducible and reruns the whole analysis with one click. The core estimation command is `regress y x`, which regresses `y` on `x` by OLS; its output reports the coefficient, standard error, t statistic, and p-value for each regressor, along with the R-squared. Adding the `, robust` option swaps in heteroskedasticity-robust standard errors, which remain valid in large samples when the error variance is not constant.
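A do-file of the kind described above might look like the following minimal sketch. The file name, the log name, and the use of the `auto` dataset that ships with Stata are assumptions made for illustration; the course's own data and variable names would replace them.

```stata
* analysis.do (hypothetical file name); rerun with: do analysis.do
clear all
set more off

* Keep a text log of everything the script prints (hypothetical log name)
capture log close
log using analysis_log, text replace

* Load a dataset that ships with Stata; it stands in for the course data
sysuse auto, clear

* OLS of price on mpg with heteroskedasticity-robust standard errors
regress price mpg, robust

log close
```

Because every step is written down, rerunning the file should reproduce the same table each time.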
Why it matters
Clicking menus feels fast but leaves no script to rerun or check, so always write a do-file. When you read the regression output, the coefficient is the estimated effect, the standard error measures the sampling uncertainty around it, and the t statistic and p-value tell you whether it is statistically distinguishable from zero. Reach for `, robust` by default in cross-section work, since constant error variance rarely holds in practice.
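To see how the columns of the output relate to one another, the t statistic and p-value can be rebuilt by hand from the stored coefficient and standard error. This is a sketch that again assumes the built-in `auto` dataset; the scalar names `t_hand` and `p_hand` are hypothetical.

```stata
sysuse auto, clear
regress price mpg, robust

* t statistic: coefficient divided by its standard error
scalar t_hand = _b[mpg] / _se[mpg]

* Two-sided p-value from the t distribution with the residual degrees of freedom
scalar p_hand = 2 * ttail(e(df_r), abs(t_hand))

display "t = " t_hand
display "p = " p_hand
```

The two displayed values should agree with the `t` and `P>|t|` columns of the `mpg` row up to rounding.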
Formulas
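The relationships this section relies on can be written compactly. The notation is an assumption rather than the source's own: $\hat{\beta}_j$ is the estimated coefficient, $\mathrm{se}(\hat{\beta}_j)$ its standard error, and $c_{0.025}$ the 97.5th percentile of the t distribution with the regression's residual degrees of freedom.

$$
t = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)}, \qquad \text{95\% CI} = \hat{\beta}_j \pm c_{0.025}\,\mathrm{se}(\hat{\beta}_j)
$$

With `, robust`, the same expressions apply with the robust standard error in place of the conventional one.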
Worked examples
Estimate the return to schooling with robust standard errors and read off the key numbers.
In a do-file, run `regress wage educ, robust`. The `educ` row shows the coefficient (the estimated effect of one more year of schooling), its robust standard error, the t statistic and p-value for the null hypothesis that the coefficient is zero, and a 95% confidence interval. The header reports the R-squared, the fraction of wage variation the model explains. Saving this as a do-file means anyone can reproduce the result exactly.
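The worked example can be tried without the course data by simulating a dataset first. Everything about the data below is an assumption made for illustration (the seed, the sample size, and the data-generating process with error variance that grows with `educ`), so the numbers it prints say nothing about real returns to schooling. The scalar names are hypothetical.

```stata
clear
set seed 12345
set obs 500

* Hypothetical data: schooling between 8 and 16 years,
* wage errors whose spread increases with schooling (heteroskedasticity)
generate educ = 8 + floor(9 * runiform())
generate wage = 2 + 0.5 * educ + rnormal(0, 0.2 * educ)

regress wage educ, robust

* Read off the key numbers from the stored results
display "coefficient on educ: " _b[educ]
display "robust std. error:   " _se[educ]
display "R-squared:           " e(r2)

* Rebuild the 95% confidence interval from the coefficient and standard error
scalar tcrit = invttail(e(df_r), 0.025)
scalar ci_lo = _b[educ] - tcrit * _se[educ]
scalar ci_hi = _b[educ] + tcrit * _se[educ]
display "95% CI: [" ci_lo ", " ci_hi "]"
```

The interval printed at the end should match the one in the `educ` row of the regression table.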
Common mistakes
- ✗ Pointing and clicking through menus is fine for real analysis. Menu commands are not reproducible or reviewable, so serious work belongs in a do-file that can be rerun and audited.
- ✗ The `, robust` option changes the coefficients. Robust standard errors leave the coefficient estimates unchanged and only adjust the standard errors, t statistics, and p-values.
- ✗ A high R-squared means the model is correct or causal. R-squared measures fit, not validity; a well-identified model can have a low R-squared and a misleading model a high one.
- ✗ A small p-value means a large or important effect. The p-value speaks to statistical significance, not magnitude; you read economic importance from the coefficient and its units.
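The point about `, robust` in the list above can be checked directly by running the same regression twice and comparing the stored results. This is a sketch on the built-in `auto` dataset (an assumption); the scalar names are hypothetical.

```stata
sysuse auto, clear

* Conventional standard errors
quietly regress price mpg
scalar b_ols  = _b[mpg]
scalar se_ols = _se[mpg]

* Heteroskedasticity-robust standard errors
quietly regress price mpg, robust
scalar b_rob  = _b[mpg]
scalar se_rob = _se[mpg]

* The coefficient is the same in both runs; only the standard error moves
display "coefficient: " b_ols " vs " b_rob
display "std. error:  " se_ols " vs " se_rob
```

Since the t statistic and p-value are computed from the standard error, they change along with it while the point estimate stays put.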
Revision bullets
- Do-files make the whole analysis reproducible
- `regress y x` estimates a regression by OLS
- Output: coefficient, std. error, t, p-value, R-squared
- t equals coefficient divided by its standard error
- `, robust` gives heteroskedasticity-robust standard errors
Quick check
In Stata output, the t statistic for a coefficient is computed as
Adding the `, robust` option to `regress` changes
Sources
- Wooldridge (2019), Ch. 2, 8: Wooldridge, J. M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. ISBN 978-1-337-55886-0. Chapter 2 introduces OLS output; Chapter 8 motivates heteroskedasticity-robust standard errors.
- Stata Base Reference, regress: StataCorp. Stata 18 Base Reference Manual, entry for regress. StataCorp LLC, 2023. Documents the regress command, its output table, and the robust (Huber/White) variance estimator.