Doing Econometrics in Stata

Stata is the software backbone of this course. Real work lives in a do-file, a script that records every step so the whole analysis can be rerun and reproduced with one click. The core estimation command is `regress y x`, which estimates the regression by OLS. Its output table reports each coefficient with its standard error, t statistic, p-value, and 95% confidence interval, and the header reports the R-squared. Adding the `, robust` option swaps in heteroskedasticity-robust standard errors, which remain valid in large samples when the error variance is not constant.
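
To see where the numbers in that output table come from, here is a minimal sketch in plain Python (not Stata) of a simple regression of `y` on a constant and `x`. It returns the slope row's coefficient, conventional standard error, and t statistic, plus the R-squared. The function name `simple_ols` and the five data points are made up for illustration. Stata's `regress` handles any number of regressors with matrix algebra, which this sketch does not attempt.

```python
import math


def simple_ols(x, y):
    """OLS of y on a constant and x (hypothetical helper).

    Returns the slope-row quantities that `regress y x` prints
    (coefficient, conventional standard error, t) plus R-squared.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept (_cons row in Stata)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(u ** 2 for u in resid)    # sum of squared residuals
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = ssr / (n - 2)              # error variance with n - 2 residual df
    se_b1 = math.sqrt(sigma2 / sxx)     # conventional (non-robust) SE
    return {"coef": b1, "se": se_b1, "t": b1 / se_b1, "r2": 1 - ssr / sst}


# Made-up data, purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(simple_ols(x, y))
```

On these five made-up points the slope is 0.6, its standard error about 0.283, the t statistic about 2.12, and the R-squared 0.6; `regress y x` on the same data should report the same slope-row values.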

Why it matters

Clicking through menus feels fast, but a trail of clicks is hard to reproduce or check, so always write a do-file. When you read the regression output, the coefficient is the estimated effect, the standard error measures how precisely that effect is estimated (it is the estimated standard deviation of the coefficient's sampling distribution), and the t statistic and p-value tell you whether the effect is statistically distinguishable from zero. Reach for `, robust` by default in cross-section work, since constant error variance rarely holds in practice.

Formulas

t statistic in the output table
$$t = \frac{\hat\beta_1}{\mathrm{se}(\hat\beta_1)}$$
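Stata divides each coefficient by its standard error to form the reported $t$. A larger absolute $t$ gives a smaller p-value against $H_0\!: \beta_1 = 0$; Stata computes that p-value from a t distribution with $n - k - 1$ degrees of freedom, where $k$ is the number of regressors excluding the constant.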
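
The formula can be traced numerically. The sketch below, in Python with hypothetical function names and made-up coefficient and standard-error pairs, computes $t$ and a two-sided p-value. One loud assumption: the p-value uses a standard normal approximation via `math.erfc`, whereas Stata uses the t distribution, so the numbers match Stata's only approximately, and only when the residual degrees of freedom are large.

```python
import math


def t_stat(coef, se):
    """t statistic as in the formula: coefficient divided by its standard error."""
    return coef / se


def approx_two_sided_p(t):
    """Two-sided p-value for H0: beta_1 = 0 from a standard normal approximation.

    Assumption: Stata's regress uses a t distribution with n - k - 1 residual
    degrees of freedom instead; the normal version is close only when those
    degrees of freedom are large.
    """
    return math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))


# Hypothetical coefficient/SE pairs, not taken from any real dataset.
for coef, se in [(0.10, 0.10), (0.25, 0.10), (0.40, 0.10)]:
    t = t_stat(coef, se)
    print(f"coef={coef:.2f}  se={se:.2f}  t={t:.2f}  approx p={approx_two_sided_p(t):.4f}")
```

Holding the standard error fixed, a larger coefficient gives a larger $|t|$ and a smaller approximate p-value, which is the relationship the formula describes.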

Worked examples

Scenario

Estimate the return to schooling with robust standard errors and read off the key numbers.

Solution

In a do-file, run `regress wage educ, robust`. The `educ` row shows the coefficient (the estimated effect of one more year of schooling on wage), its robust standard error, the t statistic and p-value for $H_0\!: \beta_1 = 0$, and a 95% confidence interval. The header reports the R-squared, the fraction of the sample variation in wage that the model explains. Saving these commands in a do-file means anyone can reproduce the result exactly.
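
The point that `, robust` changes only the standard error can be checked by hand for a single regressor. The sketch below, in plain Python with a hypothetical function name and made-up schooling and wage numbers (not the textbook's dataset), computes the slope once and then both a conventional and a robust standard error. The robust formula is assumed to be the HC1 variant: the White (HC0) sandwich scaled by n/(n - k) with k = 2, which is our understanding of what `regress, robust` applies by default.

```python
import math


def slope_with_two_ses(x, y):
    """Simple regression of y on a constant and x (hypothetical helper).

    Returns the OLS slope with its conventional standard error and a
    heteroskedasticity-robust one. The robust SE is assumed to be HC1:
    the HC0 sandwich scaled by n / (n - k), with k = 2 coefficients.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional SE: one common error variance for every observation.
    se_conv = math.sqrt(sum(u ** 2 for u in resid) / (n - 2) / sxx)
    # Robust SE: each squared residual is weighted by its own (x_i - x_bar)^2.
    hc0 = sum((xi - x_bar) ** 2 * u ** 2 for xi, u in zip(x, resid)) / sxx ** 2
    se_rob = math.sqrt(hc0 * n / (n - 2))
    return b1, se_conv, se_rob


# Made-up schooling (years) and hourly wage (dollars) pairs, for illustration only.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
wage = [4.0, 5.5, 6.0, 7.5, 8.0, 11.0, 9.0, 14.5]
b1, se_c, se_r = slope_with_two_ses(educ, wage)
print(f"slope = {b1:.3f}, conventional se = {se_c:.3f}, robust se = {se_r:.3f}")
```

The slope is computed once and shared by both standard errors, so switching to robust inference cannot move the coefficient; only the standard error, and with it t, p, and the confidence interval, changes. With only a handful of observations the two standard errors can differ noticeably in either direction.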

Common mistakes

  • "Pointing and clicking through menus is fine for real analysis." A trail of clicks is hard to reproduce or review, so serious work belongs in a do-file that can be rerun and audited.
  • "The `, robust` option changes the coefficients." Robust standard errors leave the coefficient estimates (and the R-squared) unchanged and only adjust the standard errors, t statistics, p-values, and confidence intervals.
  • "A high R-squared means the model is correct or causal." R-squared measures fit, not validity; a well-identified model can have a low R-squared and a misleading model a high one.
  • "A small p-value means a large or important effect." The p-value speaks to statistical significance, not magnitude; you read economic importance from the coefficient and its units.
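
The last mistake can be made concrete by changing units. In this plain-Python sketch (hypothetical function name, made-up data), the same wages are measured once in dollars and once in cents. The coefficient is 100 times larger in cents, yet the t statistic, and therefore the p-value, is unchanged, so the p-value alone says nothing about how big the effect is.

```python
import math


def slope_and_t(x, y):
    """OLS slope of y on a constant and x, with its conventional t statistic."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return b1, b1 / se


# Made-up data: the same wages measured in dollars and in cents.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
wage_dollars = [4.0, 5.5, 6.0, 7.5, 8.0, 11.0, 9.0, 14.5]
wage_cents = [100 * w for w in wage_dollars]

for label, wage in [("dollars", wage_dollars), ("cents", wage_cents)]:
    b1, t = slope_and_t(educ, wage)
    print(f"{label:>7}: coefficient = {b1:.3f}, t = {t:.3f}")
```

Rescaling the dependent variable multiplies the coefficient and its standard error by the same factor, so their ratio t stays put.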

Revision bullets

  • Do-files make the whole analysis reproducible
  • `regress y x` estimates a regression by OLS
  • Output: coefficient, std. error, t, p-value, R-squared
  • t equals coefficient divided by its standard error
  • `, robust` gives heteroskedasticity-robust standard errors

How to cite this page
Dr. Phil's Quant Lab. (2026). Doing Econometrics in Stata. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-stata-workflow