Foundations & Data · beginner

Doing Econometrics in Stata

Stata is the software backbone of this course. Real work lives in a do-file: a script that records every step, so the whole analysis reruns from start to finish with one command. The core estimation command is `regress y x`, which fits the regression by OLS. For each coefficient, its output table reports the estimate, standard error, t statistic, p-value, and 95% confidence interval, and the header above the table reports the R-squared. Adding the `, robust` option swaps in heteroskedasticity-robust standard errors, which remain valid when the error variance is not constant.
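A minimal do-file along these lines might look as follows. It is a sketch under stated assumptions: Stata's bundled `auto` dataset stands in for real data, with `price` as $y$ and `mpg` as $x$, and the log name `stata_workflow.log` is a hypothetical choice. The `version 18` line matches the manual cited under Sources; set it to the release you actually run.

```stata
* stata_workflow.do: minimal reproducible script (illustrative sketch)
version 18                        // interpret commands under Stata 18 rules; adjust to your release
clear all                         // start from an empty session
capture log close                 // close any log left open by an earlier run
log using stata_workflow.log, replace text

sysuse auto, clear                // example dataset shipped with Stata

* OLS with the default (homoskedasticity-only) standard errors
regress price mpg

* Same regression with heteroskedasticity-robust standard errors
regress price mpg, robust

log close
```

Saved as `stata_workflow.do` (a hypothetical file name), the script reruns from the top with `do stata_workflow.do`, and the log keeps a plain-text record of both regression tables.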

Why it matters

Clicking through menus feels fast but leaves no script that anyone can rerun or check, so always write a do-file. When you read the regression output, the coefficient is the estimated effect, the standard error measures how precisely that effect is estimated, and the t statistic and p-value tell you whether the estimate is statistically distinguishable from zero. Use `, robust` by default in cross-sectional work: constant error variance rarely holds in practice, and robust standard errors remain valid whether or not it does.

Formulas

t statistic in the output table

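$$t = \frac{\hat\beta_1}{\mathrm{se}(\hat\beta_1)}$$

Stata divides each coefficient by its standard error to form the reported $t$. A larger absolute $t$ gives a smaller p-value against $H_0\!: \beta_1 = 0$; Stata computes this two-sided p-value from the $t$ distribution with the residual degrees of freedom ($n - 2$ in a simple regression).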
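To see the formula at work, the sketch below recomputes the reported $t$ and p-value from Stata's stored results, again assuming the bundled `auto` data as the example. `_b[mpg]` and `_se[mpg]` are the stored coefficient and standard error, `e(df_r)` is the residual degrees of freedom, and `r(table)` holds the numbers printed in the output table.

```stata
* Recompute the t statistic and p-value for mpg by hand (illustrative sketch)
sysuse auto, clear
regress price mpg, robust
matrix T = r(table)        // rows: b, se, t, pvalue, ll, ul, ...; column 1 is mpg

scalar t_mpg = _b[mpg] / _se[mpg]               // t = coefficient / standard error
scalar p_mpg = 2 * ttail(e(df_r), abs(t_mpg))   // two-sided p-value for H0: beta1 = 0

display "by hand:  t = " t_mpg "   p = " p_mpg
display "regress:  t = " T[3,1] "   p = " T[4,1]
```

The two display lines should agree. With `, robust` the formula is the same; only the standard error in the denominator changes.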

Worked examples

Scenario

Estimate the return to schooling with robust standard errors and read off the key numbers.

Solution

In a do-file, run `regress wage educ, robust`. The `educ` row shows the coefficient (the estimated change in wage, in the units of `wage`, from one more year of schooling), its robust standard error, the t statistic and p-value for $H_0\!: \beta_1 = 0$, and a 95% confidence interval. The header reports the R-squared, the fraction of the sample variation in wage that the model explains. Saving these steps in a do-file means anyone with the same data can reproduce the result exactly.
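The same steps can be sketched on data that ship with Stata. The bundled `nlsw88` extract has no `educ` variable, so this example assumes its `grade` variable (years of schooling completed) as a stand-in; the numbers will therefore differ from the textbook's wage data. The last lines rebuild the 95% confidence interval as the coefficient plus or minus the critical $t$ value times the robust standard error.

```stata
* Return to schooling with robust standard errors (illustrative sketch)
* Assumption: nlsw88's grade stands in for educ in the textbook example
sysuse nlsw88, clear
regress wage grade, robust
matrix T = r(table)                          // saved copy of the output table

scalar coef_g = _b[grade]                    // estimated change in wage per extra year
scalar se_g   = _se[grade]                   // robust standard error
scalar crit95 = invttail(e(df_r), 0.025)     // t critical value for a 95% interval
scalar ci_lo  = coef_g - crit95 * se_g
scalar ci_hi  = coef_g + crit95 * se_g

display "coef = " coef_g "   robust se = " se_g "   R-squared = " e(r2)
display "95% CI by hand:  [" ci_lo ", " ci_hi "]"
display "95% CI reported: [" T[5,1] ", " T[6,1] "]"
```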

Common mistakes

✗Pointing and clicking through menus is fine for real analysis. Menu commands are not reproducible or reviewable, so serious work belongs in a do-file that can be rerun and audited.
✗The `, robust` option changes the coefficients. Robust standard errors leave the coefficient estimates and R-squared unchanged and adjust only the standard errors, t statistics, p-values, and confidence intervals.
✗A high R-squared means the model is correct or causal. R-squared measures fit, not validity; a well-identified model can have a low R-squared and a misleading model a high one.
✗A small p-value means a large or important effect. The p-value speaks to statistical significance, not magnitude; you read economic importance from the coefficient and its units.
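The point about `, robust` is easy to check directly. In this sketch (again assuming the bundled `auto` data), `estimates store` saves each fit under a name of our choosing, `ols` and `rob`, and `estimates table` lines them up side by side.

```stata
* Robust vs. default standard errors: same coefficients, different SEs (sketch)
sysuse auto, clear

regress price mpg                   // default (homoskedasticity-only) SEs
estimates store ols
matrix b_ols = e(b)                 // coefficient vector from this fit

regress price mpg, robust           // heteroskedasticity-robust SEs
estimates store rob
matrix b_rob = e(b)

* Identical coefficient vectors give a relative difference of 0
display "max relative difference in coefficients: " mreldif(b_ols, b_rob)
estimates table ols rob, b se stats(N r2)
```

In the table, the coefficient rows and the N and R-squared rows match across the two columns; only the standard-error rows differ.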

Revision bullets

•Do-files make the whole analysis reproducible
•`regress y x` estimates a regression by OLS
•Output: coefficient, std. error, t, p-value, R-squared
•t equals coefficient divided by its standard error
•`, robust` gives heteroskedasticity-robust standard errors

Quick check

In Stata output, how is the t statistic for a coefficient computed?

What does adding the `, robust` option to `regress` change?

Connected topics

•Data Types
•Research Process
•OLS Derivation
•R-squared
•Robust SE

Sources

Wooldridge (2019), Ch. 2, 8
Wooldridge, J. M. Introductory Econometrics: A Modern Approach. 7th ed. Cengage, 2019. ISBN 978-1-337-55886-0.
Chapter 2 introduces OLS output; Chapter 8 motivates heteroskedasticity-robust standard errors.
Stata Base Reference: regress
StataCorp. Stata 18 Base Reference Manual, entry for regress. StataCorp LLC, 2023.
Documents the regress command, its output table, and the robust (Huber/White) variance estimator.

How to cite this page

Dr. Phil's Quant Lab. (2026). Doing Econometrics in Stata. Derivatives Atlas. https://phucnguyenvan.com/concept/efm-stata-workflow
