Econometrics estimates and tests economic relationships with data
Cross section: many units, one time period
Expected value is the population mean; variance measures spread
Population model: $y = \beta_0 + \beta_1 x + u$
Correlation is co-movement; causality is a true effect of changing $x$
Workflow: question, model, data, estimate, test, interpret
Do-files make the whole analysis reproducible
MLR: $y=\beta_0+\beta_1 x_1+\dots+\beta_k x_k+u$, one error term for all unobservables.
OLS minimizes $\sum \hat{u}_i^2$ and solves all $\hat{\beta}_j$ jointly.
OVB needs a relevant omitted variable that correlates with an included regressor.
MLR.1 linear in parameters, MLR.2 random sampling, MLR.3 no perfect collinearity, MLR.4 zero conditional mean.
MLR.5 adds homoskedasticity, $\text{Var}(u\mid x)=\sigma^2$.
Consistency: $\text{plim}\,\hat{\beta}_j=\beta_j$, the estimator converges in probability to the true parameter as $n$ grows.
MLR.6 adds normal errors and defines the classical linear model.
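
The omitted-variable bias condition above can be checked numerically. The following is a minimal sketch in plain Python, using made-up noise-free data (every name and number is an illustrative assumption). It fits a two-regressor model by solving the demeaned normal equations jointly, then drops $x_2$ and confirms the in-sample identity $\tilde{\beta}_1=\hat{\beta}_1+\hat{\beta}_2\tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing $x_2$ on $x_1$.

```python
def demean(v):
    """Subtract the sample mean from every element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def cross(a, b):
    """Sum of element-wise products of two (demeaned) lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical data generated exactly as y = 1 + 1*x1 + 1*x2 (no noise).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + a + b for a, b in zip(x1, x2)]

d1, d2, dy = demean(x1), demean(x2), demean(y)
s11, s22, s12 = cross(d1, d1), cross(d2, d2), cross(d1, d2)
s1y, s2y = cross(d1, dy), cross(d2, dy)

# Long regression: both slopes come from one 2x2 system, solved jointly.
det = s11 * s22 - s12 ** 2  # zero only under perfect collinearity (MLR.3)
b1_long = (s22 * s1y - s12 * s2y) / det
b2_long = (s11 * s2y - s12 * s1y) / det

# Short regression omits x2; delta1 is the slope of x2 on x1.
b1_short = s1y / s11
delta1 = s12 / s11

print("long slopes:", b1_long, b2_long)
print("short slope:", b1_short)
print("long slope + bias term:", b1_long + b2_long * delta1)
```

Here $x_2$ is relevant ($\hat{\beta}_2 \neq 0$) and positively correlated with $x_1$ ($\tilde{\delta}_1 > 0$), so the short regression overstates the slope on $x_1$. If either condition failed, the bias term would vanish.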
t statistic: $t=(\hat{\beta}_j-\beta_{j,0})/\mathrm{se}(\hat{\beta}_j)$, usually with $\beta_{j,0}=0$.
CI: $\hat{\beta}_j \pm c\cdot\mathrm{se}(\hat{\beta}_j)$ with $c$ from the $t_{\,n-k-1}$ distribution.
F test: $F=[(SSR_r-SSR_{ur})/q]\,/\,[SSR_{ur}/(n-k-1)]$ for $q$ joint restrictions.
Distinguish predicting the mean $E(y\mid x)$ from predicting a single new $y$.
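
The three inference formulas above translate directly into code. This is a rough sketch in plain Python; the function names and all input numbers are hypothetical, and the critical value is passed in by hand (here an approximate 97.5th percentile of the $t$ distribution with 100 degrees of freedom, taken from a table) because the standard library has no $t$ quantile function.

```python
def t_stat(beta_hat, se, beta_null=0.0):
    """t = (beta_hat - beta_null) / se(beta_hat)."""
    return (beta_hat - beta_null) / se


def conf_int(beta_hat, se, crit):
    """beta_hat +/- crit * se, with crit from the t(n-k-1) distribution."""
    return beta_hat - crit * se, beta_hat + crit * se


def f_stat(ssr_r, ssr_ur, q, n, k):
    """F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n-k-1)] for q joint restrictions."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))


# Hypothetical estimates: n = 104 observations, k = 3 regressors, so df = 100.
beta_hat, se = 0.6, 0.2
t = t_stat(beta_hat, se)
lo, hi = conf_int(beta_hat, se, crit=1.984)  # assumed table value for t(100)
F = f_stat(ssr_r=200.0, ssr_ur=180.0, q=2, n=104, k=3)

print("t =", t)
print("95% CI =", (lo, hi))
print("F =", F)
```

The restricted model can never fit better than the unrestricted one, so $SSR_r \ge SSR_{ur}$ and the $F$ statistic is nonnegative.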
Log-log slope is an **elasticity** (percent per percent).
Quadratic slope is $\beta_1+2\beta_2 x$, not $\beta_1$.
A dummy is 0/1 and **shifts the intercept** by its coefficient.
Dummy times continuous lets the **slope** differ by group.
LPM is OLS on a binary $y$; slopes are changes in $P(y=1)$.
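
The slope rules for quadratics and dummy interactions can be made concrete with a small helper. The sketch below is plain Python with hypothetical coefficient values. It assumes the models $y=\beta_0+\beta_1 x+\beta_2 x^2+u$ and $y=\beta_0+\delta_0 d+\beta_1 x+\delta_1 d\,x+u$, where the $\delta$ names are notation introduced here for illustration.

```python
def quadratic_slope(b1, b2, x):
    """Marginal effect of x in y = b0 + b1*x + b2*x^2: b1 + 2*b2*x."""
    return b1 + 2.0 * b2 * x


def group_intercept(b0, delta0, d):
    """Intercept for dummy d in {0, 1}: the dummy shifts it by delta0."""
    return b0 + delta0 * d


def group_slope(b1, delta1, d):
    """Slope on x for dummy d in {0, 1}: the interaction shifts it by delta1."""
    return b1 + delta1 * d


# Hypothetical coefficients, chosen only to illustrate the formulas.
b1, b2 = 0.3, -0.005
for x in (0.0, 10.0, 40.0):
    print("x =", x, "slope =", quadratic_slope(b1, b2, x))

print("intercepts by group:", group_intercept(1.0, 0.5, 0), group_intercept(1.0, 0.5, 1))
print("slopes by group:", group_slope(0.8, -0.2, 0), group_slope(0.8, -0.2, 1))
```

With a negative $\beta_2$ the slope falls as $x$ rises and eventually changes sign, which is why reporting $\beta_1$ alone misstates the effect everywhere except at $x=0$.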
High correlation among regressors inflates $\operatorname{Var}(\hat{\beta}_j)$ via the **VIF**.
Dividing $x_j$ by $c$ multiplies $\hat{\beta}_j$ and its SE by $c$.
Wrong functional form biases coefficients and predictions.
Classical error in $y$: more noise, larger SEs, still unbiased.
Random missingness costs sample size; **systematic** missingness can bias.
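
The rescaling rule and the VIF formula are easy to verify by hand. The sketch below uses plain Python and a tiny made-up sample (names and data are assumptions). It fits a simple regression twice, once with $x$ and once with $x/c$, and also evaluates $\mathrm{VIF}_j = 1/(1-R_j^2)$, where $R_j^2$ comes from regressing $x_j$ on the other regressors.

```python
from math import sqrt


def slr_with_se(x, y):
    """Simple-regression OLS: return (intercept, slope, se of slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt((ssr / (n - 2)) / sxx)  # homoskedastic formula
    return b0, b1, se_b1


def vif(r2_j):
    """Variance inflation factor given R_j^2 (x_j on the other regressors)."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
c = 10.0

b0, b1, se1 = slr_with_se(x, y)
b0_s, b1_s, se1_s = slr_with_se([xi / c for xi in x], y)

print("original:", b1, se1)
print("x divided by c:", b1_s, se1_s)
print("VIF at R_j^2 = 0.9:", vif(0.9))
```

The slope and its standard error scale by the same factor, so the $t$ statistic is unchanged; the intercept is also unaffected by rescaling $x$.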
A time series is **one realization** of a stochastic process
Static model assumes an **immediate** effect of $z$ on $y$
Trending series **drift** steadily over time
Seasonality is a **repeating calendar pattern** within the year
TS.1-TS.5 are the **Gauss-Markov** conditions for time series; TS.6 adds normality
Serial correlation = **errors correlated over time**
**Durbin-Watson** targets AR(1); $DW \approx 2(1-\hat{\rho})$
**Newey-West (HAC)** standard errors are the modern default fix
**Stationary + weakly dependent** series are what OLS needs for its large-sample properties
Regressing independent $I(1)$ series gives a **spurious regression**
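
The Durbin-Watson approximation noted above can be illustrated on a short residual series. This is a sketch in plain Python; the residuals are invented for illustration (they sum to zero, as OLS residuals from a model with an intercept would), and $\hat{\rho}$ is taken to be the slope from regressing $\hat{u}_t$ on $\hat{u}_{t-1}$ without an intercept.

```python
def durbin_watson(resid):
    """DW = sum_{t>=2} (u_t - u_{t-1})^2 / sum_t u_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(r ** 2 for r in resid)
    return num / den


def ar1_rho(resid):
    """Slope from regressing u_t on u_{t-1} with no intercept."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


# Hypothetical residuals with visible positive persistence.
u_hat = [0.5, 0.4, 0.2, -0.1, -0.4, -0.5, -0.3, 0.0, 0.1, 0.1]

dw = durbin_watson(u_hat)
rho = ar1_rho(u_hat)

print("DW =", dw)
print("rho_hat =", rho)
print("2 * (1 - rho_hat) =", 2.0 * (1.0 - rho))
```

A value well below 2 signals positive serial correlation. In a sample this short the approximation $DW \approx 2(1-\hat{\rho})$ is only rough, because the two statistics differ in how they treat the first and last residuals.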
Heteroskedasticity: $\mathrm{Var}(u\mid x)$ depends on the regressors, violating MLR.5
Robust standard errors: replace $\sigma^2$ with $\hat{u}_i^2$ in the variance formula (sandwich form)
The Breusch-Pagan and White tests both use an auxiliary regression of $\hat{u}^2$ on explanatory terms
WLS weights by $\tfrac{1}{\sigma_i^2}$, transforming the error to be homoskedastic
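
For the simple-regression case, the sandwich-form variance mentioned above reduces to $\sum (x_i-\bar{x})^2\hat{u}_i^2 \,/\, SST_x^2$, where $SST_x=\sum (x_i-\bar{x})^2$. The sketch below compares it with the usual homoskedastic formula $\hat{\sigma}^2/SST_x$. It is plain Python on made-up data, with no degrees-of-freedom correction applied to the robust version; software defaults may differ on that point.

```python
from math import sqrt


def slr_standard_errors(x, y):
    """Return (slope, usual se, heteroskedasticity-robust se) for simple OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d ** 2 for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Usual formula: one common sigma^2 estimate for every observation.
    sigma2_hat = sum(u ** 2 for u in u_hat) / (n - 2)
    usual_se = sqrt(sigma2_hat / sst_x)

    # Robust formula: each observation contributes its own squared residual.
    robust_var = sum((d ** 2) * (u ** 2) for d, u in zip(dx, u_hat)) / sst_x ** 2
    robust_se = sqrt(robust_var)
    return b1, usual_se, robust_se


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, usual_se, robust_se = slr_standard_errors(x, y)

print("slope =", slope)
print("usual se =", usual_se)
print("robust se =", robust_se)
```

The coefficient estimate is identical under both formulas; only the standard error changes, and the robust one can be larger or smaller than the usual one.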
Model is $y = \beta_0 + \beta_1 x + u$, linear in the parameters
OLS minimizes $\sum \hat{u}_i^2$, the sum of squared residuals
Fitted value $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ lies on the line
Decomposition $SST = SSE + SSR$ (total = explained + residual)
Assumption is $E(u \mid x) = 0$, the key identifying condition
Assumptions SLR.1 to SLR.4 give $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$
SLR.5 homoskedasticity: $\mathrm{Var}(u \mid x) = \sigma^2$, constant error variance
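
The simple-regression mechanics above fit in a few lines of code. The sketch below is plain Python on a tiny invented sample. It uses the closed-form OLS solution $\hat{\beta}_1=\sum (x_i-\bar{x})(y_i-\bar{y})/\sum (x_i-\bar{x})^2$ and $\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$, then checks the decomposition $SST = SSE + SSR$.

```python
def slr_fit(x, y):
    """Closed-form OLS intercept and slope for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = slr_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]            # points on the regression line
resid = [yi - fi for yi, fi in zip(y, fitted)]  # u_hat_i = y_i - y_hat_i

y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
sse = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares
ssr = sum(r ** 2 for r in resid)               # residual sum of squares

print("b0, b1 =", b0, b1)
print("SST, SSE + SSR =", sst, sse + ssr)
print("R-squared =", sse / sst)
```

The residuals sum to zero whenever the model includes an intercept, and that is what makes the decomposition hold exactly.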