Question 1 Show that the univariate OLS model \(y_i = \beta x_i + \epsilon_i\) is identified when \(E(x\epsilon)=0\) and \(\mathrm{Var}(x)>0\).
Question 2 Derive the bias of the sample variance.
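A quick Monte Carlo check can help you verify your derivation for Question 2. The sketch below assumes iid normal draws with made-up values n = 5 and σ² = 4. It uses the estimator that divides by n (R's own var() divides by n - 1), so its average should sit close to (n - 1)/n · σ².

```r
# Monte Carlo check of the bias of the "divide by n" variance estimator.
# Assumptions (illustrative only): X_i ~ N(0, sigma2) iid, n = 5, sigma2 = 4.
set.seed(1)
n      <- 5
sigma2 <- 4
reps   <- 20000

# Variance estimator that divides by n instead of n - 1
var_n <- function(x) mean((x - mean(x))^2)

draws <- replicate(reps, var_n(rnorm(n, sd = sqrt(sigma2))))

mc_mean <- mean(draws)               # Monte Carlo estimate of E[var_n]
theory  <- (n - 1) / n * sigma2      # (n - 1)/n * sigma^2
bias_mc <- mc_mean - sigma2          # should be close to -sigma2 / n
print(c(mc_mean = mc_mean, theory = theory, bias_mc = bias_mc))
```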
```r
require(data.table)
require(kableExtra)
require(ggplot2)
require(texreg)
require(readstata13) # to install, run install.packages("readstata13")
require(sandwich)
options(knitr.table.format = "html")
```
Here is some relevant material:
To install the lmtest and sandwich packages, run:

```r
install.packages("lmtest")
install.packages("sandwich")
```
We are going to reproduce an exercise similar to the example on the computation of standard errors. Start by downloading the CPS data from here. We first load the data into R.
```r
# replace this with the path to your download folder
data = read.dta13("../data/CPS_2012_micro.dta")
data = data.table(data)
data$age = as.numeric(data$age)
```
Next, generate a fictitious policy that you randomly assign at the state level (statefip). Run the regression and report the standard errors given by R for one draw of the policy.
```r
set.seed(60356548) # I fix the seed to make sure the draws are reproducible
# one uniform draw per state, shared by everyone in that state
data <- data[, fp := runif(1) > 0.5, statefip]
fit1 = lm(lnwage ~ fp, data)
htmlreg(fit1, single.row = TRUE)
```
| | Model 1 |
|---|---|
| (Intercept) | 2.68 (0.00)\(^{***}\) |
| fpTRUE | -0.02 (0.00)\(^{**}\) |
| \(R^2\) | 0.00 |
| Adj. \(R^2\) | 0.00 |
| Num. obs. | 65685 |

\(^{***}p < 0.001\); \(^{**}p < 0.01\); \(^{*}p < 0.05\)
Note: we do not control for state fixed effects, as these would be perfectly collinear with the policy.
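The assignment data[, fp := runif(1) > 0.5, statefip] draws one number per state and shares it with everyone in that state. So the number of independent policy draws is the number of states, not the number of individuals. The base-R sketch below shows this on toy data. It uses ave() in place of data.table's by, and the ten state codes and 50 people per state are hypothetical.

```r
# Toy check of state-level assignment: one uniform draw per state, shared by
# everyone in that state. Assumptions: 10 hypothetical state codes
# (deliberately non-contiguous) with 50 people each.
set.seed(123)
codes    <- c(1, 2, 4, 5, 6, 8, 9, 10, 11, 12)
statefip <- rep(codes, each = 50)

# ave() applies the function within each state, like data.table's `by`
fp <- ave(numeric(length(statefip)), statefip,
          FUN = function(v) rep(runif(1), length(v))) > 0.5

# Number of distinct policy values within each state: 1 everywhere
n_distinct <- tapply(fp, statefip, function(v) length(unique(v)))
print(n_distinct)

# Only one independent policy draw per state, however many people there are
print(length(unique(statefip)))
```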
Now this is surprising: the coefficient on fp is significant at the 1% level. We generated fp randomly across states, so when the number of states becomes very large we should have \(E(\epsilon_i fp_i)=0\). To understand what is happening, we will generate our own data so that we control exactly how it is produced.
Let’s start by reassuring ourselves. Let’s use an IID data generating process (DGP), run the regression and check the significance.
1. Compute the variance of lnwage in the sample and call it var_est. This is an estimate of our homoskedastic error variance.
2. Create y2 as a normal error with the estimated variance, truly independent across individuals and unrelated to fp. Use y2 := rnorm(.N)*sqrt(var_est) inside your data.table data.
3. Regress y2 on fp, our fictitious policy, collect the coefficient, and save whether it is significant at 5%.

Question 3 Follow the previous steps and report the rejection rate of the test on fp. You should find something close to 5% and you should feel better!
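The IID simulation can be sketched in base R on synthetic data, so it runs without the CPS file. Several choices here are assumptions. There are 51 hypothetical states of 200 people each, and var_est = 0.5 is a made-up stand-in for var(lnwage). Each replication redraws fp at the state level, and only 200 replications are run.

```r
# Sketch of the IID (homoskedastic) placebo simulation on synthetic data.
# Assumptions: 51 hypothetical "states" of 200 people each, and a made-up
# variance var_est standing in for var(lnwage) from the CPS.
set.seed(42)
n_states <- 51
n_per    <- 200
statefip <- rep(seq_len(n_states), each = n_per)
var_est  <- 0.5   # hypothetical stand-in for var(data$lnwage)
R        <- 200   # number of replications

reject <- logical(R)
for (r in seq_len(R)) {
  # draw the fictitious policy at the state level
  fp <- (runif(n_states) > 0.5)[statefip]
  # outcome independent of fp: normal noise with variance var_est
  y2 <- rnorm(length(statefip)) * sqrt(var_est)
  fit <- lm(y2 ~ fp)
  # p-value of the usual (homoskedastic) t-test on fp
  reject[r] <- summary(fit)$coefficients["fpTRUE", "Pr(>|t|)"] < 0.05
}
rejection_rate <- mean(reject)
print(rejection_rate)
```

Because the errors here are independent across individuals, the state-level assignment of fp alone should not distort the test, and the rate should land near 5%.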
Now we want to allow for heteroskedasticity, which requires us to use some covariates, and compute heteroskedasticity-robust standard errors. We repeat the previous procedure, but with a different test for significance, based on the variance-covariance matrix given by the following formula:
\[ V = (X'X)^{-1} X' \Omega X (X'X)^{-1} \] where \(\Omega = \mathrm{diag}\{ \hat\epsilon_i^2 \}\) collects the squared OLS residuals. Using vcovHC with type="HC0" will do that for you, while type="const" gives the usual homoskedastic estimate \(\hat\sigma^2 (X'X)^{-1}\) for comparison!
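To see what the formula computes, the sketch below builds V by hand in base R on simulated heteroskedastic data. The design y = 1 + 0.5x + noise, with noise standard deviation 0.2 + x, is made up for illustration. The homoskedastic matrix should match vcov(fit). With the sandwich package loaded, vcovHC(fit, type = "HC0") and vcovHC(fit, type = "const") should reproduce V_hc0 and V_const respectively.

```r
# Compute the HC0 "sandwich" variance by hand and compare it with the
# homoskedastic formula. Assumption: a small synthetic design where the
# error standard deviation grows with x (heteroskedastic by construction).
set.seed(7)
n <- 1000
x <- runif(n, 0, 2)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.2 + x)
fit <- lm(y ~ x)

X     <- model.matrix(fit)             # n x k design matrix
e     <- residuals(fit)                # OLS residuals, hat epsilon_i
XtX_1 <- solve(crossprod(X))           # (X'X)^{-1}

# Meat: X' diag(e^2) X, computed without forming the n x n diagonal matrix
meat  <- crossprod(X * e)              # row i of X scaled by e_i
V_hc0 <- XtX_1 %*% meat %*% XtX_1      # (X'X)^{-1} X' Omega X (X'X)^{-1}

# Homoskedastic version: sigma^2 (X'X)^{-1}, with sigma^2 = sum(e^2)/(n - k)
V_const <- sum(e^2) / (n - ncol(X)) * XtX_1

print(sqrt(diag(V_const)))
print(sqrt(diag(V_hc0)))
```

Scaling the rows of X by the residuals before calling crossprod() avoids building the n × n matrix \(\Omega\), which would be wasteful with 65685 CPS observations.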
We want to check this by simulating from a model with heteroskedastic errors. To do so we are going to use a linear model for the variance.
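1. Run the regression lnwage ~ yrseduc + age + I(age^2), then regress the square of the residual on the same covariates to get an estimate of the heteroskedastic variance. Store its prediction as s.
2. Store the prediction of the first regression as pred.
3. Generate y2 by drawing a normal error with variance s and adding the pred.
4. Regress y2 on fp and the covariates.
5. Test the significance of fp using vcovHC with type="const" and type="HC0".

Question 4 Follow the steps and report the rejection rate for each of the two variance estimators.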
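The heteroskedastic simulation can be sketched on synthetic data, with both variance estimators computed by hand in base R as above. Everything in the DGP is hypothetical. It uses a single covariate educ, a made-up mean pred, a made-up linear variance model s = 0.1 + 0.05 · educ, 51 states of 200 people, and 200 replications. The test on fp is two-sided at 5% with normal critical values.

```r
# Sketch of the heteroskedastic placebo simulation on synthetic data.
# Assumptions (all hypothetical): one covariate educ, a linear model for
# the error variance s = 0.1 + 0.05 * educ, and 51 states of 200 people.
set.seed(2024)
n_states <- 51
n_per    <- 200
N        <- n_states * n_per
statefip <- rep(seq_len(n_states), each = n_per)
educ     <- sample(8:20, N, replace = TRUE)
pred     <- 1 + 0.1 * educ                # stands in for the fitted mean
s        <- pmax(0.1 + 0.05 * educ, 1e-6) # fitted variance, kept positive

# Homoskedastic ("const") and HC0 standard errors of one coefficient
se_pair <- function(fit, coef_name) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  XtX_1   <- solve(crossprod(X))
  V_hc0   <- XtX_1 %*% crossprod(X * e) %*% XtX_1
  V_const <- sum(e^2) / (nrow(X) - ncol(X)) * XtX_1
  c(const = sqrt(V_const[coef_name, coef_name]),
    hc0   = sqrt(V_hc0[coef_name, coef_name]))
}

R <- 200
reject <- matrix(FALSE, R, 2, dimnames = list(NULL, c("const", "hc0")))
for (r in seq_len(R)) {
  fp  <- (runif(n_states) > 0.5)[statefip]  # state-level placebo
  y2  <- pred + rnorm(N) * sqrt(s)          # heteroskedastic noise
  fit <- lm(y2 ~ fp + educ)
  b   <- coef(fit)[["fpTRUE"]]
  se  <- se_pair(fit, "fpTRUE")
  reject[r, ] <- abs(b / se) > qnorm(0.975) # two-sided test at 5%
}
rejection_rates <- colMeans(reject)
print(rejection_rates)
```

The pmax() guard is there because a linear model for the variance can predict negative values on real data. It is inactive in this toy design, where s is always positive.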
We are now going to simulate errors that are correlated within state. For this we pick a correlation parameter \(\rho\). To simulate, we draw the error of the first individual in each state in an iid way, then use an autoregressive structure to compute the errors of the following people. Given \(\rho\), this can be done in the following way:
```r
# fitted mean from the covariates-only regression
fit0 = lm(lnwage ~ yrseduc + age + I(age^2), data)
data <- data[, yhat := predict(fit0)]

rho = 0.8 # within-state correlation parameter
data <- data[, res_hat := {
  r = rep(0, .N)
  r[1] = rnorm(1)
  for (i in seq_len(.N)[-1]) { # empty loop (not 2:1) when a state has one row
    r[i] = rho*r[i-1] + rnorm(1)
  }
  r
}, statefip]
data <- data[, y2 := yhat + res_hat]
data <- data[, fp := runif(1) > 0.5, statefip]
fitn = lm(y2 ~ fp + yrseduc + age + I(age^2), data)
#summary(fitn)
htmlreg(fitn, single.row = TRUE, omit.coef = "state")
```
| | Model 1 |
|---|---|
| (Intercept) | -0.50 (0.09)\(^{***}\) |
| fpTRUE | -0.03 (0.01)\(^{**}\) |
| yrseduc | 0.10 (0.00)\(^{***}\) |
| age | 0.07 (0.00)\(^{***}\) |
| age\(^2\) | -0.00 (0.00)\(^{***}\) |
| \(R^2\) | 0.04 |
| Adj. \(R^2\) | 0.04 |
| Num. obs. | 65685 |

\(^{***}p < 0.001\); \(^{**}p < 0.01\); \(^{*}p < 0.05\)
Question 5 Explain the expression that starts with data[, res_hat := {...
Question 6 For each \(\rho \in \{0.7, 0.8, 0.9\}\), run 500 replications and report the proportion of replications in which the coefficient on our fictitious policy is significant at 5%.
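The replication loop can be sketched on synthetic data to see the pattern before running it on the CPS. Several choices here are assumptions. There are 51 hypothetical states of 200 people each, and the mean is zero rather than yhat. Only 100 replications per \(\rho\) are run, and \(\rho = 0\) is included as a benchmark. In place of the explicit loop, the sketch uses stats::filter(..., method = "recursive"). It computes the same recursion \(r_i = \rho r_{i-1} + u_i\), starting from \(r_1 = u_1\), in vectorized form.

```r
# Sketch of the correlated-errors placebo simulation on synthetic data.
# Assumptions: 51 hypothetical states of 200 people, a zero mean outcome
# (instead of yhat), and fewer replications than the 500 asked for.
set.seed(99)
n_states <- 51
n_per    <- 200
statefip <- rep(seq_len(n_states), each = n_per)

# AR(1) errors within each state: r_1 = u_1, r_i = rho * r_{i-1} + u_i
ar1_by_group <- function(g, rho) {
  ave(numeric(length(g)), g, FUN = function(v)
    as.numeric(stats::filter(rnorm(length(v)), filter = rho,
                             method = "recursive")))
}

# Share of replications where the usual t-test rejects fp at 5%
rejection_rate <- function(rho, R = 100) {
  reject <- logical(R)
  for (r in seq_len(R)) {
    fp  <- (runif(n_states) > 0.5)[statefip]  # state-level placebo
    y2  <- ar1_by_group(statefip, rho)
    fit <- lm(y2 ~ fp)
    reject[r] <- summary(fit)$coefficients["fpTRUE", "Pr(>|t|)"] < 0.05
  }
  mean(reject)
}

rhos  <- c(0, 0.7, 0.8, 0.9)
rates <- sapply(rhos, rejection_rate)
names(rates) <- as.character(rhos)
print(rates)
```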
We have not covered this in class yet, but one could instead try to resample the data.
Use the following procedure:
Note: do not redraw fp!
Question 7 Report the 0.05 and 0.95 quantiles of the regression coefficients across resamples. This is a test at 10%: does this interval include 0?
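The individual steps of the resampling procedure are not reproduced here. The sketch below therefore assumes one common choice, a state-level (cluster) bootstrap that resamples whole states with replacement. It is not necessarily the procedure intended. The data are synthetic, with AR(1) errors (\(\rho = 0.8\)) within 51 hypothetical states of 100 people. In line with the note above, fp is drawn once and kept fixed across bootstrap samples.

```r
# Sketch of a state-level (cluster) bootstrap for the coefficient on fp.
# Assumptions: synthetic data with AR(1) errors within 51 hypothetical
# states; whole states are resampled with replacement; fp is drawn once
# and NOT redrawn across bootstrap samples.
set.seed(314)
n_states <- 51
n_per    <- 100
statefip <- rep(seq_len(n_states), each = n_per)
fp       <- (runif(n_states) > 0.5)[statefip]   # drawn once, kept fixed
y2 <- ave(numeric(length(statefip)), statefip, FUN = function(v)
  as.numeric(stats::filter(rnorm(length(v)), 0.8, method = "recursive")))
dat <- data.frame(statefip, fp, y2)

# Row indices of each state, so whole states can be resampled together
rows_by_state <- split(seq_len(nrow(dat)), dat$statefip)

B <- 199
boot_coef <- numeric(B)
for (b in seq_len(B)) {
  picked <- sample(names(rows_by_state), n_states, replace = TRUE)
  idx    <- unlist(rows_by_state[picked], use.names = FALSE)
  boot_coef[b] <- coef(lm(y2 ~ fp, data = dat[idx, ]))[["fpTRUE"]]
}
ci <- quantile(boot_coef, c(0.05, 0.95))
print(ci)
print(unname(ci[1] <= 0 & 0 <= ci[2]))   # does the 90% interval include 0?
```

Resampling states rather than individuals keeps each state's correlated errors together, which is what lets the interval reflect the within-state dependence.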