This homework builds on what we studied in class. We are going to simulate from the very simple model of labor supply we considered.

The agent problem is

\[ \max_{c,h,e} c - \beta \frac{h^{1+\gamma}}{1+\gamma}\\ \text{s.t. } c = e \cdot \rho \cdot w\cdot h +(1-e)\cdot r \cdot h \] The individual takes his wage \(w\) as given, he chooses hours of work \(h\) and consumption \(c\). He also chooses whether to work in the labor force or to work at home where he has an equivalent wage \(r\).

Question 1 Do we expect to see any income effect given our model? What if we substituted \(c\) in the utility for \(\frac{c^{1+\eta}}{1+\eta}\)?

Simulating data

We are going to simulate a data set where agents will choose participation as well as the number of hours if they decide to work. This requires for us to specify how each of the individual specific variables are drawn. We then set the following:

\[ \begin{align*} \log W_i &= \eta X_i + Z_i + u_i \\ \log R_i &= \delta_0 + \log(W_i) + \delta Z_i + \xi_i \\ \log \beta_i &= X_i +\epsilon_i + a \xi_i \\ \end{align*} \]

and finally \((X_i,Z_i,\epsilon_i,u_i,\xi_i)\) are independent normal draws. Given all of this we can simulate our data.

Question 2 What does the \(a\) parameter capture here?

p  = list(gamma = 0.8,beta=1,a=1,rho=1,eta=0.2,delta=-0.2,delta0=-0.1,nu=0.5) # parameters
N=10000  # size of the simulation
simdata = data.table(i=1:N,X=rnorm(N))

# simulating variables
simdata[,X := rnorm(N)]
simdata[,Z := rnorm(N)]
simdata[,u := rnorm(N)]
simdata[,lw := p$eta*X  + Z + 0.2*u ]  # log wage

simdata[,xi := rnorm(N)*0.2]
simdata[,lr := lw + p$delta0+ p$delta*Z + xi]; # log home productivity

simdata[,eps:=rnorm(N)*0.2]
simdata[,beta := exp(p$nu*X  + p$a*xi + eps)]; # heterogenous beta coefficient

# compute decision variables
simdata[, lfp := log(p$rho) + lw >= lr] # labor force participation
simdata[, h   := (p$rho * exp(lw)/beta)^(1/p$gamma)] # hours
simdata[lfp==FALSE,h:=NA][lfp==FALSE,lw:=NA]
simdata[,mean(lfp)]

We have now our simulated data.

Question 3 Simulate data with \(a=0\) and \(a=1\). Comment on the value of the coefficient of the regression of log hours on log wage and X.

Heckman correction

As we have seen in class, Heckman (74) offers a way for us to correct the our regression in order to recover our structural parameters.

As we have seen in class, we need to understand how the error term in the hour regression correlates with the labor participation decision.

Question 4 Following what we did in class, and using the class note, derive the expression for the Heckman correction term as a function of known parameters. In other words, derive \(E( a \xi_i + \epsilon_i | lfp=1)\).

Construction of this epxression requires us to recover the parameters \(\delta/\sigma_xi,\delta_0/\sigma_xi\). We can get these by running a probit of participation on \(Z_i\).

fit2 = glm(lfp ~ Z,simdata,family = binomial(link = "probit"))

Question 5 Check that the regression does recover correctly the coefficients. Use them to construct the inverse Mills ratio. Use the correction you created and show that the regression with this extra term delivers the correct estimates for \(\gamma\) even in the case where \(a\neq 0\).

Repeated cross-section

Lastly we want to replicate the approach of Blundell, Duncan and Meghir. To justify such an appraoach we are going to include an additional endogeneity concern between the wage and the disutility of hours of worked. We want to do the following:

  1. add the wage residual \(u_i\) inside the expression for \(\beta_i\) (similar to the \(\xi\) term)
  2. simulate 2 data-sets (two different cross-sections, redraw everything). However in the second cross-section change the \(rho\) to 1.2

Our final step is then to try to recover the wage elasticty by differencing across periods usig the tax variation. To do so, we need to compute time specific Mills ratios.

Question 6 Why do we need to estimate the parameters of the selection equation separatly for each period?

Question 7 Create the heckman correction term for each observation in each period. Then cut the X into a few values (picking some threshold). Finally compute all first difference in the time dimension (including the mills ratio difference). Finally run the regression using the different group as obervations and the difference as variables. Do this allow to recover the correct \(\gamma\)?

Fork me on GitHub