Instrumental-variable estimation: As explained, when we want to construe a regression causally, the most problematic of the assumptions underlying the linear regression model is that the errors and explanatory variables are independent, because this assumption cannot be checked against the data. Consider the simple-regression model Y* = βX* + ε, where, for simplicity, both Y* ≡ Y − E(Y) and X* ≡ X − E(X) are expressed as deviations from their expectations, so that E(Y*) = E(X*) = 0 and the intercept α is eliminated from the model.
(a) Suppose that X and ε are independent. Show that the ordinary least-squares estimator of β, B_OLS = S_XY/S_X² (where S_XY is the sample covariance of X and Y, and S_X² is the sample variance of X), can be derived by (1) multiplying the model through by X*, (2) taking the expectation of both sides of the resulting equation, and (3) substituting the sample variance and covariance for their population analogs. Because the sample variance and covariance are consistent estimators of the population variance and covariance, B_OLS is a consistent estimator of β.
(b) Now suppose that it is unreasonable to assume that X and ε are independent, but there is a third observed variable, Z, that is (1) independent of ε and (2) correlated with X. Z is called an instrumental variable (or an instrument). Proceeding in a manner similar to part (a), but multiplying the model through by Z* ≡ Z − E(Z) rather than X*, show that the instrumental-variable estimator B_IV = S_ZY/S_ZX is a consistent estimator of β. Why are both conditions (1) and (2) necessary for the instrumental variable Z to do its job?
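The contrast in part (b) can also be simulated. In the sketch below (again not part of the original exercise; β = 2, the instrument strength 0.8, and the shared confounding component u are all assumed for illustration), X is correlated with ε through u, so B_OLS is inconsistent, while Z satisfies both conditions and B_IV = S_ZY/S_ZX recovers β:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, n = 2.0, 200_000  # assumed true slope; large n to show consistency

z = rng.normal(size=n)            # instrument: independent of the error
u = rng.normal(size=n)            # confounder shared by X and the error
eps = u + rng.normal(size=n)      # error correlated with X through u
x = 0.8 * z + u + rng.normal(size=n)  # X correlated with both Z and eps
y = beta * x + eps

def s(a, b):
    """Sample covariance of a and b."""
    return np.cov(a, b)[0, 1]

b_ols = s(x, y) / np.var(x, ddof=1)  # inconsistent: picks up Cov(X, eps)
b_iv = s(z, y) / s(z, x)             # B_IV = S_ZY / S_ZX, consistent

print(b_ols, b_iv)  # b_ols is biased away from beta; b_iv is close to it
```

Condition (1) makes the cross term Cov(Z, ε) vanish in the numerator; condition (2) keeps the denominator S_ZX bounded away from zero, so the ratio is well defined.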
(c) Suggest a substantive application in which it is unreasonable to assume that X is independent of other, prior causes of Y but where there is a third variable Z that is both correlated with X and, arguably, independent of the error.