The R data set state.x77, collected by the US Bureau of the Census in the 1970s, has the population, per capita income, illiteracy, life expectancy, murder rate, percent high school graduation, mean number of frost days (defined as days with minimum temperature below freezing in the capital or a large city for years 1931–1960), and land area in square miles for each of the 50 states. It can be imported into the R data frame st by data(state); st = data.frame(state.x77, row.names = state.abb, check.names = TRUE), or by st = read.table("State.txt", header = TRUE). We will consider life expectancy to be the response variable and the other seven as predictor variables. Use R commands to complete the following.
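As a quick check that the import worked, the following sketch builds st from the built-in data and inspects it. It assumes only base R; the file State.txt is not needed for this route.

```r
# Load the built-in state data sets (state.x77, state.abb, ...).
data(state)

# check.names = TRUE converts "Life Exp" and "HS Grad" into the
# syntactic names Life.Exp and HS.Grad used in the model formulas.
st <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)

names(st)        # the eight variable names
dim(st)          # number of rows (states) and columns (variables)
rownames(st)[5]  # abbreviation of the fifth state in the data set
```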
(a) Use h1 = lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area, data = st); summary(h1) to fit an MLR model, with no polynomial or interaction terms, for predicting life expectancy on the basis of the seven predictors. Report the estimated regression model, the adjusted R², and the p-value for the model utility test. Is the model useful for predicting life expectancy?
(b) Test the joint significance of the variables "Income," "Illiteracy," and "Area" at level 0.05. (Hint. The reduced model is most conveniently fitted through the function update. The R command for this is h2 = update(h1, . ~ . - Income - Illiteracy - Area). According to the syntax of update(), a dot means "same." So the above update command is read as follows: "Update h1 using the same response variable and the same predictor variables, except remove (minus) Income, Illiteracy, and Area.")
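The hint can be sketched as follows. Passing the reduced and full fits to anova() is one standard way to obtain the F test for the three dropped terms; the exercise itself does not name the function, so treat that choice as an assumption.

```r
data(state)
st <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)

# Full model from part (a).
h1 <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder +
           HS.Grad + Frost + Area, data = st)

# Reduced model: same response, same predictors, minus three terms.
h2 <- update(h1, . ~ . - Income - Illiteracy - Area)
formula(h2)  # shows which predictors remain

# F test of the reduced model against the full model.
cmp <- anova(h2, h1)
cmp
```

The second row of the printed table carries the F statistic and its p-value, which is compared with 0.05.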
(c) Compare the R² and adjusted R² values for the full and reduced models. Is the difference in R² consistent with the p-value in part (b)? Explain. Why is the difference in adjusted R² bigger?
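One way to put the four values side by side is to pull them from the summary objects. This is a minimal sketch; the reduced model is fitted directly with the four remaining predictors, which is equivalent to the update() call in part (b).

```r
data(state)
st <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)

h1 <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder +
           HS.Grad + Frost + Area, data = st)
h2 <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = st)

s1 <- summary(h1)
s2 <- summary(h2)

# Rows: models; columns: R-squared and adjusted R-squared.
r2_table <- rbind(
  full    = c(R2 = s1$r.squared, adjR2 = s1$adj.r.squared),
  reduced = c(R2 = s2$r.squared, adjR2 = s2$adj.r.squared)
)
r2_table
```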
(d) Using standardized residuals from the reduced model, test the assumptions of normality and homoscedasticity, both graphically and with formal tests.
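A possible outline for part (d) is sketched below. The choice of formal tests is an assumption: the Shapiro-Wilk test for normality, and a regression of the absolute standardized residuals on the fitted values as a simple check for non-constant variance. Other tests could be substituted.

```r
data(state)
st <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)
h2 <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = st)

r <- rstandard(h2)  # standardized residuals of the reduced model

# Graphical checks.
qqnorm(r); qqline(r)  # normal Q-Q plot
plot(fitted(h2), r, xlab = "Fitted values",
     ylab = "Standardized residuals")
abline(h = 0, lty = 2)

# Formal checks.
sw <- shapiro.test(r)                     # normality
het <- summary(lm(abs(r) ~ fitted(h2)))   # spread versus fitted values
sw$p.value
coef(het)  # the slope row gives the test for non-constant variance
```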
(e) Using the reduced model, give the fitted value for the state of California. (Note that California is listed fifth in the data set.) Next, give a prediction for the life expectancy in the state of California with the murder rate reduced to 5. Finally, give a 95% prediction interval for life expectancy with the murder rate reduced to 5.
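The three quantities in part (e) can be obtained with fitted() and predict(). In this sketch, ca_new is a hypothetical name for a copy of California's row with only the murder rate changed.

```r
data(state)
st <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)
h2 <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = st)

# Fitted value for California (row "CA", the fifth row).
fit_ca <- fitted(h2)["CA"]
fit_ca

# Copy California's predictors and set the murder rate to 5.
ca_new <- st["CA", ]
ca_new$Murder <- 5

# Point prediction and 95% prediction interval at the new murder rate.
pred_ca <- predict(h2, newdata = ca_new)
pi_ca <- predict(h2, newdata = ca_new, interval = "prediction", level = 0.95)
pred_ca
pi_ca
```

Note that interval = "prediction" gives an interval for a single new observation, whereas interval = "confidence" would give one for the mean response.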