1. The data set Xdata.RData contains a data frame, Xdata, that has n = 100 observations (rows) and five variables named X1, X2, X3, X4, and Y. This is a simulated data set generated from the following model:
The error term ε is a normal random variable with mean 0 and standard deviation σ. In the simulation the following parameter values were used: β0 = 0, β1 = 1, β2 = 1, β3 = 1, β4 = 1, and σ = 0.5.
How successfully can the iterative process described above identify the model that generated the data? To answer this question, ask what you should expect to see for the Box-Cox parameter λ and for the transformations of the predictor variables. Do you get something similar from the data? Can you identify the model coefficients reasonably well?
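As a rough illustration of the kind of check this exercise asks for, the sketch below simulates data from a hypothetical model in which log(y) is linear in two predictors, then looks at where the Box-Cox profile likelihood peaks. The generating model, the variable names (x1, x2, y, sim) and the seed are assumptions made for this example; they are not the model behind Xdata. It uses boxcox() from the MASS package.

```r
library(MASS)  # provides boxcox()

set.seed(1)
n  <- 100
x1 <- runif(n, 1, 5)
x2 <- runif(n, 1, 5)

# Hypothetical generating model (NOT the one from the exercise):
# log(y) = 0 + 1*x1 + 1*x2 + eps, with eps ~ N(0, 0.5^2)
y   <- exp(x1 + x2 + rnorm(n, sd = 0.5))
sim <- data.frame(y, x1, x2)

# Fit on the untransformed response, then profile the Box-Cox parameter
fit <- lm(y ~ x1 + x2, data = sim)
bc  <- boxcox(fit, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)

# Lambda value that maximizes the profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# If lambda is close to 0, refit with a log response and compare the
# estimated coefficients with the values used in the simulation
fit_log <- lm(log(y) ~ x1 + x2, data = sim)
round(coef(fit_log), 2)
```

With the real data you would load Xdata.RData, replace sim with Xdata, and compare the estimated λ and coefficients with what the stated model implies.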
2. Use the Boston Housing R data (BHD0.RData) to give a 95% prediction interval for the median home value of a census tract that has the following characteristics: NOX = 0.65, RM = 5.5, AGE = 80, and LSTAT = 0.16. Use a logarithm transformation with MEDV, and assume that the predictors are correctly treated without using transformations.
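The mechanics of a prediction interval under a log-transformed response can be sketched as follows. Because BHD0.RData is not reproduced here, the example uses a simulated stand-in: the column names mirror the exercise, but the data values, the coefficients and the object names (dat, new_tract) are made up for illustration.

```r
set.seed(2)
n <- 200

# Simulated stand-in for the housing data (values are hypothetical)
dat <- data.frame(
  NOX   = runif(n, 0.4, 0.9),
  RM    = rnorm(n, 6.3, 0.7),
  AGE   = runif(n, 5, 100),
  LSTAT = runif(n, 0.02, 0.35)
)
dat$MEDV <- exp(3 - 0.5 * dat$NOX + 0.1 * dat$RM - 0.001 * dat$AGE -
                2 * dat$LSTAT + rnorm(n, sd = 0.2))

# Log-transformed response, untransformed predictors
fit <- lm(log(MEDV) ~ NOX + RM + AGE + LSTAT, data = dat)

# The tract described in the exercise
new_tract <- data.frame(NOX = 0.65, RM = 5.5, AGE = 80, LSTAT = 0.16)

# 95% prediction interval on the log scale
pi_log <- predict(fit, newdata = new_tract,
                  interval = "prediction", level = 0.95)

# Back-transform the endpoints to the original MEDV scale
pi_medv <- exp(pi_log)
pi_medv
```

The interval is built on the log scale and its endpoints are exponentiated. The back-transformed point prediction estimates the conditional median of MEDV rather than its mean.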
3. For the prostate R data, fit a model with lpsa as the response and the other variables as predictors. Answer the following questions:
(a) Check for outliers.
(b) Check for influential points.
(c) Check the structure of the relationship between the predictors and the response.
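One possible way to carry out checks (a) to (c) with base R is sketched below. It runs on simulated data with a deliberately planted outlier, so the data frame dat, its columns and the planted case number are assumptions for this example, not the prostate data. The plotting calls are wrapped in interactive() so the script also runs in batch mode.

```r
set.seed(3)
n   <- 97
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.7)
dat$y[10] <- dat$y[10] + 8          # planted outlier (hypothetical)

fit <- lm(y ~ x1 + x2, data = dat)
p   <- length(coef(fit))

# (a) Outliers: externally studentized residuals compared with a
#     Bonferroni-corrected t critical value
rs       <- rstudent(fit)
crit     <- qt(1 - 0.05 / (2 * n), df = n - p - 1)
outliers <- which(abs(rs) > crit)
outliers

# (b) Influence: Cook's distances, largest first
cd <- cooks.distance(fit)
head(sort(cd, decreasing = TRUE), 3)
if (interactive()) plot(fit, which = 4)   # Cook's distance plot

# (c) Structure: partial residual plots, one panel per predictor
if (interactive()) termplot(fit, partial.resid = TRUE)
```

For the exercise itself, the model would be fit to the prostate data with lpsa as the response, and the same three checks applied to that fit.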
4. Use the fat R data, fitting the model described in Section 4.2.
> data(fat,package="faraway")
> lmod
(a) Compute the condition numbers and variance inflation factors. Comment on the degree of collinearity observed in the data.
(b) Cases 39 and 42 are unusual. Refit the model without these two cases and recompute the collinearity diagnostics. Comment on the differences observed from the full data fit.
(c) Fit a model with brozek as the response and just age, weight and height as predictors. Compute the collinearity diagnostics and compare to the full data fit.
(d) Compute a 95% prediction interval for brozek for the median values of age, weight and height.
(e) Compute a 95% prediction interval for brozek for age=40, weight=200 and height=73. How does the interval compare to the previous prediction?
(f) Compute a 95% prediction interval for brozek for age=40, weight=130 and height=73. Are the values of predictors unusual? Comment on how the interval compares to the previous two answers.
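The diagnostics used in parts (a) to (c) and the interval comparisons in parts (d) to (f) can be sketched in base R as below. The example uses simulated data in which one predictor is deliberately made nearly collinear with another; the variable names (y, weight, height, chest), the coefficients and the "unusual" predictor values are assumptions for illustration and are not taken from the fat data.

```r
set.seed(4)
n      <- 250
weight <- rnorm(n, 180, 30)
height <- rnorm(n, 70, 3)
# Hypothetical measurement, constructed to track weight closely
chest  <- 40 + 0.33 * weight + rnorm(n, sd = 2)
y      <- 5 + 0.1 * weight - 0.2 * height + 0.1 * chest + rnorm(n, sd = 4)
dat    <- data.frame(y, weight, height, chest)

fit <- lm(y ~ weight + height + chest, data = dat)
X   <- model.matrix(fit)[, -1]          # drop the intercept column

# Condition numbers: sqrt(largest eigenvalue / each eigenvalue) of X'X
ev           <- eigen(crossprod(X))$values
cond_numbers <- sqrt(ev[1] / ev)
cond_numbers

# Variance inflation factors: 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on the remaining predictors
vif_manual <- sapply(colnames(X), function(v) {
  r2 <- summary(lm(X[, v] ~ X[, colnames(X) != v]))$r.squared
  1 / (1 - r2)
})
vif_manual

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_check <- diag(solve(cor(X)))

# Prediction intervals: at the predictor medians vs. at an unusual point
fit2      <- lm(y ~ weight + height, data = dat)
at_median <- data.frame(weight = median(dat$weight),
                        height = median(dat$height))
unusual   <- data.frame(weight = 90, height = 80)   # far from the data
pi_width  <- function(newdata) {
  pi <- predict(fit2, newdata, interval = "prediction", level = 0.95)
  unname(pi[, "upr"] - pi[, "lwr"])
}
c(median = pi_width(at_median), unusual = pi_width(unusual))
```

Large condition numbers and large variance inflation factors both point to collinearity. The prediction interval widens as the new point moves away from the center of the observed predictor values, which is the comparison parts (d) to (f) ask for.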
5. Ankylosing spondylitis is a chronic form of arthritis. A study was conducted to determine whether daily stretching of the hip tissues would improve mobility. The R data are found in hips. The flexion angle of the hip before the study is a predictor and the flexion angle after the study is the response.
(a) Plot the data using different plotting symbols for the treatment and the control status.
(b) Fit a model to determine whether there is a treatment effect.
(c) Compute the difference between the flexion before and after and test whether this difference varies between treatment and control. Contrast this approach with your previous model.
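The two analyses in parts (b) and (c) can be contrasted with a small sketch on simulated data. The data frame sim, its column names (group, before, after, change) and the effect sizes are hypothetical; the hips data will have its own variable names.

```r
set.seed(5)
n   <- 60
sim <- data.frame(
  group  = factor(rep(c("control", "treat"), each = n / 2)),
  before = rnorm(n, 110, 10)
)
# Hypothetical truth: slope 0.8 on the baseline, treatment shift of 5
sim$after <- 20 + 0.8 * sim$before + 5 * (sim$group == "treat") +
  rnorm(n, sd = 5)

# (a) Scatterplot with a different symbol for each group
if (interactive()) {
  plot(after ~ before, data = sim, pch = c(1, 17)[as.integer(sim$group)])
  legend("topleft", legend = levels(sim$group), pch = c(1, 17))
}

# (b) Analysis of covariance: treatment effect adjusted for baseline
ancova <- lm(after ~ before + group, data = sim)
summary(ancova)$coefficients["grouptreat", ]

# (c) Change-score analysis: compare (after - before) between groups
sim$change <- sim$after - sim$before
diff_fit   <- lm(change ~ group, data = sim)
summary(diff_fit)$coefficients["grouptreat", ]

# The change-score regression matches a pooled-variance two-sample t-test
tt <- t.test(change ~ group, data = sim, var.equal = TRUE)
tt$statistic
```

The change-score model is the covariance model with the slope on the baseline measurement fixed at 1, whereas the model in (b) estimates that slope from the data. That constraint is the main point of contrast between the two approaches.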