This exercise continues the Laurie-Alberg experiment relating the activity of fruit flies to four enzymes (Exercise 11.9). The results of the SVD on Z are given in Exercise 11.9. Some of the results from principal component regression are given in the accompanying tables.
Estimates of the regression coefficients (for the Zs) retaining the indicated principal components:
Variances of estimated regression coefficients retaining the indicated principal components:
(a) From the SVD in Exercise 11.9, do any principal components give cause for concern about variance inflation? Which Zs are heavily involved in the fourth principal component?
(b) From inspection of the behavior of the variances as the principal components are dropped, which variables are heavily involved in the fourth principal component? Which are involved in the third principal component?
(c) Which principal component regression solution would you use? The variances continue to decrease as more principal components are dropped from the solution. Why would you not use the solution with only the first principal component?
(d) Do a t-test of the regression coefficients for your solution. (There were n = 21 observations in the data set.) State your conclusions.
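The quantities that parts (b) through (d) ask about can be sketched in a few lines of NumPy. This is a minimal illustration, not the exercise's data: the design matrix and response below are synthetic stand-ins (only n = 21 and the four regressors mirror the exercise), the helper names `unit_length_scale` and `pcr` are hypothetical, and the sketch assumes the residual mean square from the full model is used for every reduced solution. Dropping a principal component removes its term v_ij^2 / lambda_j from each coefficient's variance, which is why the variances can only shrink as components are dropped.

```python
import numpy as np

def unit_length_scale(X):
    """Center each column and scale it to unit length, so Z'Z is a correlation matrix."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc**2).sum(axis=0))

def pcr(Z, y, g, s2):
    """Principal component regression retaining the first g components.

    Returns the coefficients for the Zs and their variances, using s2
    (assumed here to be the full-model residual mean square).
    """
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    Vg = Vt[:g].T                                   # p x g eigenvectors retained
    gamma = U[:, :g].T @ (y - y.mean()) / d[:g]     # coefficients on the components
    beta = Vg @ gamma                               # back-transform to the Zs
    var = s2 * (Vg**2 / d[:g]**2).sum(axis=1)       # s2 * sum_j v_ij^2 / lambda_j
    return beta, var

# Synthetic stand-in data (NOT the Laurie-Alberg measurements).
rng = np.random.default_rng(0)
n, p = 21, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=n)  # near-collinear column
y = 1.0 + X @ np.array([1.0, 0.5, 0.0, 0.5]) + rng.normal(size=n)
Z = unit_length_scale(X)

# Residual mean square from the full model on the Zs (intercept absorbed by centering).
beta_full, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
resid = (y - y.mean()) - Z @ beta_full
df_resid = n - p - 1
s2 = resid @ resid / df_resid

# Solutions retaining g = 4, 3, 2, 1 components, with t-ratios for each coefficient.
results = {g: pcr(Z, y, g, s2) for g in range(p, 0, -1)}
for g, (beta, var) in results.items():
    t = beta / np.sqrt(var)
    print(g, np.round(beta, 3), np.round(var, 3), np.round(t, 2))
```

Each t-ratio would be compared with a Student's t critical value on `df_resid` degrees of freedom. The printed variances fall at every step, but the sketch says nothing about bias: components that carry real information about Y are also discarded as g shrinks, which is the trade-off part (c) asks about.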
Exercise 11.9
PROC REG (in SAS) was run on a set of data with n = 40 observations on Y and three independent variables. The collinearity diagnostics gave the following results.
(a) What is the rank of X in this model?
(b) What is the condition number for X? What does that say about the potential for collinearity problems?
(c) Interpret the variance proportions for the fourth principal component. Is there variance inflation from the collinearity? Which regression coefficients are being affected most?
(d) Compute the variance proportions for the third principal component after the fourth has been removed. Considering the condition index and the variance proportions for the third principal component, is there variance inflation from the third component?
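The diagnostics referred to in parts (a) through (d) can be reproduced in outline as follows. This is a sketch under stated assumptions: the data are synthetic (only n = 40 and the three independent variables plus an intercept mirror the exercise), the function names `collinearity_diagnostics` and `variance_proportions` are hypothetical, and the columns of X are assumed to be scaled to unit length without centering before the SVD. The `keep` argument recomputes the proportions over only the first `keep` components, which is the calculation part (d) describes.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Singular values, condition indices, and variance components of X.

    Columns of X (including the intercept) are scaled to unit length
    but not centered before the decomposition.
    """
    Xs = X / np.sqrt((X**2).sum(axis=0))
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_index = d[0] / d                 # largest singular value over each one
    phi = (Vt.T**2) / d**2                # phi[j, k] = v_jk^2 / lambda_k
    return d, cond_index, phi

def variance_proportions(phi, keep=None):
    """Variance proportions using only the first `keep` principal components.

    Rows are principal components, columns are regression coefficients,
    so each column sums to 1.
    """
    phi_k = phi if keep is None else phi[:, :keep]
    return (phi_k / phi_k.sum(axis=1, keepdims=True)).T

# Synthetic stand-in data (NOT the data analyzed in the exercise).
rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(10, 2, n)
x2 = rng.normal(5, 1, n)
x3 = x1 + x2 + rng.normal(0, 0.1, n)      # nearly a linear combination of x1 and x2
X = np.column_stack([np.ones(n), x1, x2, x3])

rank = np.linalg.matrix_rank(X)
d, cond_index, phi = collinearity_diagnostics(X)
props_all = variance_proportions(phi)           # all four components
props_3 = variance_proportions(phi, keep=3)     # fourth component removed

print("rank:", rank)
print("condition indices:", np.round(cond_index, 1))
print("variance proportions (all components):\n", np.round(props_all, 3))
print("variance proportions (first three only):\n", np.round(props_3, 3))
```

The last condition index is the condition number of the scaled X. In the full table, a component with a large condition index that also accounts for a large share of two or more coefficients' variances is the usual signal of variance inflation.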