Use the data from Andrews and Herzberg (1985) on percentages of sand, silt, and clay in soil at 20 sites given in Exercise 11.11.
(a) Compute the singular value decomposition of Z, the matrix of centered and scaled variables, and construct Gabriel’s biplot of the data.
(b) How many principal components must be used in order to account for 80% of the dispersion?
(c) Interpret the results of the biplot (of the first and second principal components) in terms of
(i) which variable vectors are not well represented by the biplot,
(ii) the correlational structure of the variables,
(iii) how the 20 sites tend to cluster, and
(iv) which site has very low sand content at depths 1 and 2 but moderately high sand content at depth 3.
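The computations in parts (a) and (b) can be sketched as follows. This is only a minimal illustration under stated assumptions: the soil data are not reproduced here, so X is a random placeholder with 20 rows (sites) and 9 columns (three fractions at three depths), and the biplot uses one common factorization, Z ≈ GH' with G = U and H = VL^(1/2), which may differ from the scaling intended in the text. The function names are hypothetical.

```python
import numpy as np

def center_and_scale(X):
    """Center each column and scale it to unit length, so Z'Z is a correlation matrix."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).sum(axis=0))

def biplot_coordinates(Z, k=2):
    """Return row markers G, column markers H, and the proportion of dispersion per component."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s ** 2                      # eigenvalues of Z'Z
    prop = eigvals / eigvals.sum()        # proportion of dispersion per component
    G = U[:, :k]                          # row (site) markers
    H = Vt[:k, :].T * s[:k]               # column (variable) markers, H = V L^(1/2)
    return G, H, prop

# Placeholder data only: replace with the 20 x 9 soil data from the exercise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 9))

Z = center_and_scale(X)
G, H, prop = biplot_coordinates(Z, k=2)

# Smallest number of components whose cumulative proportion reaches 80%.
n_needed = int(np.searchsorted(np.cumsum(prop), 0.80) + 1)

# Lengths of the variable vectors in the two-dimensional biplot; with this
# scaling a length well below 1 flags a poorly represented variable.
vector_lengths = np.linalg.norm(H, axis=1)
```

With this choice of H, the cosine of the angle between two variable vectors approximates their correlation, which is the quantity part (c)(ii) asks about.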
Exercise 11.11
The following are the results of a principal component analysis, on Z, of data collected from a fruit fly experiment attempting to relate a measure of fly activity, WFB = wing beat frequency, to the chemical activity of four enzymes, SDH, FUM, GH, and GO. Measurements were made on n = 21 strains of fruit fly. (Data courtesy of Dr. Laurie Alberg, North Carolina State University.)
(a) Compute the proportion of the dispersion in the X-space accounted for by each principal component.
(b) Compute the condition number for Z and the condition index for each principal component. What do the results suggest about possible variance inflation from collinearity?
(c) Describe the first principal component in terms of the original centered and standardized variables. Describe the second principal component.
(d) The sum of the variances of the estimates of the least squares regression coefficients, $\mathrm{tr}[\mathrm{Var}(\hat{\beta})] = \sum_j (1/\lambda_j)\sigma^2$, must be larger than $\sigma^2/\lambda_4$. Compute this minimum (in terms of $\sigma^2$). How does this compare to the minimum if the four variables had been orthogonal?
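The quantities in parts (a), (b), and (d) depend only on the eigenvalues of Z'Z, and the arithmetic can be sketched as below. The eigenvalues used here are made up for illustration (chosen only so that they sum to 4, as they must for four centered and scaled variables) and are not the values from the fruit fly analysis. The condition number is taken as the square root of the ratio of the largest to the smallest eigenvalue, which is an assumption about the definition in use.

```python
import numpy as np

def collinearity_summary(eigvals):
    """Summaries of collinearity computed from the eigenvalues of Z'Z."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest first
    return {
        # (a) proportion of dispersion accounted for by each component
        "proportion": lam / lam.sum(),
        # (b) condition index of each component and condition number of Z
        "condition_index": np.sqrt(lam[0] / lam),
        "condition_number": float(np.sqrt(lam[0] / lam[-1])),
        # (d) tr[Var(beta_hat)] / sigma^2 and its lower bound 1 / lambda_min
        "trace_over_sigma2": float((1.0 / lam).sum()),
        "lower_bound_over_sigma2": float(1.0 / lam[-1]),
    }

# Hypothetical eigenvalues, NOT the ones reported for the fruit fly data.
hypothetical_eigvals = [2.0, 1.2, 0.7, 0.1]
summary = collinearity_summary(hypothetical_eigvals)

# Reference case: four orthogonal, unit-length variables have all eigenvalues equal to 1.
orthogonal = collinearity_summary([1.0, 1.0, 1.0, 1.0])
```

Comparing `summary["lower_bound_over_sigma2"]` with `orthogonal["lower_bound_over_sigma2"]` gives the kind of comparison part (d) asks for, once the reported eigenvalues are substituted for the placeholders.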