Data analysts are often confronted with a small set of measured independent variables and asked to choose the “best” predictive equation. Not infrequently, such an analysis consists of taking the measured variables, their pairwise cross-products, and their squares; throwing the whole lot into a computer; and using a variable selection method (usually stepwise regression) to select the best model. Using the data for the urinary calcium study in Problems 2.6, 3.8, 5.5, and 6.4 (data in Table D-5, Appendix D), do such an analysis using several variable selection methods. Do the methods agree? Is there a problem with multicollinearity, and, if so, to what extent can it explain problems with variable selection? Does centering help? Is this “throw a whole bunch of things in the computer and stir” approach useful?
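As a starting point, the candidate terms described above (the measured variables, their squares, and their pairwise cross-products) can be built mechanically, and the effect of centering on the correlation between a variable and its square can be checked directly. The following is only a minimal sketch: the variable names `x1` and `x2` and their values are made up for illustration and are not the Table D-5 data, and the helper functions `pearson`, `center`, and `expand` are hypothetical names introduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def center(col):
    """Subtract the column mean from every value."""
    mean = sum(col) / len(col)
    return [x - mean for x in col]


def expand(columns):
    """Return the original columns plus their squares and pairwise cross-products."""
    out = dict(columns)
    for name, col in columns.items():
        out[f"{name}^2"] = [x * x for x in col]
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        out[f"{n1}*{n2}"] = [x * y for x, y in zip(c1, c2)]
    return out


# Made-up illustrative values, NOT the urinary calcium data from Table D-5.
raw = {
    "x1": [10.0, 11.0, 12.0, 13.0, 14.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0],
}

# Candidate terms built from the uncentered variables.
expanded_raw = expand(raw)

# Candidate terms built after centering each measured variable first.
centered = {name: center(col) for name, col in raw.items()}
expanded_centered = expand(centered)

# Correlation between a variable and its own square, before and after centering.
r_raw = pearson(expanded_raw["x1"], expanded_raw["x1^2"])
r_centered = pearson(expanded_centered["x1"], expanded_centered["x1^2"])

print("candidate terms:", list(expanded_raw))
print("corr(x1, x1^2), uncentered:", round(r_raw, 4))
print("corr(x1, x1^2), centered:  ", round(r_centered, 4))
```

With the real data, the same expanded columns would be passed to whatever variable selection procedures your software provides, and the correlations (or variance inflation factors) would be compared with and without centering. The toy `x1` values here are symmetric about their mean, which is the most favorable case for centering; real data are unlikely to behave as cleanly.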