This exercise continues the analysis of the number of hospital days for smokers from Exercise 1.21. The dependent variable is $Y = \ln(\text{number of hospital days for smokers})$. The independent variables are $X_1 = (\text{number of cigarettes})^2$ and $X_2 = \ln(\text{number of hospital days for nonsmokers})$. Note that $X_1$ is the square of the number of cigarettes.
(a) Plot Y against the number of cigarettes and against the square of the number of cigarettes. Do the plots provide any indication of why the square of the number of cigarettes was chosen as the independent variable?
(b) Complete the analysis of variance for the regression of $Y$ on $X_1$ and $X_2$. Does the information on number of hospital days for nonsmokers help explain the variation in number of hospital days for smokers? Make an appropriate test of significance to support your statement. Is $Y$, after adjustment for number of hospital days for nonsmokers, related to $X_1$? Make a test of significance to support your statement. Are you willing to conclude from these data that number of cigarettes smoked has a direct effect on the average number of hospital days?
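The ANOVA in (b) can be organized as a small computation. The sketch below assumes NumPy; the function names `ols_fit` and `regression_anova` are hypothetical. It fits $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$, reports the model and residual sums of squares, and computes each variable's partial sum of squares as the increase in residual sum of squares when that variable is dropped from the full model. The partial $F$ for $X_2$ addresses whether nonsmoker hospital days help explain $Y$; the partial $F$ for $X_1$ addresses whether $Y$ is related to $X_1$ after adjustment for $X_2$. Each partial $F$ has $(1, n-3)$ degrees of freedom.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit; returns coefficients and the residual sum of squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b, float(resid @ resid)

def regression_anova(x1, x2, y):
    """ANOVA summary for y = b0 + b1*x1 + b2*x2 + e, with partial F tests."""
    x1 = np.asarray(x1, float)
    x2 = np.asarray(x2, float)
    y = np.asarray(y, float)
    n = y.size
    ones = np.ones(n)
    b, ss_res = ols_fit(np.column_stack([ones, x1, x2]), y)
    df_res = n - 3
    s2 = ss_res / df_res                            # residual mean square
    ss_total = float(((y - y.mean()) ** 2).sum())   # corrected total SS
    ss_model = ss_total - ss_res                    # 2 df for x1 and x2
    # Partial SS: extra residual SS incurred when one variable is left out
    _, ss_res_no_x1 = ols_fit(np.column_stack([ones, x2]), y)
    _, ss_res_no_x2 = ols_fit(np.column_stack([ones, x1]), y)
    partial_ss = {"x1": ss_res_no_x1 - ss_res, "x2": ss_res_no_x2 - ss_res}
    return {
        "coef": b,
        "ss_model": ss_model, "df_model": 2,
        "ss_res": ss_res, "df_res": df_res, "ms_res": s2,
        "F_model": (ss_model / 2) / s2,
        "partial_ss": partial_ss,
        "partial_F": {k: v / s2 for k, v in partial_ss.items()},
    }
```

With the exercise data loaded into arrays, p-values could be obtained from `scipy.stats.f.sf(F, dfn, dfd)` if SciPy is available.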
(c) It is logical in this problem to expect the number of hospital days for smokers to approach that of nonsmokers as the number of cigarettes smoked goes to zero. This implies that the intercept in this model might be expected to be zero. One might also expect $\beta_2$ to be equal to one. (Explain why.) Set up the general linear hypothesis for testing the composite null hypothesis that $\beta_0 = 0$ and $\beta_2 = 1.0$. Complete the test of significance and state your conclusions.
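A general linear hypothesis has the form $H_0\colon K'\beta = m$. With the parameter vector ordered as $(\beta_0, \beta_1, \beta_2)'$, one way to encode $\beta_0 = 0$ and $\beta_2 = 1$ is $K' = \begin{pmatrix}1 & 0 & 0\\ 0 & 0 & 1\end{pmatrix}$ and $m = (0, 1)'$. The test statistic is $Q = (K'\hat\beta - m)'[K'(X'X)^{-1}K]^{-1}(K'\hat\beta - m)$ and $F = (Q/q)/s^2$ with $(q, n-p)$ degrees of freedom. The NumPy sketch below (the function name `general_linear_hypothesis` is hypothetical) implements that formula; it takes $K'$ directly as a $q \times p$ array.

```python
import numpy as np

def general_linear_hypothesis(X, y, Kt, m):
    """F test of H0: K' beta = m in the model y = X beta + e.

    X  : n x p design matrix (include a column of ones for the intercept)
    Kt : q x p matrix K' whose rows define the linear functions being tested
    m  : length-q vector of hypothesized values
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Kt = np.asarray(Kt, float)
    m = np.asarray(m, float)
    n, p = X.shape
    q = Kt.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = float(resid @ resid) / (n - p)        # residual mean square, full model
    d = Kt @ b - m                              # discrepancy K'b - m
    Q = float(d @ np.linalg.solve(Kt @ XtX_inv @ Kt.T, d))  # SS for H0
    F = (Q / q) / s2
    return b, Q, F, (q, n - p)
```

For this problem the design matrix would have columns $(1, X_1, X_2)$, and the returned $F$ would be compared with the tabulated $F$ on $(2, n-3)$ degrees of freedom.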
(d) Construct the reduced model implied by the composite null hypothesis under (c). Compute the regression for this reduced model, obtain the residual sum of squares, and use the difference in residual sums of squares for the full and reduced models to test the composite null hypothesis. Do you obtain the same result as in (c)?
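Under $\beta_0 = 0$ and $\beta_2 = 1$ the model becomes $Y - X_2 = \beta_1 X_1 + \epsilon$, a regression through the origin with $n-1$ residual degrees of freedom. The test in (d) is then $F = \dfrac{[SS(\text{Res}_{\text{reduced}}) - SS(\text{Res}_{\text{full}})]/2}{SS(\text{Res}_{\text{full}})/(n-3)}$. The sketch below assumes NumPy, and the function name `reduced_model_test` is hypothetical. Because the reduced model is exactly the full model constrained by $K'\beta = m$, this $F$ should agree with the general linear hypothesis statistic from (c) up to rounding.

```python
import numpy as np

def reduced_model_test(x1, x2, y):
    """Full model y = b0 + b1*x1 + b2*x2 + e versus the reduced model implied by
    H0: beta0 = 0, beta2 = 1, namely (y - x2) = b1*x1 + e (no intercept)."""
    x1 = np.asarray(x1, float)
    x2 = np.asarray(x2, float)
    y = np.asarray(y, float)
    n = y.size
    # Full model: intercept, x1, x2
    X_full = np.column_stack([np.ones(n), x1, x2])
    b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    ss_full = float(((y - X_full @ b_full) ** 2).sum())
    df_full = n - 3
    # Reduced model: regress y - x2 on x1 through the origin
    z = y - x2
    b1_red = float(x1 @ z / (x1 @ x1))
    ss_red = float(((z - b1_red * x1) ** 2).sum())
    df_red = n - 1
    q = df_red - df_full                        # number of constraints (2)
    F = ((ss_red - ss_full) / q) / (ss_full / df_full)
    return {"b1_reduced": b1_red, "ss_full": ss_full,
            "ss_reduced": ss_red, "F": F, "df": (q, df_full)}
```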
Exercise 1.21
Hospital records were examined to assess the link between smoking and duration of illness. The data reported in the table are the number of hospital days (per 1,000 person-years) for several classes of individuals, the average number of cigarettes smoked per day, and the number of hospital days for control groups of nonsmokers for each class. (The control groups consist of individuals matched as nearly as possible to the smokers for several primary health factors other than smoking.)
(a) Plot the logarithm of number of hospital days (for the smokers) against number of cigarettes. Do you think a linear regression will adequately represent the relationship?
(b) Plot the logarithm of number of hospital days for smokers minus the logarithm of number of hospital days for the control group against number of cigarettes. Do you think a linear regression will adequately represent the relationship? Has subtraction of the control group means reduced the dispersion?
(c) Define $Y = \ln(\text{number of days for smokers}) - \ln(\text{number of days for nonsmokers})$ and $X = (\text{number of cigarettes})^2$. Fit the linear regression of $Y$ on $X$. Make a test of significance to determine if the intercept can be set to zero. Depending on your results, give the regression equation, the standard errors of the estimates, and the summary analysis of variance.
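Part (c) involves two fits: the ordinary regression with an intercept, whose $t$ statistic $\hat\beta_0 / s(\hat\beta_0)$ on $n-2$ degrees of freedom tests whether the intercept can be set to zero, and, if it can, the regression through the origin. The NumPy sketch below uses hypothetical function names. For the no-intercept model it reports the uncorrected ANOVA, $SS(\text{Regr}) = \hat\beta_1^2 \sum X^2$ on 1 df and $SS(\text{Res})$ on $n-1$ df, which together sum to $\sum Y^2$. It also returns the standard error $s(\hat\beta_1) = \sqrt{s^2/\sum X^2}$.

```python
import numpy as np

def fit_with_intercept_test(x, y):
    """Fit y = b0 + b1*x + e; return estimates, standard errors, and t for b0."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = y.size
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = float(resid @ resid) / (n - 2)
    se = np.sqrt(s2 * np.diag(XtX_inv))
    t_intercept = b[0] / se[0]                 # compare with t on n - 2 df
    return b, se, t_intercept

def fit_through_origin(x, y):
    """Fit y = b1*x + e (no intercept); return estimate, SE, and uncorrected ANOVA."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = y.size
    sxx = float(x @ x)                         # uncorrected sum of squares of x
    b1 = float(x @ y) / sxx
    resid = y - b1 * x
    ss_res = float(resid @ resid)
    s2 = ss_res / (n - 1)                      # one parameter estimated
    return {"b1": b1, "se_b1": np.sqrt(s2 / sxx),
            "ss_regr": b1 ** 2 * sxx, "ss_res": ss_res, "df": (1, n - 1)}
```

Two-sided critical values for the intercept test could come from a $t$ table or, if SciPy is installed, `scipy.stats.t.ppf(0.975, n - 2)`.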