Consider the data set in Table 12.15, in which a maker of asphalt shingles is interested in the relationship between sales for a particular year and factors that influence sales. (The data were taken from Kutner et al., 2004, in the Bibliography.)
Of the possible subset models, three are of particular interest. These three are $x_2x_3$, $x_1x_2x_3$, and $x_1x_2x_3x_4$. The following table presents pertinent information for comparing the three models. We include the PRESS statistics for the three models to supplement the decision making.
| Model | $R^2$ | $R^2_{\text{pred}}$ | $s^2$ | PRESS | $C_p$ |
|---|---|---|---|---|---|
| $x_2x_3$ | 0.9940 | 0.9913 | 44.5552 | 782.1896 | 11.4013 |
| $x_1x_2x_3$ | 0.9970 | 0.9928 | 24.7956 | 643.3578 | 3.4075 |
| $x_1x_2x_3x_4$ | 0.9971 | 0.9917 | 26.2073 | 741.7557 | 5.0 |
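The PRESS and $R^2_{\text{pred}}$ columns can be checked without refitting each model 15 times. The following is a minimal sketch in Python with NumPy (a tool choice made here, not one used in the source). It assumes the usual leave-one-out shortcut, in which each PRESS residual is the ordinary residual divided by $1 - h_{ii}$, and the definition $R^2_{\text{pred}} = 1 - \text{PRESS}/\text{SST}$. The helper name `press_summary` is hypothetical, and the data are transcribed from Table 12.15.

```python
import numpy as np

# Table 12.15 data: columns are x1, x2, x3, x4, y (sales in thousands).
DATA = np.array([
    [5.5, 31, 10,  8,  79.3],
    [2.5, 55,  8,  6, 200.1],
    [8.0, 67, 12,  9, 163.2],
    [3.0, 50,  7, 16, 200.1],
    [3.0, 38,  8, 15, 146.0],
    [2.9, 71, 12, 17, 177.7],
    [8.0, 30, 12,  8,  30.9],
    [9.0, 56,  5, 10, 291.9],
    [4.0, 42,  8,  4, 160.0],
    [6.5, 73,  5, 16, 339.4],
    [5.5, 60, 11,  7, 159.6],
    [5.0, 44, 12, 12,  86.3],
    [6.0, 50,  6,  6, 237.5],
    [5.0, 39, 10,  4, 107.2],
    [3.5, 55, 10,  4, 155.0],
])


def press_summary(cols):
    """Return (PRESS, R2_pred) for the model with the given 0-based predictor columns.

    Hypothetical helper; assumes R2_pred = 1 - PRESS / SST.
    """
    y = DATA[:, 4]
    X = np.column_stack([np.ones(len(y)), DATA[:, cols]])  # intercept + predictors
    # Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages h_ii.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y                          # ordinary residuals
    press_resid = resid / (1.0 - np.diag(H))   # leave-one-out (PRESS) residuals
    press = float(np.sum(press_resid ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    return press, 1.0 - press / sst


for name, cols in [("x2x3", [1, 2]), ("x1x2x3", [0, 1, 2]), ("x1x2x3x4", [0, 1, 2, 3])]:
    press, r2_pred = press_summary(cols)
    print(f"{name:10s} PRESS = {press:10.4f}  R2_pred = {r2_pred:.4f}")
```

If the data were transcribed correctly, the printed values should be comparable to the PRESS and $R^2_{\text{pred}}$ columns above. A smaller PRESS indicates better prediction of observations that were left out of the fit.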
It seems clear from the information in the table that the model $x_1, x_2, x_3$ is preferable to the other two. Notice that, for the full model, $C_p = 5.0$. This occurs since the bias portion is zero, and $\hat{\sigma}^2 = 26.2073$ is the mean square error from the full model.
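The $C_p$ values can be verified by hand. The sketch below assumes the statistic is written as $C_p = p + (s^2 - \hat{\sigma}^2)(n - p)/\hat{\sigma}^2$, where $p$ counts the parameters in the candidate model including the intercept, $n = 15$ is the number of districts, $s^2$ is the candidate model's mean square error, and $\hat{\sigma}^2 = 26.2073$ comes from the full model. That form is not stated in this passage, so treat it as an assumption.

```latex
\begin{align*}
C_p &= p + \frac{(s^2 - \hat{\sigma}^2)(n - p)}{\hat{\sigma}^2} \\[4pt]
x_2x_3:\quad C_p &= 3 + \frac{(44.5552 - 26.2073)(12)}{26.2073} \approx 3 + 8.4013 = 11.4013 \\[4pt]
x_1x_2x_3:\quad C_p &= 4 + \frac{(24.7956 - 26.2073)(11)}{26.2073} \approx 4 - 0.5925 = 3.4075 \\[4pt]
x_1x_2x_3x_4:\quad C_p &= 5 + \frac{(26.2073 - 26.2073)(10)}{26.2073} = 5
\end{align*}
```

For the full model, $s^2$ and $\hat{\sigma}^2$ are the same quantity, so the second term vanishes and $C_p$ equals $p = 5$ regardless of the data. For $x_1x_2x_3$, the value $C_p \approx 3.41$ lies below $p = 4$, which gives no indication of bias from omitting $x_4$.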
Figure 12.6 is a SAS PROC REG printout showing information for all possible regressions. Here we are able to show comparisons of other models with ($x_1, x_2, x_3$). Note that ($x_1, x_2, x_3$) appears to be quite good when compared to all models. As a final check on the model ($x_1, x_2, x_3$), Figure 12.7 shows a normal probability plot of the residuals for this model.
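For readers without SAS, an all-possible-regressions listing of this kind can be approximated in a few lines. This is a rough sketch in Python with NumPy, not the procedure SAS uses internally. It assumes $C_p = p + (s^2 - \hat{\sigma}^2)(n - p)/\hat{\sigma}^2$ with $\hat{\sigma}^2$ taken from the full four-variable model, and the names `sse`, `NAMES`, and `rows` are hypothetical. The data are transcribed from Table 12.15.

```python
from itertools import combinations

import numpy as np

# Table 12.15 data: columns are x1, x2, x3, x4, y (sales in thousands).
DATA = np.array([
    [5.5, 31, 10,  8,  79.3],
    [2.5, 55,  8,  6, 200.1],
    [8.0, 67, 12,  9, 163.2],
    [3.0, 50,  7, 16, 200.1],
    [3.0, 38,  8, 15, 146.0],
    [2.9, 71, 12, 17, 177.7],
    [8.0, 30, 12,  8,  30.9],
    [9.0, 56,  5, 10, 291.9],
    [4.0, 42,  8,  4, 160.0],
    [6.5, 73,  5, 16, 339.4],
    [5.5, 60, 11,  7, 159.6],
    [5.0, 44, 12, 12,  86.3],
    [6.0, 50,  6,  6, 237.5],
    [5.0, 39, 10,  4, 107.2],
    [3.5, 55, 10,  4, 155.0],
])
NAMES = ["x1", "x2", "x3", "x4"]

y = DATA[:, 4]
n = len(y)
sst = float(np.sum((y - y.mean()) ** 2))  # total corrected sum of squares


def sse(cols):
    """Error sum of squares for the least-squares fit on the given predictor columns."""
    X = np.column_stack([np.ones(n), DATA[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Mean square error of the full model, used as sigma-hat squared in C_p.
full_p = len(NAMES) + 1
sigma2_hat = sse(range(4)) / (n - full_p)

rows = []
for k in range(1, 5):                       # number of predictors in the model
    for cols in combinations(range(4), k):  # every subset of that size
        p = k + 1                           # parameters, including the intercept
        err = sse(cols)
        mse = err / (n - p)
        r2 = 1.0 - err / sst
        adj_r2 = 1.0 - mse / (sst / (n - 1))
        cp = p + (mse - sigma2_hat) * (n - p) / sigma2_hat
        rows.append((k, cp, r2, adj_r2, mse, " ".join(NAMES[j] for j in cols)))

rows.sort(key=lambda row: row[1])           # order by C(p), smallest first

print(f"{'k':>2} {'C(p)':>10} {'R-Square':>9} {'Adj R-Sq':>9} {'MSE':>12}  Variables")
for k, cp, r2, adj_r2, mse, names in rows:
    print(f"{k:>2d} {cp:>10.4f} {r2:>9.4f} {adj_r2:>9.4f} {mse:>12.5f}  {names}")
```

Sorting by $C_p$ puts the strongest candidates at the top of the listing. With only four predictors there are 15 subsets, so exhaustive enumeration is cheap. The number of subsets doubles with each added predictor, so this approach is practical only for a modest number of variables.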
Table 12.15:
| District | Promotional Accounts, $x_1$ | Active Accounts, $x_2$ | Competing Brands, $x_3$ | Potential, $x_4$ | Sales, $y$ (thousands) |
|---|---|---|---|---|---|
| 1 | 5.5 | 31 | 10 | 8 | 79.3 |
| 2 | 2.5 | 55 | 8 | 6 | 200.1 |
| 3 | 8.0 | 67 | 12 | 9 | 163.2 |
| 4 | 3.0 | 50 | 7 | 16 | 200.1 |
| 5 | 3.0 | 38 | 8 | 15 | 146.0 |
| 6 | 2.9 | 71 | 12 | 17 | 177.7 |
| 7 | 8.0 | 30 | 12 | 8 | 30.9 |
| 8 | 9.0 | 56 | 5 | 10 | 291.9 |
| 9 | 4.0 | 42 | 8 | 4 | 160.0 |
| 10 | 6.5 | 73 | 5 | 16 | 339.4 |
| 11 | 5.5 | 60 | 11 | 7 | 159.6 |
| 12 | 5.0 | 44 | 12 | 12 | 86.3 |
| 13 | 6.0 | 50 | 6 | 6 | 237.5 |
| 14 | 5.0 | 39 | 10 | 4 | 107.2 |
| 15 | 3.5 | 55 | 10 | 4 | 155.0 |
Figure 12.6: SAS printout of all possible subsets on sales data

Dependent Variable: sales

| Number in Model | C(p) | R-Square | Adjusted R-Square | MSE | Variables in Model |
|---|---|---|---|---|---|
| 3 | 3.4075 | 0.9970 | 0.9961 | 24.79560 | x1 x2 x3 |
| 4 | 5.0000 | 0.9971 | 0.9959 | 26.20728 | x1 x2 x3 x4 |
| 2 | 11.4013 | 0.9940 | 0.9930 | 44.55518 | x2 x3 |
| 3 | 13.3770 | 0.9940 | 0.9924 | 48.54787 | x2 x3 x4 |
| 3 | 1053.643 | 0.6896 | 0.6049 | 2526.96144 | x1 x3 x4 |
| 2 | 1082.670 | 0.6805 | 0.6273 | 2384.14286 | x3 x4 |
| 2 | 1215.316 | 0.6417 | 0.5820 | 2673.83349 | x1 x3 |
| 1 | 1228.460 | 0.6373 | 0.6094 | 2498.68333 | x3 |
| 3 | 1653.770 | 0.5140 | 0.3814 | 3956.75275 | x1 x2 x4 |
| 2 | 1668.699 | 0.5090 | 0.4272 | 3663.99357 | x1 x2 |
| 2 | 1685.024 | 0.5042 | 0.4216 | 3699.64814 | x2 x4 |
| 1 | 1693.971 | 0.5010 | 0.4626 | 3437.12846 | x2 |
| 2 | 3014.641 | 0.1151 | -.0324 | 6603.45109 | x1 x4 |
| 1 | 3088.650 | 0.0928 | 0.0231 | 6248.72283 | x4 |
| 1 | 3364.884 | 0.0120 | -.0640 | 6805.59568 | x1 |
Figure 12.7: Normal probability plot of residuals using the model $x_1x_2x_3$
[[Exercises]]