To illustrate Galton’s thesis of regression to the mean, the British statistician Karl Pearson plotted the heights of 10 randomly chosen sons versus those of their fathers. The resulting data (in inches) were as follows.
| Father’s height | Son’s height | Father’s height | Son’s height |
|---|---|---|---|
| 60 | 63.6 | 67 | 67.1 |
| 62 | 65.2 | 68 | 67.4 |
| 64 | 66 | 70 | 68.3 |
| 65 | 65.5 | 72 | 70.1 |
| 66 | 66.9 | 74 | 70 |
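A scatter diagram representing these data is presented in Fig. 12.5. Note that whereas the data appear to indicate that taller fathers tend to have taller sons, they also appear to indicate that the sons of fathers who are either extremely short or extremely tall tend to be more “average” than their fathers; that is, there is a regression toward the mean. We will determine whether the preceding data are strong enough to establish a regression toward the mean by taking this statement as the alternative hypothesis. Let $\beta$ denote the slope of the regression of a son’s height on his father’s height; a slope below 1 means that a father whose height is $d$ inches from the mean of the fathers’ heights is expected to have a son whose height is only $\beta d$ inches from the mean of the sons’ heights. Thus, we use the given data to test
$$H_0: \beta \ge 1 \quad\text{against}\quad H_1: \beta < 1$$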
FIGURE 12.5 Scatter diagram of son’s height versus father’s height.
Now, this test is equivalent to a test of
$$H_0: \beta = 1 \quad\text{against}\quad H_1: \beta < 1$$
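and will be based on the fact that
$$\sqrt{\frac{(n-2)S_{xx}}{SS_R}}\,(B - \beta)$$
has a $t$ distribution with $n - 2$ degrees of freedom, where $B$ is the least-squares estimator of $\beta$, $S_{xx} = \sum_i (x_i - \bar{x})^2$, and $SS_R$ is the sum of squared residuals. Hence, because $n = 10$, when $\beta = 1$ the test statistic
$$TS = \sqrt{\frac{8S_{xx}}{SS_R}}\,(B - 1)$$
has a $t$ distribution with 8 degrees of freedom. The significance-level-$\alpha$ test will be to reject $H_0$ when the value of $TS$ is sufficiently small (since this will occur when $B$, the estimator of $\beta$, is sufficiently smaller than 1). Specifically, the test is to
$$\text{reject } H_0 \text{ if } TS \le -t_{8,\alpha}, \qquad \text{not reject } H_0 \text{ otherwise},$$
where $t_{8,\alpha}$ is the value for which $P\{T_8 \ge t_{8,\alpha}\} = \alpha$, and $T_8$ denotes a $t$ random variable with 8 degrees of freedom.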
To determine the value of the test statistic TS, we run Program 12-1 and obtain the following:
The least-squares estimators are as follows
A = 35.97757
B = 0.4645573
The estimated regression line is
Y = 35.97757 + 0.4645573x
S(x,Y) = 79.71875
S(x,x) = 171.6016
S(Y,Y) = 38.53125
SSR = 1.497325
The square root of (n − 2)S(x, x)/SSR is 30.27942
From the preceding we see that
$$TS = 30.2794\,(0.4646 - 1) = -16.21$$
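The arithmetic above can be checked, and the Program 12-1 output reproduced from the data table, with the short Python sketch below. The helper names `slope_test_statistic` and `least_squares_summary` are hypothetical illustrations, not part of Program 12-1. Using the printed summaries, the first computation gives $TS \approx -16.213$. Recomputing everything from the raw heights in double precision gives values that agree closely with the printed ones, except that $SS_R$ comes out near 1.494 rather than 1.497325, so that $TS$ is about $-16.23$; the difference presumably reflects the numerical precision of the original program and does not change the conclusion.

```python
import math

# Pearson's father/son heights (inches), paired as in the table above
FATHERS = [60, 62, 64, 65, 66, 67, 68, 70, 72, 74]
SONS = [63.6, 65.2, 66.0, 65.5, 66.9, 67.1, 67.4, 68.3, 70.1, 70.0]


def slope_test_statistic(n, sxx, ssr, b, beta0=1.0):
    """TS = sqrt((n - 2) * Sxx / SS_R) * (B - beta0), for H0: beta = beta0."""
    return math.sqrt((n - 2) * sxx / ssr) * (b - beta0)


def least_squares_summary(x, y):
    """Least-squares quantities of the kind Program 12-1 prints (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Centered sums of cross-products and squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                # slope estimator B
    a = y_bar - b * x_bar        # intercept estimator A
    ssr = syy - sxy ** 2 / sxx   # residual sum of squares SS_R
    return {"n": n, "A": a, "B": b, "Sxy": sxy, "Sxx": sxx, "Syy": syy, "SSR": ssr}


# 1) TS from the values printed by Program 12-1
ts_printed = slope_test_statistic(n=10, sxx=171.6016, ssr=1.497325, b=0.4645573)

# 2) TS recomputed from the raw data in double precision
s = least_squares_summary(FATHERS, SONS)
ts_data = slope_test_statistic(s["n"], s["Sxx"], s["SSR"], s["B"])

print(f"TS from printed summaries: {ts_printed:.3f}")
print(f"TS from raw data:          {ts_data:.3f}")
for key in ("A", "B", "Sxy", "Sxx", "Syy", "SSR"):
    print(f"{key} = {s[key]:.6f}")
```

Either way, $TS$ lies far below any reasonable critical value, so the small numerical differences are immaterial here.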
Since $t_{8,0.01} = 2.896$, we see that
$$TS = -16.21 < -t_{8,0.01} = -2.896,$$
and so the null hypothesis that $\beta \ge 1$ is rejected at the 1 percent level of significance. In fact, the $p$-value is
$$p\text{-value} = P\{T_8 \le -16.213\} \approx 0,$$
and so the null hypothesis that $\beta \ge 1$ is rejected at almost any significance level, thus establishing a regression toward the mean.
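As a check on the table value $t_{8,0.01} = 2.896$ and on the claim that the $p$-value is essentially zero, the Python sketch below evaluates the $t$ distribution function without a statistics library. It assumes an even number of degrees of freedom (8 here), so that the classical closed-form series for the $t$ distribution function applies, and it finds $t_{8,\alpha}$ by bisection; the function names `t_cdf_even_df` and `t_critical` are hypothetical. With $TS = -16.213$ it gives a $p$-value on the order of $10^{-7}$.

```python
import math


def t_cdf_even_df(t, df):
    """P{T_df <= t} for Student's t with an even number of degrees of freedom.

    Uses P{|T| <= |t|} = sin(theta) * sum_{k=0}^{df/2 - 1} c_k cos^{2k}(theta),
    with theta = atan(t / sqrt(df)), c_0 = 1 and c_k = c_{k-1} * (2k - 1) / (2k).
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    theta = math.atan(t / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    coef, power, series = 1.0, 1.0, 1.0  # k = 0 term of the series
    for k in range(1, df // 2):
        coef *= (2 * k - 1) / (2 * k)
        power *= cos2
        series += coef * power
    # sin(theta) carries the sign of t, so this also works for t < 0
    return 0.5 * (1.0 + math.sin(theta) * series)


def t_critical(alpha, df, tol=1e-10):
    """t_{df,alpha}: the value with P{T_df >= t} = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf_even_df(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ts = -16.213
crit = t_critical(0.01, 8)          # compare with 2.896 from a t table
reject = ts <= -crit                # level-0.01 rule: reject H0 if TS <= -t_{8,0.01}
p_value = t_cdf_even_df(ts, 8)      # P{T_8 <= TS}
print(f"t_8,0.01 = {crit:.3f}, reject H0: {reject}, p-value = {p_value:.1e}")
```

In practice a library routine such as SciPy's `scipy.stats.t.cdf` and `scipy.stats.t.ppf` would normally be used instead; the closed form is used here only to keep the sketch dependency-free.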