The table presents four sets of data prepared by the statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data.
Four Data Sets For Exploring Correlation and Regression
Data Set A

| x | 10 | 8 | 13 | 9 | 11 | 14 | 6 | 4 | 12 | 7 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 8.04 | 6.95 | 7.58 | 8.81 | 8.33 | 9.96 | 7.24 | 4.26 | 10.84 | 4.82 | 5.68 |

Data Set B

| x | 10 | 8 | 13 | 9 | 11 | 14 | 6 | 4 | 12 | 7 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 9.14 | 8.14 | 8.74 | 8.77 | 9.26 | 8.10 | 6.13 | 3.10 | 9.13 | 7.26 | 4.74 |

Data Set C

| x | 10 | 8 | 13 | 9 | 11 | 14 | 6 | 4 | 12 | 7 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 7.46 | 6.77 | 12.74 | 7.11 | 7.81 | 8.84 | 6.08 | 5.39 | 8.15 | 6.42 | 5.73 |

Data Set D

| x | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 6.58 | 5.76 | 7.71 | 8.84 | 8.47 | 7.04 | 5.25 | 5.56 | 7.91 | 6.89 | 12.50 |
(a) Without making scatterplots, find the correlation for each of the four data sets. Fill in the blanks with the value of the correlation for Set A, Set B, Set C, and Set D, respectively. (Enter your answers rounded to three decimal places.)
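The correlation asked for here can be computed directly from its definition, r = Sxy / sqrt(Sxx · Syy), where Sxy, Sxx, and Syy are the sums of cross products and squared deviations from the means. The following is a minimal sketch in plain Python, assuming the values are transcribed from the table above; the names `DATA` and `correlation` are illustrative and not part of the exercise.

```python
from math import sqrt

# Values transcribed from the table; Sets A, B, and C share the same x values.
X_ABC = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "A": (X_ABC, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (X_ABC, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (X_ABC, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8] * 10 + [19],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
}


def correlation(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


correlations = {name: correlation(x, y) for name, (x, y) in DATA.items()}
for name, r in correlations.items():
    print(f"Set {name}: r = {r:.3f}")
```

Comparing the four printed values side by side is the point of part (a): the summary statistic alone says nothing about the shape of each data set.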
Without making scatterplots, find the least‑squares regression line for all four data sets. What do you notice about the least-squares regression lines? Choose the correct answer.
The least‑squares regression lines are approximately the same for all four data sets.
The least‑squares regression lines for Data Sets B and C are the same, but different from Data Sets A and D.
The least‑squares regression line for each data set is markedly different.
The least‑squares regression lines for Data Sets A and C are the same, but different from Data Sets B and D.
Use the regression line to predict ŷ for x = 10. (Enter your answer rounded to a whole number.)
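The least-squares line ŷ = a + bx has slope b = Sxy / Sxx and intercept a = ȳ − b·x̄. The sketch below fits that line to each data set and evaluates it at x = 10. It assumes the values are transcribed from the table above; `DATA`, `fit_line`, `fits`, and `predictions` are hypothetical names chosen for illustration.

```python
# Values transcribed from the table; Sets A, B, and C share the same x values.
X_ABC = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "A": (X_ABC, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (X_ABC, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (X_ABC, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8] * 10 + [19],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
}


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


fits = {name: fit_line(x, y) for name, (x, y) in DATA.items()}
predictions = {name: a + b * 10 for name, (a, b) in fits.items()}

for name, (a, b) in fits.items():
    print(f"Set {name}: y-hat = {a:.2f} + {b:.3f} x, "
          f"prediction at x = 10: {predictions[name]:.2f}")
```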
(b) Make a scatterplot for each of the data sets, and add the regression line to each plot. Select the correct set of scatterplots with regression lines for Set A, Set B, Set C, and Set D, respectively.
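One way to draw the four scatterplots with their fitted lines is a 2×2 grid in matplotlib. This is only a sketch: it assumes matplotlib is installed, the values are transcribed from the table above, and the names `DATA`, `fit_line`, and `plot_quartet` are illustrative. The plotting range of 3 to 20 on the x axis is an arbitrary choice that covers every x value in the table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt

# Values transcribed from the table; Sets A, B, and C share the same x values.
X_ABC = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
DATA = {
    "A": (X_ABC, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (X_ABC, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (X_ABC, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8] * 10 + [19],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
}


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def plot_quartet(data, x_range=(3, 20)):
    """Scatterplot each data set in its own panel and overlay its fitted line."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    for ax, (name, (x, y)) in zip(axes.flat, data.items()):
        intercept, slope = fit_line(x, y)
        ax.scatter(x, y)
        # A straight line needs only its two endpoints.
        ax.plot(x_range, [intercept + slope * v for v in x_range], color="red")
        ax.set_title(f"Data Set {name}")
        ax.set_xlabel("x")
        ax.set_ylabel("y")
    fig.tight_layout()
    return fig


fig = plot_quartet(DATA)
# fig.savefig("anscombe.png")  # hypothetical output filename
```

Sharing the axes across panels keeps the four plots directly comparable, which helps when judging part (c).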
(c) In which of the four cases would you be willing to use the regression line to describe the dependence of y on x?