Re-read the Learning Activity titled “Estimation and Prediction.” In your own words, explain why the intervals found in Examples 1 and 2 do not describe the same thing. What condition in the problem determines whether you are finding an estimate or making a prediction? If you were presented with a scatterplot of the same data set with the regression equation included, where would you locate the exact answer to the average value of a four-year-old car? Where would you find the estimate for the average value of a four-year-old car?
As you read this section, consider the difference between an estimate and a prediction, as well as the difference between a confidence interval and a prediction interval. Provide a real-world example of each type of interval in Social Learning; then read the examples posted by your peers.

Estimation and Prediction

Consider the following pair of problems that look at automobile age and value.

Table 6.4 Data on Age and Value of Used Automobiles of a Specific Make and Model (x = age in years, y = value in thousands of dollars)

x    2     3     3     3     4     4     5     5     5     6
y    28.7  24.8  26.0  30.5  23.8  24.6  23.8  20.4  21.6  22.1

Problem 1
a. Estimate the average value of all 4-year-old automobiles of this make and model.
b. Construct a 95% confidence interval for the average value of all 4-year-old automobiles of this make and model.

Problem 2
a. Shylock intends to buy a 4-year-old automobile of this make and model next week. Predict the value of the first such automobile that he encounters.
b. Construct a 95% confidence interval for the value of the first such automobile that he encounters.

The method of solution and the answer to the first question in each pair, 1a and 2a, are the same. When we set x equal to 4 in the least squares regression equation \(\hat{y} = -2.05x + 32.83\), the number returned, \(\hat{y} = 24.63\), which corresponds to a value of $24,630, is an estimate of precisely the number sought in question 1a: the mean \(E(y)\) of all y values when x = 4. Because nothing is known about the first 4-year-old automobile of this make and model that Shylock will encounter, our best guess as to its value is the mean value \(E(y)\) of all such automobiles, the number 24.63 or $24,630, computed in the same way.

The answers to the second part of each question differ. In question 1b, we are trying to estimate a population parameter: the mean of all the y-values in the subpopulation picked out by the value x = 4, that is, the average value of all 4-year-old automobiles.
In question 2b, however, we are not trying to capture a fixed parameter, but the value of the random variable y in one trial of an experiment: examine the first 4-year-old car Shylock encounters. In the first case we seek to construct a confidence interval in the same sense that we have done before. In the second case, the situation is different, and the interval constructed has a different name: prediction interval. In the second case, we are trying to “predict” where a random variable will take its value.

Confidence Interval for the Mean Value of y at \(x = x_p\)

\[ \hat{y}_p \pm t_{\alpha/2} \, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]

where
· \(x_p\) is a particular value of x that lies in the range of x-values in the sample data set used to construct the least squares regression line.
· \(\hat{y}_p\) is the numerical value obtained when the least squares regression equation is evaluated at \(x = x_p\).
· The number of degrees of freedom for \(t_{\alpha/2}\) is \(df = n - 2\).

Here \(s_\varepsilon = \sqrt{SSE/(n-2)}\) is the sample standard deviation of errors and \(SS_{xx} = \sum (x - \bar{x})^2\).

The formula for the prediction interval is identical except for the presence of the number 1 underneath the square root sign. This means that the prediction interval is always wider than the confidence interval at the same confidence level and value of x. In practice, the presence of the number 1 tends to make it much wider.

Prediction Interval for an Individual New Value of y at \(x = x_p\)

\[ \hat{y}_p \pm t_{\alpha/2} \, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]

where
· \(x_p\) is a particular value of x that lies in the range of x-values in the data set used to construct the least squares regression line.
· \(\hat{y}_p\) is the numerical value obtained when the least squares regression equation is evaluated at \(x = x_p\).
· The number of degrees of freedom for \(t_{\alpha/2}\) is \(df = n - 2\).

Example 1

x    2     3     3     3     4     4     5     5     5     6
y    28.7  24.8  26.0  30.5  23.8  24.6  23.8  20.4  21.6  22.1

Using the data values in the table above for the "Age and Value of Used Automobiles of a Specific Make and Model," construct a 95% confidence interval for the average value of all 3.5-year-old automobiles of this make and model.

Solution

Solving this problem is merely a matter of finding the values of \(\hat{y}_p\), \(\alpha\) and \(t_{\alpha/2}\), \(s_\varepsilon\), \(\bar{x}\), and \(SS_{xx}\) and inserting them into the confidence interval formula given just above.
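Most of these quantities are already known. Computing from the data in the table gives \(\bar{x} = 4\), \(SS_{xx} = \sum (x - \bar{x})^2 = 14\), and the sample standard deviation of errors \(s_\varepsilon = \sqrt{SSE/(n-2)} \approx 1.902170\). From the statement of the problem, \(x_p = 3.5\), the value of x of interest. The value of \(\hat{y}_p\) is the number given by the regression equation, which is \(\hat{y} = -2.05x + 32.83\), when \(x = x_p\), that is, when x = 3.5. Thus, here

\[ \hat{y}_p = -2.05(3.5) + 32.83 = 25.655 \]

Lastly, confidence level 95% means that \(\alpha = 1 - 0.95 = 0.05\), so \(\alpha/2 = 0.025\). Because the sample size is n = 10, there are \(n - 2 = 8\) degrees of freedom. Referencing the table of critical values of the t-distribution gives \(t_{0.025} = 2.306\). Thus,

\[ \hat{y}_p \pm t_{\alpha/2} \, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} = 25.655 \pm (2.306)(1.902170)\sqrt{\frac{1}{10} + \frac{(3.5 - 4)^2}{14}} = 25.655 \pm 4.386404\sqrt{0.1178571} = 25.655 \pm 1.506 \]

which gives the interval (24.149, 27.161). We are 95% confident that the average value of all 3.5-year-old vehicles of this make and model is between $24,149 and $27,161.

Example 2

Table 6.5 Data on Age and Value of Used Automobiles of a Specific Make and Model (x = age in years, y = value in thousands of dollars)

x    2     3     3     3     4     4     5     5     5     6
y    28.7  24.8  26.0  30.5  23.8  24.6  23.8  20.4  21.6  22.1

Using the sample data, construct a 95% prediction interval for the value of a randomly selected 3.5-year-old automobile of this make and model.

Solution

The computations for this example are identical to those of the previous example, except that now there is the extra number 1 beneath the square root sign. Because we were careful to record the intermediate results of that computation, we have immediately that the 95% prediction interval is

\[ \hat{y}_p \pm t_{\alpha/2} \, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} = 25.655 \pm 4.386404\sqrt{1.1178571} = 25.655 \pm 4.638 \]

which gives the interval (21.017, 30.293). We are 95% confident that the value of a randomly selected 3.5-year-old vehicle of this make and model is between $21,017 and $30,293.

Note what an enormous difference the presence of the extra number 1 under the square root sign made. The prediction interval (half-width 4.638) is about three times as wide as the confidence interval (half-width 1.506) at the same level of confidence.

Note. Adapted from “Estimation and Prediction,” by Shafer and Zhang, 2012, Introductory Statistics, Chapter 10, Section 7. Copyright 2012 Flat World Knowledge, Inc.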
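
The computations in Examples 1 and 2 can be checked with a short script. What follows is a minimal sketch, not part of the original text: the function names `fit_line` and `intervals` are illustrative, and the critical value 2.306 (t with 8 degrees of freedom, upper-tail area 0.025) is hard-coded from a t table rather than computed. It fits the least squares line to the table data and then evaluates both interval half-widths at x = 3.5.

```python
import math

# Data from the table: x = age in years, y = value in thousands of dollars.
x = [2, 3, 3, 3, 4, 4, 5, 5, 5, 6]
y = [28.7, 24.8, 26.0, 30.5, 23.8, 24.6, 23.8, 20.4, 21.6, 22.1]


def fit_line(x, y):
    """Return (slope, intercept, s_eps, x_bar, ss_xx) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals, then the sample standard deviation of errors.
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))
    return slope, intercept, s_eps, x_bar, ss_xx


def intervals(x, y, x_p, t_crit):
    """Return (y_hat, CI half-width, PI half-width) at x = x_p."""
    n = len(x)
    slope, intercept, s_eps, x_bar, ss_xx = fit_line(x, y)
    y_hat = slope * x_p + intercept
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s_eps * math.sqrt(core)      # mean value of y at x_p
    pi_half = t_crit * s_eps * math.sqrt(1 + core)  # one new value of y at x_p
    return y_hat, ci_half, pi_half


# Assumption: critical value read from a t table (df = 8, alpha/2 = 0.025).
T_CRIT = 2.306

y_hat, ci_half, pi_half = intervals(x, y, 3.5, T_CRIT)
print(f"point estimate: {y_hat:.3f}")
print(f"95% confidence interval: ({y_hat - ci_half:.3f}, {y_hat + ci_half:.3f})")
print(f"95% prediction interval: ({y_hat - pi_half:.3f}, {y_hat + pi_half:.3f})")
```

The only difference between the two half-widths is the `1 +` inside the square root, which is why the prediction interval is always the wider of the two. Calling `intervals(x, y, 4, T_CRIT)` gives the corresponding quantities for the 4-year-old automobiles of Problems 1 and 2.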