FINC 620 Math and Stat for Data Science
Final Exam
Due: 12/17/17

Please show your work. If only the answer is provided, no credit will be given. All data files referenced below are on Blackboard under the datasets section. You might have to reformat the dataset in some cases to suit R. Use R where appropriate. You can turn in a Word document generated through knitr, or you can copy and paste the R code and output, where required, into a Word file, along with the other answers that may not require R. Turn in the document via the dropbox on Blackboard.

1. In 2005, United Airlines announced that it would award 500 frequent flier miles to every traveler on flights that arrived more than 30 minutes late on all flights departing from Chicago O'Hare to seven other hub airports (see The Wall Street Journal, June 14, 2005). On a randomly chosen day (Tuesday, April 26, 2005), the Bureau of Transportation Statistics website (www.bts.gov) showed 278 United Airlines departures from O'Hare. The data is provided in the file UniteAir.csv. Use R to compute the mean delay, the quartiles for the delay, the range, and the standard deviation of the delay. Use R to construct a box plot. How many outliers are there? What is the probability of a traveler receiving a frequent flier bonus? (5 points)

2. Baxter Inc. is a medium-sized maker of lawn furniture. Their business is doing well and they are considering building a new manufacturing plant to handle potential increased sales next year. Will next year's demand for lawn furniture be up (U) or down (D)? Baxter's own financial staff estimates a 60 percent chance that demand for lawn furniture will be up next year (P(U) = 0.6). These are their prior probabilities. But Baxter's financial staff is not experienced in macroeconomic forecasting. Baxter is considering whether to engage an expert market analyst who has wide experience in forecasting demand for consumer goods.
In terms of predicting directional (+ or −) changes in demand, she has been correct 80 percent of the time. If the expert is hired, Baxter could revise its estimate, using two conditional probabilities:
P(Expert says “up” | U) = 0.8
P(Expert says “down” | D) = 0.8
What Baxter really wants to know is the chance that demand will go up if the expert says it will go up, P(U | Expert says “up”). Compute this probability for Baxter. (5 points)

3. Let S be the event that a randomly chosen female aged 18–24 is a smoker. Let C be the event that a randomly chosen female aged 18–24 is a Caucasian. Given P(S) = .246, P(C) = .830, and P(S ∩ C) = .232, find each probability. (5 points)
a. P(S′).
b. P(S ∪ C).
c. P(S | C).
d. P(S | C′).

4. Car security alarms go off at a mean rate of 3.8 per hour in a large Costco parking lot. Find the probability that in an hour there will be (a) no alarms; (b) fewer than four alarms. (2 points)

5. Tire pressure in a certain car is a normally distributed random variable with mean 30 psi (pounds per square inch) and standard deviation 2 psi. The manufacturer's recommended correct inflation range is 28 psi to 32 psi. A motorist's tire is inspected at random. (a) What is the probability that the tire's inflation is within the recommended range? (b) What is the probability that the tire is underinflated? (c) A company has developed a microchip that will warn when a tire is 25 percent below the recommended mean, to warn of dangerously low tire pressure. How often would such an alarm be triggered? (5 points)

6. In a certain microwave oven on the high power setting, the time it takes a randomly chosen kernel of popcorn to pop is normally distributed with a mean of 140 seconds and a standard deviation of 25 seconds. What percentage of the kernels will fail to pop if the popcorn is cooked for (a) 2 minutes? (c) If you wanted 95 percent of the kernels to pop, what time would you allow? (3 points)

7.
Dave the jogger runs the same route every day (about 2.2 miles). On 18 consecutive days, he recorded the number of steps using a pedometer. The results were:
3,450 3,363 3,228 3,360 3,304 3,407 3,324 3,365 3,290
3,289 3,346 3,252 3,237 3,210 3,140 3,220 3,103 3,129
(a) Construct a 95 percent confidence interval for the true mean number of steps Dave takes on his run. (b) What sample size would be needed to obtain an error of ± 20 steps with 95 percent confidence? (5 points)

8. Beer shelf life is a problem for brewers and distributors because when beer is stored at room temperature, its flavor deteriorates. When the average furfuryl ether content reaches 6 μg per liter, a typical consumer begins to taste an unpleasant chemical flavor. (a) At α = .05, would the following sample of 12 randomly chosen bottles stored for a month convince you that the mean furfuryl ether content exceeds the taste threshold? (b) What is the p-value? (5 points)

9. Do male and female school superintendents earn the same pay? Salaries for 20 males and 17 females in a certain metropolitan area are shown in the file PayCheck.csv. At α = .01, were the mean superintendent salaries greater for men than for women? (a) State the hypotheses. (b) State the decision rule. (c) Find the test statistic. (d) Make a decision. (e) Estimate the p-value and interpret it. (5 points)

10. A certain company will purchase the house of any employee who is transferred out of state and will handle all details of reselling the house. The purchase price is based on two assessments, one assessor being chosen by the employee and one by the company. Based on the sample of eight assessments shown in file HomeValue.csv, do the two assessors agree? Use the .01 level of significance, state hypotheses clearly, and show all steps. (5 points)

11. Mean output of solar cells of three types is measured six times under random light intensity over a period of 5 minutes, yielding the results shown in file SolarWatts.csv.
Is the mean solar cell output the same for all cell types? Run a Tukey test to check which means differ. (5 points)

12. The Environmental Protection Agency (EPA) advocates a maximum arsenic level in water of 10 micrograms per liter. The results of EPA tests on randomly chosen wells in a suburban Michigan county are shown in file Arsenic.xls. Is the mean arsenic level affected by well depth and/or age of well? (5 points)

13. JetBlue’s revenue data is shown in file JetBlue.csv. (a) Use R to make a line chart for JetBlue's revenue. (b) Fit both a linear and an exponential trend to the data. (c) Which model is preferred? Why? (e) Make annual forecasts for 2011–2013, using the linear trend model. (5 points)

14. Data on gas bills is shown in the file GasBills.csv. Use R to decompose this data into seasonal, trend, and random components. Deseasonalize the data and fit a linear trend to it. (5 points)

15. A student team examined parked cars in four different suburban shopping malls. One hundred vehicles were examined in each location. The data is shown in Vehicles.xls. At α = .05, does vehicle type vary by mall location? (5 points)

16. The grade point averages for 25 randomly chosen university business students during a recent semester are shown in GPA.csv. At α = .01, are the median grade point averages the same for students in these four class levels? Use the appropriate nonparametric test in R. (5 points)
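
As a pointer for question 16, the following is a minimal sketch of how a nonparametric comparison of several groups might be set up in R, here using the Kruskal-Wallis test. It assumes GPA.csv has been reshaped into long format with one row per student; the column names `GPA` and `Class` and the helper `run_kruskal` are hypothetical, chosen only for illustration, and the actual file layout may differ.

```r
# Sketch only: Kruskal-Wallis test of equal medians across class levels.
# Assumption: the data frame has a numeric column `GPA` and a grouping
# column `Class` (hypothetical names; adjust to the actual file).
run_kruskal <- function(df, alpha = 0.01) {
  df$Class <- factor(df$Class)                    # treat class level as a factor
  result <- kruskal.test(GPA ~ Class, data = df)  # from the base stats package
  list(
    statistic   = unname(result$statistic),  # Kruskal-Wallis H statistic
    df          = unname(result$parameter),  # degrees of freedom (groups - 1)
    p_value     = result$p.value,
    reject_null = result$p.value < alpha     # decision at the chosen alpha
  )
}

# Run only if the file is present in the working directory.
if (file.exists("GPA.csv")) {
  gpa <- read.csv("GPA.csv")
  print(run_kruskal(gpa))
}
```

If the file stores one column per class level instead, base R's `stack()` followed by `na.omit()` is one way to reach the long layout assumed here, after which the resulting `values` and `ind` columns would need renaming.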