1. There is a formula for the sample size $n$, given margin of error $m$ and confidence level $C$, for a population proportion:

    $n = \frac{1}{4}\left(\frac{z^*}{m}\right)^2$

What assumption is made for the 1/4 in the formula above? Please give a brief 1-2 sentence explanation of your choice.
    (a). Random guess
    (b). Assume the sample proportion in the future is 1/2
    (c). $m$ is half the length of the confidence interval

2. Which one is NOT a linear regression model? Please give a brief 1-2 sentence explanation of your choice.
    (a). $y_i = \beta_0 + \exp(\beta_1 x_i) + \epsilon_i$, $i = 1, 2, \cdots, n$
    (b). $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$, $i = 1, 2, \cdots, n$
    (c). $y_i = \beta_0 \exp(x_i) + \beta_2 x_i^7 + \epsilon_i$, $i = 1, 2, \cdots, n$

3. Suppose X and Y have linear correlation coefficient $r = 0.5$, and there are 77 observations. What is the test statistic for the hypothesis test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$, where $\beta_1$ comes from the simple linear regression model below? Please give a brief 1-2 sentence explanation of your choice.

    $Y = \beta_0 + \beta_1 X + \epsilon$

    (a). Not enough information
    (b). 5
    (c). 0.25

[Figure: plot of ehat against yhat]

4. What can you see from the graph above? Please give a brief 1-2 sentence explanation of your choice.
    (a). Non-constant variance
    (b). Dependent error terms
    (c). Non-linearity

5. Which model is more likely to have the smaller $R^2$? Please give a brief 1-2 sentence explanation of your choice.

    A: $Y = \beta_0 + \beta_1 X_1 + \epsilon$
    B: $Y = \beta_0^* + \beta_1^* X_1 + \beta_2^* X_2 + \epsilon^*$

where $Y$ and $X_1$ in models A and B are the same.
    (a). Not enough information
    (b). Model A
    (c). Model B

6. Suppose we have designed an experiment to compare 5 different treatments on the weight gain of rats' livers, and $\mu_i$ is the population mean of the weight gain of rats under treatment $i$. Which one below is not a contrast? Please give a brief 1-2 sentence explanation of your choice.
    (a). $\mu_2 = \mu_4$
    (b). $2\mu_3 - \mu_4 = \mu_1 - 2\mu_2$
    (c). $\mu_3 = (\mu_2 + 2\mu_4)/3$

Problem 2 (18 points)
Suppose we have the partial R output below. Please answer the questions and show your steps.

                Estimate Std. Error t value Pr(>|t|)
    (Intercept)      ---     0.1780   0.850    0.402
    x            -0.2552        ---  -1.418    0.166

    Residual standard error: 1.006 on 30 degrees of freedom
    Multiple R-squared: 0.06284, Adjusted R-squared: 0.0316

(a). [3 points] What is the SSE of the model?
(b). [3 points] What is the linear correlation coefficient between x and y?
(c). [3 points] Suppose we know $\bar{y}$ is 1.44. What is $\bar{x}$?
(d). [3 points] Find the P-value for $H_0: \beta_1 = 0.5$ vs. $H_a: \beta_1 < 0.5$.
(e). [3 points][Bonus] What is the 95% CI for the $\hat{y}$ at $x = 3$?
(f). [3 points] Based on the output, can you give a reasonable guess of the probability that the $y$ at $x = 2$ is larger than 0.2?

Problem 3 (8 points)
Suppose $y$ is annual income ($1000/year), $x_1$ is educational level (number of years of schooling), $x_2$ is number of years of work experience, and $x_3$ is gender ($x_3 = 0$ is male, $x_3 = 1$ is female). After a linear regression on the collected data, and assuming the estimated parameters are the true values, we have

    $y = 15 + 0.8x_1 + 0.5x_2 - 3x_3 + \epsilon$, where $\epsilon \sim N(0, 3^2)$

(a). [2 points] What is the average difference in annual income between women and men if their other conditions are the same?
(b). [2 points] What is the average annual income of a female with 10 years of education and 10 years of working experience? What about a male with the same conditions?
(c). [2 points] What is the probability that a female with 16 years of education and no work experience will earn more than $30,000/year?
(d). [2 points] Suppose a female has 15 years of education and 4 years of working experience. How many more years of working experience will make her expected annual income no less than $28,000?

Problem 4 (12 points)
A fisheries biologist is interested in determining a set of optimal conditions for growing hatchery trout. The two factors that are most easily controlled at the hatchery are water temperature (A) and fungicide (B). The biologist designs an experiment consisting of 2 different water temperatures, 3 different levels of fungicide, and 5 observations on each of the temperature-fungicide combinations. The following summary data resulted on the response variable y = weight of a hatchery trout.

    Source of Variation   df    Sum Squares   Mean Square   F-value
    Water temperature     ( )   ( )           72            ( )
    Fungicide             ( )   18            ( )           ( )
    Interaction           ( )   ( )           5             ( )
    Error                 ( )   100           ( )

(a). [4 points] Fill in the missing entries above.
(b). [5 points] Test the null hypothesis of no interaction between water temperature and level of fungicide. If appropriate, also perform tests of the main effects for the two factors, water temperature and fungicide. Use significance level $\alpha$ = 5%.
(c). [3 points] What conclusion will you draw based on the analysis in part (b)?

Problem 5 (12 points)
Rats were given one of four different diets at random, and the response measure was liver weight as a percentage of body weight. There are 5 observations for treatment 1, 6 for treatment 2, 6 for treatment 3 and 8 for treatment 4. Below is some R output of the analysis:

    > m<-lm(liver~as.factor(Treatment),data=rat)
    > anova(m)
                         Df Sum Sq  Mean Sq F value Pr(>F)
    as.factor(Treatment)  3    (4) 0.192736     (2)     --
    Residuals           (1)    (3)       --
    > summary(m)
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)     3.75    0.07688  48.720 1010µ4.

Problem 6 (10 points)
Use the data in 'P6 data.txt', with a single response Y and three predictors X1, X2 and X3.
(a). [4 points] Perform the backward and forward variable selection procedures using AIC as the criterion for this data set. Please report your final models first, then show your R commands and related outputs.
(b). [3 points] In this specific case, which variable selection procedure (backward or forward) is better? Why?
(c). [3 points] Find the AIC, BIC and $C_p$ for the model lm(Y ~ X1 + X3).

Problem 7 (42 points)
Pine oleoresin is obtained by tapping the trunks of pine trees. Tapping is done by cutting a hole in the bark and collecting the resin that oozes out. This experiment compares four shapes for the holes and the efficacy of acid treating the holes. Twenty-four pine trees are selected at random from a plantation, and the 24 trees are assigned at random to the eight combinations of hole shape (circular, diagonal slash, check, rectangular) and acid treatment (yes or no). The response (y) is total grams of resin collected from the hole. Data is available in 'resin.csv'.
(a). [2 points] Summarize the data graphically (interaction plots and profile plots) and describe what you can see from the graphs.
(b). [2 points] Build a two-way ANOVA analysis containing all the main effects and the interaction effects. Are all the effects significant?
(c). [7 points][Bonus] If some effects are not significant, what does that mean? Please show how to interpret the p-value in the first row of your ANOVA table. I know you can find a general answer for it somewhere, but that is not good enough: I need you to explain your understanding of this specific case (the specific test, specific sample size, specific population and so on).
(d). [3 points] Please check all the assumptions for this two-way ANOVA analysis (the model in part (b)) graphically and give specific comments on each of the assumptions.
(e). [3 points] Do you think a transformation of the data is necessary? If yes, how? If no, why?
(f). [3 points] Based on the model suggested by part (e), please simplify the model by dropping the insignificant terms (you need to check the assumptions whenever you have a new/different model).
(g). [2 points] Please draw conclusions from your two-way ANOVA analysis.
(h). [20 points] Now you have all the analysis for this problem. Please compile a brief statistical report to present the whole idea. You can follow the format of the example report. The report should be no more than 3 pages (including all the essential graphs).
Hint: the total of 20 points has three parts: 1. conclusion (5 points); 2. way of analysis (8 points); 3. format (7 points).