Please find the attachment.
STA2300 Data Analysis S2, 2019
Assignment 3
Due Date: 15 October, 2019
Weighting: 25%
Full Marks: 100 (final marks to be converted to 25%)

• Answering the questions in this assignment should not be your first attempt at these types of questions. It is essential that you work through practice exercises from the tutorial sheets (in the Study Book) and Text Book first.
• This assignment is important in checking your knowledge, providing feedback and helping to establish competency in essential skills.
• Answer all the questions. The questions are not of equal weight; some questions are worth much more than others.
• The questions relate to material up to and including Module 10.
• Before starting this assignment read Notes Concerning Assignments under the Introductory Material link on the StudyDesk.
• When you are asked to comment on a finding, usually a short paragraph is required.
• Do not copy/paste SPSS output into your assignment unless specifically asked to do so. In many cases the SPSS output contains much more information than is required for a correct and complete answer. In those cases just reproducing the output may not attract any marks. Make sure you report only the information from the SPSS output relevant to your answer.
• Unless instructed otherwise, show all working and formulae used in calculating confidence intervals and performing hypothesis tests. (Answers may of course be checked where possible using SPSS).
• In order to obtain full marks for any question you must show all working.
• This assignment consists of 5 questions.
• You will need to download data set AIS.sav from the StudyDesk of the course. Detailed information on the variables in the data set is found in AIS.txt file accessible from the StudyDesk.

Question 1 (25 marks)
Use the information in the dataset AIS.sav to answer the following questions.
The data set AIS.sav contains information that was collected as part of a comprehensive study on 102 male and 100 female athletes randomly selected from the Australian Institute of Sport. Consider the data set as a random sample of all athletes of the Australian Institute of Sport (AIS). You should use SPSS to calculate the sample statistics you will need to do this question, but for the confidence interval in part (a) and test statistic in part (d) you are required to do the rest of the calculations by hand, using a calculator.

(a) (8 marks) Find a 99% confidence interval for the population mean BMI of all the athletes of AIS by hand (show all working). [You may use sample summary statistics from SPSS outputs without pasting the outputs in your answer.]
(b) (5 marks) Check the appropriate conditions and assumptions needed for the validity of the confidence interval or hypothesis test for the population mean BMI of all the athletes of AIS (include an appropriate graph to support your answer).
(c) (3 marks) A doctor suspects that the average BMI of the population of AIS athletes is more than 22 kg/m². State appropriate hypotheses (define any symbols used) to perform a hypothesis test to see if there is evidence to support the suspicion, based on the available BMI data (regardless of whether the conditions in part (b) are satisfied or not).
(d) (2 marks) Calculate the value of the appropriate test statistic for testing the hypotheses in part (c).
(e) (4 marks) Based on the test statistic calculated in part (d) and using the appropriate statistical table provided in the StudyDesk, find the P-value of the test, and write a meaningful conclusion at the 1% level of significance.
(f) (3 marks) Now, check your answers for parts (d) and (e) by finding the value of the test statistic and the P-value using SPSS. Include SPSS output in your answer and comment on the comparison with the hand calculated values. Explain any differences.
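The hand calculations in Question 1 parts (a) and (d) use the standard one-sample t formulas: the interval is the sample mean plus or minus t* times s/√n, and the test statistic is (sample mean − hypothesised mean) divided by s/√n. The following is a minimal Python sketch for checking the arithmetic only, not a substitute for the required working. The function names are illustrative, and every number in the example is a hypothetical placeholder, not a value from AIS.sav. The critical value t* is assumed to be read from a t-table by the user.

```python
import math


def one_sample_ci(mean, sd, n, t_star):
    """Confidence interval for a population mean: mean +/- t* x sd / sqrt(n)."""
    se = sd / math.sqrt(n)          # standard error of the sample mean
    margin = t_star * se            # margin of error
    return mean - margin, mean + margin


def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic for H0: mu = mu0, with df = n - 1."""
    se = sd / math.sqrt(n)
    return (mean - mu0) / se


# Hypothetical placeholder summary statistics (NOT taken from AIS.sav).
n, mean, sd = 25, 23.0, 2.5
t_star = 2.797                      # t-table value for 99% confidence, df = 24

low, high = one_sample_ci(mean, sd, n, t_star)
t_stat = one_sample_t(mean, sd, n, mu0=22.0)
print(f"99% CI: ({low:.4f}, {high:.4f}), t = {t_stat:.2f}")
```

To apply it to the assignment, replace the placeholders with the sample size, mean and standard deviation reported by SPSS, and look up t* for the matching degrees of freedom.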
Question 2 (25 marks)
Considering the data set AIS.sav as a random sample of all athletes of the Australian Institute of Sport (AIS) population, answer the following questions. You should use SPSS to calculate any sample summary statistics you will need to do this question, but for parts (d)-(g) you are required to do the rest of the calculations by hand, using a calculator and statistical tables.

From previous studies it was known that the proportion of women athletes with medium level of Plasma Ferritin concentration (50 ng/mL to 120 ng/mL) was 0.35. A doctor claims that the proportion of women athletes with medium level of Plasma Ferritin concentration has increased in recent times.

(a) (1 mark) What is the variable of interest here?
(b) (3 marks) State the appropriate hypotheses (define any symbols used) to test the doctor’s claim that the proportion of women athletes with medium level of Plasma Ferritin concentration has increased in recent times.
(c) (4 marks) Check the conditions and assumptions for the validity of the test of the hypotheses in part (b).
(d) (4 marks) Calculate the value of the appropriate test statistic for testing the hypotheses in part (b). [You only need to work with the women athletes with medium level of plasma ferritin here.]
(e) (4 marks) Using the appropriate statistical table provided in the StudyDesk, find the P-value for the test, and write a meaningful conclusion at the 5% level of significance in the context of this study.
(f) (5 marks) If the doctor wants to be 95% confident that the margin of error of the estimate of the true proportion of women athletes at AIS with medium level of Plasma Ferritin concentration is within 0.06, what minimum sample size is required? For calculations, use an estimated proportion from the given data.
(g) (4 marks) If the doctor decides to use a conservative method (approach), what will be the minimum sample size to keep the same level of confidence and margin of error as in part (f)? What is the impact on the sample size of this decision? (Include evidence to support your answer).

Question 3 (16 marks)
In this question, consider the data on the body mass index at entry (BMIEntry) and current body mass index (BMI) of the athletes at AIS from the data set AIS.sav. The doctors at the AIS believe that the current body mass index of the athletes is less than their body mass index at entry. To find out if the mean current body mass index of the athletes is less than the mean body mass index at entry, they wish to perform appropriate statistical analyses.

(a) (3 marks) State appropriate hypotheses (define any symbols used) to perform an appropriate statistical test.
(b) (2 marks) State (but do not check) the assumptions required for the validity of the test. Describe the assumptions in the context of the study.
(c) (3 marks) Without using SPSS, calculate the value of the appropriate test statistic to test the hypotheses in part (a). [You can use appropriate sample summary statistics from SPSS output for calculations.]
(d) (2 marks) Using the appropriate statistical table provided in the StudyDesk, determine the P-value of the above test.
(e) (3 marks) Based on the P-value describe the outcome of the test in the context of the study.
(f) (3 marks) Now use SPSS to carry out the test. Copy and paste the relevant SPSS output to your assignment solution. Do these results agree with those found in part (e)? (Hint: comment on the test statistic, P-value and conclusion.)

Question 4 (20 marks)
Use the information on the BMI and Gender of the athletes in the data set AIS.sav to answer the following questions.
You should use SPSS to calculate any sample statistics you will need to do this question, but for part (e) you are required to do the rest of the calculations by hand, using a calculator. The doctors at AIS wish to check if the mean BMI of the female athletes is lower than that of the male athletes.

(a) (4 marks) Using SPSS produce an appropriate graph to compare the distribution of BMI of the female and male athletes. Label the axes correctly, include unit of measure and provide an appropriate title which includes your name.
(b) (2 marks) Using the graph produced in part (a), briefly describe the distribution of BMI for the two groups of (male and female) athletes. Features discussed should include the shape, centre, spread and outliers, if any.
(c) (3 marks) State appropriate hypotheses (defining all symbols) to answer the question: ‘Is the mean BMI of female athletes less than that of the male athletes?’
(d) (2 marks) State (but do not check) the assumptions required for the validity of the test. Describe the assumptions in the context of the study.
(e) (2 marks) Without using SPSS, calculate the value of the appropriate test statistic for testing the hypotheses in part (c). [You can use appropriate sample statistics from SPSS output for calculations.]
(f) (4 marks) Using the appropriate statistical table provided in the StudyDesk, find the P-value of the test, and describe the outcome of the test in the context of the study.
(g) (1 mark) Now use SPSS to check your results for the above hypothesis test. Copy and paste the relevant output from SPSS for this test into your assignment.
(h) (2 marks) Briefly comment on how the test statistic and P-value from SPSS output are similar to or differ from your hand calculations.
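
For the comparison of two independent group means in Question 4 part (e), one common form of the test statistic is the unpooled (Welch) two-sample t. The sketch below assumes that form; the course may instead prescribe a pooled-variance statistic or a different rule for the degrees of freedom, so treat this only as an arithmetic check. The function name is illustrative, and all numbers are hypothetical placeholders, not values from AIS.sav.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df.

    Group 1 minus group 2, so a negative t means group 1 has the lower mean.
    """
    v1 = sd1 ** 2 / n1              # variance of the first sample mean
    v2 = sd2 ** 2 / n2              # variance of the second sample mean
    se = math.sqrt(v1 + v2)         # standard error of the difference
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical placeholder summary statistics (NOT taken from AIS.sav):
# group 1 = female athletes, group 2 = male athletes.
t_stat2, df2 = welch_t(mean1=22.0, sd1=3.0, n1=25,
                       mean2=23.0, sd2=4.0, n2=25)
print(f"t = {t_stat2:.3f}, df = {df2:.1f}")
```

SPSS reports both an "equal variances assumed" and an "equal variances not assumed" row for the independent-samples test; the second row is the one that corresponds to this unpooled calculation.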