All the requirements are specified in the questions. Some need to be answered from the data, which I will provide in the next file, and some need to be done by calculation. IBM SPSS needs to be used to answer the questions with the help of the data.
STA2300 Data Analysis S1, 18

Assignment 3
Due Date: 29 May, 2018
Weighting: 25%
Full Marks: 100

Answering the questions in this assignment should not be your first attempt at these types of questions. It is essential that you work through practice exercises from the tutorial sheets and Text Book first. This assignment is important in checking your knowledge, providing feedback and helping to establish competency in essential skills.

Answer all the questions. The questions are not of equal weight; some questions are worth much more than others. The questions relate to material up to and including Module 10.

Before starting this assignment read Notes Concerning Assignments under the Introductory Material link on the StudyDesk.

When you are asked to comment on a finding, usually a short paragraph is required.

Do not copy/paste SPSS output into your assignment unless specifically asked to do so. In many cases the SPSS output contains much more information than is required for a correct and complete answer. In those cases just reproducing the output may not attract any marks. Make sure you report only the information from the SPSS output relevant to your answer.

Unless instructed otherwise, show all working and formulae used in calculating confidence intervals and performing hypothesis tests. (Answers may of course be checked where possible using SPSS.) In order to obtain full marks for any question you must show all working.

Submission is via the link on the StudyDesk.

This assessment item consists of 5 questions.

Requirements for a passing grade: As you may have seen in the Course Specification, to receive a passing grade you must achieve at least 40% (i.e., 20/50) of the marks available in the final examination, and at least 50% of the total weighted marks available for the course. If you get over 50% weighted marks in all assessments, but did not get at least 40% marks in the final exam, you will not pass the course.
Also, if you get at least 40% marks in the final exam but do not get at least 50% of weighted marks in the course, you will not pass the course.

Note on Assignment 3 Solutions, Marks and Late Submissions: Because of the timing of Assignment 3, marks for this assignment will not be available until after the exam. However, feedback for this assignment will be available in the form of comprehensive worked solutions before the exam via the StudyDesk. As a result, any Assignment 3 submitted after 5pm AEST on the Friday before the exam period will not receive any marks.

Question 1 (25 marks)

Use the information in the dataset DHS18.sav to answer the following questions. You should use SPSS to calculate the sample statistics you will need to do this question, but for parts (a) and (d) you are required to do the rest of the calculations by hand, using a calculator.

(a) (7 marks) Estimate the population mean weight of women with no education in 2011, using a 99% confidence interval (show all working). Make sure you ONLY select women who have no education.

(b) (6 marks) Check the appropriate conditions and assumptions needed for the validity of the confidence interval or hypothesis test for the population mean weight of women with no education (include an appropriate graph to support your answer).

(c) (3 marks) From historical data, a researcher knows that the average weight of women in developing countries who have no education is 52.5 kg. State appropriate hypotheses (define any symbols used) to perform a hypothesis test to see if there is evidence to support her suspicion, based on the data in this study, that the average weight of women in developing countries in 2011 who have no education is greater than the historical value (regardless of whether the conditions in part (b) are satisfied).
(d) (2 marks) Calculate the value of a suitable test statistic for the test in part (c).

(e) (4 marks) Find the P-value of the test, based on the test statistic calculated in part (d), and write a meaningful conclusion at the 1% level of significance.

(f) (3 marks) Now, check your answers for parts (d) and (e) by finding the value of the test statistic and the P-value using SPSS. Include SPSS output in your answer and comment on the comparison with the hand calculated values. Explain any differences.

Question 2 (27 marks)

Use the information in the dataset DHS18.sav to answer the following questions. You should use SPSS to calculate any sample statistics you will need to do this question, but for parts (d)-(g) you are required to do the rest of the calculations by hand, using a calculator and statistical tables.

According to the Bureau of Statistics in the developing country being surveyed, 5% of women were ‘higher educated’ before 2011. The researcher believes that the proportion of all women in the developing country with such qualifications was no longer 5% in 2011.

(a) (1 mark) What is the variable of interest to the researcher?

(b) (3 marks) State the appropriate hypotheses (define any symbols used) to test the researcher’s claim that the proportion of women who are ‘higher educated’ in 2011 was no longer 5%.

(c) (4 marks) Check the conditions and assumptions for the test in part (b).

(d) (4 marks) Calculate the test statistic for the test in part (b).

(e) (8 marks) Find the P-value for the test in part (d) and write a meaningful conclusion in the context of this situation.

(f) (4 marks) If the researcher wants to be 99% confident that the margin of error of the estimate of the true proportion of women who are ‘higher educated’ is within 0.06, what minimum sample size is required? Use a conservative method in determining the sample size.
(g) (3 marks) The researcher decides that instead of using a conservative method (as required in part (f)), she will use information obtained from the DHS18.sav data to decide how many women she would need to survey (keeping the same level of confidence and margin of error). What is the impact of this decision? (Include evidence to support your answer).

Question 3 (16 marks)

Use the information in the dataset DHS18.sav again. The systolic blood pressure (BP) of the women was measured in 2011 and in a follow-up in 2014. The researcher wants to know if, on average, the systolic BP of the poorest women in 2014 is significantly greater than the systolic BP of the same cohort in 2011. Make sure you select ONLY poorest women.

(a) (3 marks) State appropriate hypotheses (define any symbols used).

(b) (2 marks) State (but do not check) the assumptions for carrying out this test. Describe the assumptions in the context of this question.

(c) (2 marks) Without using SPSS, calculate the value of a suitable test statistic for this test. You can use SPSS for calculating appropriate sample statistics.

(d) (3 marks) Without using SPSS, calculate the P-value of this test.

(e) (3 marks) Interpret the P-value and describe the outcome of the test in the context of this question.

(f) (3 marks) Now use SPSS to carry out the analysis. Copy and paste the relevant SPSS output to your assignment solution. Do these results agree with those found in part (e)? (Hint: comment on the P-value).

Question 4 (20 marks)

Use the information in the dataset DHS18.sav to answer the following questions. You should use SPSS to calculate any sample statistics you will need to do this question, but for part (e) you are required to do the rest of the calculations by hand, using a calculator.

The researcher is concerned that the weight of women in 2014 depends on their wealth.
She believes that the average weight of ‘poorer’ women is greater than that of the ‘richer’ women in this developing country.

(a) (4 marks) Use an appropriate graph to compare the distribution of weight of ‘poorer’ women with that of ‘richer’ women. Label the axes correctly, include a unit of measure and provide an appropriate title. Make sure you select ONLY ‘richer’ and ‘poorer’ women.

(b) (2 marks) Using the graph produced in part (a), briefly describe the distribution of weight for the two groups of women (poorer and richer).

(c) (3 marks) State appropriate hypotheses (defining all symbols) to answer the question: ‘Is the average weight of women greater for all ‘poorer’ women compared to all ‘richer’ women in this developing country in 2011?’

(d) (2 marks) Check the assumptions for carrying out the test in part (c).

(e) (2 marks) Without using SPSS, calculate a suitable test statistic for the test in part (c).

(f) (4 marks) Without using SPSS, find the P-value of the test. Interpret the P-value and describe the outcome of the original question.

(g) (1 mark) Now use SPSS to check your results for this hypothesis test. Copy and paste the relevant output from SPSS for this test into your assignment.

(h) (2 marks) Briefly comment on how the test statistic and P-value from SPSS output are similar to or differ from your hand calculations.

Question 5 (12 marks)

Give a brief answer to each of the following six (6) questions:

(a) (2 marks) State the differences between convenience sampling and cluster sampling.

(b) (2 marks) Explain the difference between a Type 1 and