This is the same assignment as the previous one (order number 26924); only the data and the questions are different. It needs to be done in IBM SPSS as before. You can check all the details from the previous order, and I will also provide the data file.
STA2300 Data Analysis S1, 18
Assignment 2
Due Date: 1 May, 2018
Weighting: 20%
Full Marks: 100

Answering the questions in this assignment should not be your first attempt at these types of questions. It is essential that you work through practice exercises from the tutorial sheets in the Study Book and/or Text Book first. This assignment is important in checking your knowledge, providing feedback and helping to establish competency in essential skills.

Answer all the questions. The questions are not of equal weight; some questions are worth much more than the others. The questions relate to materials in Modules 1 to 6.

Before starting this assignment read Notes Concerning Assignments under the Introductory Material link in the ‘Getting started’ tab on the StudyDesk.

When you are asked to comment on a finding, usually a short paragraph is all that is required. Do not copy/paste SPSS output into your assignment unless specifically asked to do so. In many cases the SPSS output contains much more information than is required for a correct and complete answer. In those cases just reproducing the output may not attract any marks. Make sure you report only the information from the SPSS output relevant to your answer.

In order to obtain full marks for any question you must show all working.

Convert your Word document to PDF before submitting your assignment via the link on the StudyDesk. See the Introductory Material (Section 5, Assignments) for information about how to do this properly.

This assignment consists of 6 questions.

Question 1 (14 marks)

This question uses information from the data file DHS18.sav found under the Assessment tab on the StudyDesk (also see DHS18.txt for more details about the survey and the variables measured).
Make sure the Variable View in SPSS is set up properly with all ‘labels’ correctly defined (with units), all ‘values’ assigned correctly for categorical variables and the correct ‘measure’ selected for all variables.

A researcher is interested to know whether the wealth of women is associated with the level of their educational qualifications.

(a) (4 marks) Use a contingency table to display the relationship between ‘Wealth status’ and ‘Education level’ for the women in this survey (you should use SPSS to produce this contingency table). The title for this table should reflect the context of the study. (Note that by convention, a table title should appear above the table.)

(b) (2 marks) What proportion of women are ‘Poorest’ and have ‘No Education’?

(c) (2 marks) Of those who are ‘Richest’, what proportion have ‘Higher’ education?

(d) (6 marks) Does there appear to be an association between ‘Wealth status’ and ‘Education level’ for women in this developing country? Explain in less than 100 words, using one or more numerical examples from a conditional distribution table to support your conclusion.

Question 2 (20 marks)

Consider the data in the file DHS18.sav again. Use SPSS to find the answers to the following questions, but do not copy and paste SPSS output into your answer for parts (c) and (d) (make sure you always include units where appropriate).

(a) (5 marks) Display the distribution of ‘Weight’ of the women in 2011 from this survey using an appropriate graph. Label the axes correctly, include units of measure and provide an appropriate title.

(b) (4 marks) Using the graph produced in part (a) only (don’t refer to SPSS summary statistics), describe, in no more than 60 words, the distribution of ‘Weight’ of the women from this survey. Include comments on shape, centre and spread of the distribution and the existence of outliers and/or gaps, if any. Do not perform any calculations; use the graph only.
(c) (3 marks) What is the sample size, mean and standard deviation of the distribution of ‘Weight’ of the women in 2011, from this survey? (You can use SPSS to calculate them but do not copy/paste SPSS output.)

(d) (4 marks) Using SPSS find the median, first quartile, third quartile and IQR of the distribution of ‘Weight’ of the women in 2011, from this survey. (Do not copy/paste SPSS output.)

(e) (4 marks) For the distribution of ‘Weight’ of the women in 2011, which statistics are appropriate to measure its centre and spread? Give a reasonable explanation for your choice.

Question 3 (12 marks)

Use this extract taken from the article, “Garlic juice and moderate physical activity increase the level of kidney function of CKD patients,” (appeared in Kidney Research on December 31, 2017) to answer the questions that follow:

Kidney disease is called a ‘silent disease’ as there are often few or no symptoms. In fact, 90% of kidneys can be damaged without observing any symptoms. Nowadays, Chronic Kidney Disease (CKD) is considered as one of the major public health problems worldwide. It is a chronic disorder in which a person has a low glomerular filtration rate (GFR). A GFR level of 44 to 30 is considered as moderate to severe loss of kidney function.

A recent study by researchers at the US Kidney Research Centre and Oklahoma University School of Public Health investigated the effects of garlic juice and moderate physical activity on moderate to severe kidney disease in patients in the US. A double-blinded, randomized, placebo-controlled trial was conducted with 120 moderate-to-severely-affected kidney patients. Randomization was stratified by gender. Patients were randomly assigned to one of four groups. Each group consisted of 15 men and 15 women.
The first group was assigned to receive 50 grams of garlic juice and required to participate in moderate physical activity (30 minute walk) daily, the second group was given 50 grams of garlic juice daily, the third group was required to undertake a 30 minute walk daily, and the last group was not given any intervention.

After fifteen weeks of intervention, it was found that the GFR of the combined garlic juice and physical activity group was significantly lower compared with the other groups. The researchers also found a greater reduction in BMI, systolic and diastolic blood pressure in the mixed group compared with garlic-only, physical-activity-only and control groups.

(a) (2 marks) Is this an experimental or observational study? In less than 50 words clearly explain your choice based on the extract given above.

(b) (3 marks) For the above study identify, if appropriate, the
i) response variable(s).
ii) factor and its levels.
iii) sample size.

(c) (4 marks) Are the four principles of experimental design used in this study? Explain in the context of the study.

(d) (3 marks) Explain explicitly what a confounding variable is. Identify one plausible confounding variable in this study and explain why it is a confounding variable.

Question 4 (12 marks)

Recent research shows that the distance from home to the nearest health service facility is an important factor in the control of a number of diseases in developing countries. Based on historical data (not the sample data in DHS18.sav) in Bangladesh, the distance from home to the nearest health service facility in rural areas is approximately normally distributed with a mean of 9.5 km and a standard deviation of 1.5 km.

(a) (2 marks) Identify the variable of interest and the unit of measurement of this variable.
(b) (3 marks) Based on the above normal distribution, for what proportion of rural dwellers in Bangladesh is the distance from home to the nearest health service facility 11 km or more?

(c) (4 marks) Based on the above normal distribution, for what proportion of Bangladeshi rural dwellers is the nearest health service facility between 7 and 10 km from home?

(d) (3 marks) Based on the above normal distribution, below what distance are the closest 5% of Bangladeshi rural dwellers to their nearest health service facility?

Question 5 (24 marks)

Consider the data in the file DHS18.sav again. Given that it is believed that ready access to health services can have a marked impact on the wellbeing of people in developing countries, a researcher is interested in identifying if distance from home to the nearest health service facility of women can be used to predict the weight of women based on information collected in this survey in 2011.

(a) (2 marks) What are the two variables the researcher will need to include in the analysis? What type of variables are they?

(b) (4 marks) Use an appropriate graph to display the relationship between the two variables identified in part (a). Label the axes correctly, include units of measure and provide an appropriate title.

(c) (4 marks) From the graph in part (b), describe (in no more than 30 words) the form, direction and scatter of this relationship, and identify any outliers.

(d) (4 marks) Calculate an appropriate statistic to measure the strength and direction of the relationship between the two variables for these women. Interpret this statistic.

(e) (6 marks) Use SPSS output to write the equation of the regression line which could be used to make this prediction and then plot the regression line on the graph in part (b).

(f) (3 marks) Using the regression equation from part (e), predict the expected weight of women whose nearest health service facility is 10 km from their home.
Would you consider this to be an accurate prediction? Why?

(g) (1 mark) What proportion of the variability in weight of women can be explained by the model, i.e. the relationship between weight and distance from home to the nearest health service facility
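
The normal-distribution calculations in Question 4 (b) to (d) can be cross-checked outside SPSS. The following is a minimal Python sketch, assuming only the mean of 9.5 km and standard deviation of 1.5 km stated in that question. The variable names are illustrative, and this is not the SPSS or table-based working the assignment asks you to show.

```python
from statistics import NormalDist

# Distance from home to the nearest health service facility (km),
# modelled with the mean and standard deviation given in Question 4.
distance = NormalDist(mu=9.5, sigma=1.5)

# (b) Proportion of rural dwellers living 11 km or more from a facility:
#     the upper tail, P(X >= 11).
p_11_or_more = 1 - distance.cdf(11)

# (c) Proportion living between 7 km and 10 km from a facility:
#     P(7 <= X <= 10) = P(X <= 10) - P(X <= 7).
p_7_to_10 = distance.cdf(10) - distance.cdf(7)

# (d) Distance below which the closest 5% live: the 5th percentile.
closest_5_percent_km = distance.inv_cdf(0.05)

print(f"(b) P(X >= 11)      = {p_11_or_more:.4f}")
print(f"(c) P(7 <= X <= 10) = {p_7_to_10:.4f}")
print(f"(d) 5th percentile  = {closest_5_percent_km:.2f} km")
```

Part (b) corresponds to a z-score of exactly 1, so the result should be about 0.159; parts (c) and (d) should come out near 0.583 and 7.03 km. A standard normal table gives the same values up to rounding.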