Data-based critical thinking
STA1DCT Assignment 2 (2020)
Name:
Student Number:

When preparing your submission, you should delete the italicised text (including this instruction).

Type (or write) your statement of originality here.

Question 1:
Expression 1 = ……………………………………
Expression 2 = ……………………………………
Expression 3 = ……………………………………
Expression 4 = ……………………………………
Expression 5 = ……………………………………

Question 2:
(a) Answer here
(b) Answer here
(c) Answer here
(d) (i) Answer here
    (ii) Answer here
    (iii) Answer here
    (iv) Answer here

Question 3:
(a) Answer here
(b) Answer here
(c) Answer here

Question 4:
Using the table below will make it easier to format your answer.
Histogram A: average =          standard deviation =
Histogram B: average =          standard deviation =
Histogram C: average =          standard deviation =
Histogram D: average =          standard deviation =

Question 5:
Answer here

Question 6:
Answer here

STA1DCT Assignment 2

Assignment 2 is due no later than 5pm Thursday the 2nd of April, 2020. You must submit your assignment electronically and as a single file via the LMS page for this subject. Where appropriate, your solutions must include your workings. In submitting your work, you are consenting that it may be copied and transmitted by the University for the detection of plagiarism.

Please start with the following statement of originality, which must be included near the top of your submitted assignment:

“This is my own work. I have not copied any of it from anyone else.”

IMPORTANT NOTE 1: The total number of marks available for this assignment is 50. There are 40 marks associated with accuracy (i.e. correctness of your answers; the breakdown of these marks is indicated on this question sheet), a further five marks for completeness (you will only get the full five marks for completeness if you make a serious attempt to answer every question) and a further five marks for your written communication (e.g. clarity, spelling, grammar, correct use of notation etc.). STA1DCT: 40 + 5 + 5 = 50 marks.
IMPORTANT NOTE 2: When you are asked to calculate an answer by hand, you may still use your calculator for basic calculations (e.g. multiplication, division, taking a square root etc.). Your workings should show that you know how a formula or process works.

1. Consider $n = 2$ values denoted $x_1, x_2$. Throughout this question let $\bar{x} = \frac{1}{2}\sum_{i=1}^{2} x_i$ denote the average of these values. For this question, you need to match each expression in the left column below with the equivalent expression in the right-hand column. For convenience, the expressions on the left are labelled with numbers (e.g. Expression 1, Expression 2, ...) and the expressions on the right with letters (e.g. Expression A, Expression B, ...).

Expression 1: $\sum_{i=1}^{2} x_i$                                      Expression A: $\bar{x} + 1$
Expression 2: $\frac{1}{2}\sum_{i=1}^{2} (x_i + 2)$                     Expression B: $-\bar{x}$
Expression 3: $\sum_{i=1}^{2} (x_i - \bar{x}) - \bar{x}$                Expression C: $2\bar{x}$
Expression 4: $\frac{1}{2}\left(\sum_{i=1}^{2} x_i + 2\right)$          Expression D: $\bar{x} + \frac{1}{2}$
Expression 5: $\frac{1}{2}\sum_{i=1}^{2} x_i + \frac{1}{2}$             Expression E: $\bar{x} + 2$

To make your answer clear, you are advised to use an answer format similar to that below. Simply replace the ............... with the name of the correct matching expression from the right-hand column.

Expression 1 = ...............
Expression 2 = ...............
Expression 3 = ...............
Expression 4 = ...............
Expression 5 = ...............

(3 marks all correct; 2 marks 3 correct; 1 mark 2 correct; 0 otherwise)

2. The birth weights of 322 rat-pups were recorded and are stored in the Excel file called Birth Weight.xlsx, which can be downloaded from the LMS. The data consists of a single column with the heading “Birth Weight”. You are required to use Excel to answer the questions below, and we will treat this data as population data for this question.

(a) Calculate the population average of the birth weights provided in the Excel file.
What is the population average and what was the Excel formula that you used to calculate it? (3 marks)

(b) Calculate the population median of the birth weights provided in the Excel file. What is the population median and what was the Excel formula that you used to calculate it? (3 marks)

(c) Calculate the population standard deviation of the birth weights provided in the Excel file. What is the population standard deviation and what was the Excel formula that you used to calculate it? (3 marks)

(d) The empirical rule in statistics states that if the data in the population is approximately normally distributed, then approximately 68% of the values in the population should fall within the interval $\mu \pm \sigma$ (where $\mu$ and $\sigma$ are the population average and population standard deviation respectively).

(i) If the birth weights of the rat-pups are approximately normally distributed, how many of the 322 values would you expect to fall within the interval $\mu \pm \sigma$? Explain how you arrived at your answer. (1 mark)

(ii) How many of the 322 birth weights in the Excel file fall within the interval $\mu \pm \sigma$? Explain how you arrived at your answer. NOTE: there are several ways to do this. If you are not sure how, try searching the internet, which has many forums etc. that can be helpful for problems such as this. (2 marks)

(iii) What percentage of the 322 birth weights fall within the interval $\mu \pm \sigma$ (round to 2 decimal places)? (1 mark)

(iv) Do you think there is evidence to suggest that the birth weights of the rat-pups are approximately normally distributed? Explain.

3. Consider again the data from the previous question. For this question we will consider a random sample of six birth weights that have been randomly chosen from the population data provided in the Excel file. The randomly chosen data values that you should use for this question are 5.96, 5.84, 5.32, 5.52, 6.42 and 6.34.

(a) Calculate by hand (i.e. without using Excel) the sample average.
As well as providing your answer, you must also provide your workings and these must be clear and accurate. (2 marks)

(b) Calculate by hand (i.e. without using Excel) the sample median. As well as providing your answer, you must also provide your workings and these must be clear and accurate. (3 marks)

(c) Calculate by hand (i.e. without using Excel) the sample standard deviation. As well as providing your answer, you must also provide your workings and these must be clear and accurate. (4 marks)

4. Four histograms are displayed below. These histograms are labelled, in order from top to bottom, Histogram A, Histogram B, Histogram C and Histogram D respectively. Refer to these histograms when answering this question.

[Figure: four histograms, labelled Histogram A, Histogram B, Histogram C and Histogram D from top to bottom. Horizontal axis: Values. Vertical axis: Frequency.]

• In random order, the averages of the data displayed in each of the histograms are 50, 20, 40 and 60.
• Similarly, and again in random order, the standard deviations of the data displayed in each of the histograms are 5, 15, 10 and 25.

Your task is to study the histograms and then to match each histogram with its correct average and standard deviation. Each average and standard deviation belongs to just one histogram. When you are confident in your answers, complete the following.

Histogram A: average = ........., standard deviation = .........
Histogram B: average = ........., standard deviation = .........
Histogram C: average = ........., standard deviation = .........
Histogram D: average = ........., standard deviation = .........

(8 marks total: 1 mark each correct average + 1 mark each correct standard deviation)

5. On the 24th of February 2020, the Dow Jones Index was 27960.80 and on the 20th of March 2020 it was 19173.98.
Calculate, by hand, the percentage decrease of the Dow Jones Index from the 24th of Feb 2020 to the 20th of March 2020. You must provide detailed workings below that show how YOU calculated the answer. Further, your percentage answer must be rounded to two decimal places. (4 marks)

6. Consider a sample of size $n = 33$. The data is summarised in the table below, which includes five non-overlapping intervals and frequencies indicating the number of values in the data set that fall within each interval.

Interval (bin)    Frequency
(0, 3]            2
(3, 6]            3
(6, 9]            5
(9, 12]           6
(12, 15]          17

In which interval (bin) does the median fall for the data depicted in the above table? Explain why you chose this interval. (3 marks)
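
As a general illustration only (not a model answer), the bin containing the median of grouped data can be found by locating the middle position(s) of the sorted data and then reading along the cumulative frequencies until that position is reached. The Python sketch below assumes this approach. The function name `median_bin` and the bins and frequencies used are hypothetical and are not the assignment data.

```python
from itertools import accumulate


def median_bin(bins, freqs):
    """Return the bin(s) holding the middle value(s) of grouped data.

    bins  : list of bin labels, in increasing order
    freqs : list of frequencies, one per bin
    """
    n = sum(freqs)
    # 1-based positions of the middle value(s) in the sorted data:
    # one position when n is odd, two when n is even.
    if n % 2 == 1:
        positions = [(n + 1) // 2]
    else:
        positions = [n // 2, n // 2 + 1]

    # Running totals: cumulative[k] counts the values in bins 0..k.
    cumulative = list(accumulate(freqs))

    found = []
    for pos in positions:
        for label, running_total in zip(bins, cumulative):
            # The first bin whose running total reaches pos contains it.
            if pos <= running_total:
                found.append(label)
                break
    return found


# Hypothetical example (NOT the table from Question 6).
example_bins = ["(0, 10]", "(10, 20]", "(20, 30]"]
example_freqs = [4, 7, 10]
print(median_bin(example_bins, example_freqs))  # prints ['(10, 20]']
```

In the hypothetical example there are 21 values, so the median is the 11th value in sorted order. The cumulative frequencies are 4, 11 and 21, and the 11th value is first reached in the second bin. When n is even, the function returns the bin of each of the two middle values, which may differ.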