assignment attached
STAT1060 – Assignment 2
Semester 2, 2020
STATISTICAL ANALYSIS TO SUPPORT DECISION MAKING

Total marks: 46, Weight: 20%
Due: November 8, 23:59 (Sunday of Week 12)

Submission instructions and general marking criteria

Submit a copy in Word or PDF format via Turnitin. Assignments submitted by other means (e.g. email) or in other forms (scanned copy, Excel document) will attract no marks.

Late Submission Penalty: As detailed in the Course Outline.

It is expected that Excel is used to assist with calculations and preparation of appropriate graphs. All relevant Excel output should be included with your assignment. However, raw computer output without explanatory text is not acceptable. Answers must be written in clear English sentences clearly linked to appropriate supporting computer output. Include only the parts of the Excel output that are relevant to answering the question; omit output that is not relevant.

You will need to demonstrate understanding of types of data, the use of graphs to explore distributions of variables and relationships between variables, and of statistical tests of such relationships. Marks will be awarded based on the quality of your assessment of the data and how clearly that assessment is communicated.

The assessment requires you to apply concepts from Weeks 5 to 10 to specific scenarios and data sets, to choose the correct analysis for each, and to write up the results of a statistical analysis.

Question 1 (6 Marks)

Car buyers in the city of Newcastle were asked by a car dealer to rate their level of satisfaction with the service that they received. The four possible ratings were: Excellent (E), Good (G), Satisfactory (S) and Unsatisfactory (U). Data showing the level of satisfaction with the service for December 2019 is provided in the file Question1.xlsx. Column A contains the level of satisfaction scores.
(i) What type of variable is “level of satisfaction” (Continuous, Discrete, Ordinal or Nominal)? Be sure to justify your answer. [2 Marks]
(ii) Name the appropriate graphical display to use to display ‘level of satisfaction’, based on the variable type you identified in part (i). [1 Mark]
(iii) Use Excel to create the appropriate graph to display the provided data. [1 Mark]
(iv) Comment on the key aspects from this graph. [2 Marks]

Question 2 (4 Marks)

Commuting times of students who travel by bus to the University of Newcastle Callaghan campus are known to be normally distributed with a mean of 25 minutes and standard deviation 5 minutes. Use the empirical rule of the normal distribution to answer the following questions. Shade in (roughly) the corresponding area under the curve for each part (i-iv).

(i) What is the probability that the student’s commuting time is less than 20 minutes to reach campus? [1 Mark]
(ii) What is the probability that the student’s commuting time is more than 35 minutes to reach campus? [1 Mark]
(iii) What is the probability that the student’s commuting time is between 15 and 30 minutes to reach campus? [1 Mark]
(iv) Find the bus commuting time that corresponds to the 97.5th percentile (approximately) of the distribution (i.e. find the commuting time above which only 2.5% of the distribution appears). [1 Mark]

Question 3 (6 Marks)

It has been found from experience that the mean commuting time to reach the University of Newcastle Callaghan campus is approximately 25 minutes. A sample of 100 students selected at random from the University of Newcastle Callaghan campus reveals a sample mean commuting time of 26.32 minutes with a sample standard deviation of 2.78 minutes. Perform a hypothesis test, at the 5% significance level, to test if the population mean commuting time is different from 25 minutes. Use the output presented in Figure 1 to answer the question.
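As a rough cross-check of the Question 3 set-up, the one-sample t statistic can be recomputed from the summary statistics quoted in the question. This is only a sketch: the variable names are illustrative, the two-tail critical value 1.9842 (df = 99) is copied from Figure 1 rather than computed, and because the inputs are the rounded values 26.32 and 2.78 the result will differ slightly from the t Stat that Excel reports.

```python
import math

# Summary statistics quoted in Question 3 (rounded values from the question text).
sample_mean = 26.32   # minutes
sample_sd = 2.78      # minutes
n = 100               # sample size
mu_0 = 25.0           # hypothesised population mean under H0

# Two-tail critical value at the 5% level for df = n - 1 = 99,
# copied from Figure 1 (assumption: not recomputed here).
t_critical = 1.9842

# Standard error of the mean and the one-sample t statistic.
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

# Reject H0: mu = 25 when |t| exceeds the two-tail critical value.
reject_h0 = abs(t_stat) > t_critical

print(f"SE = {standard_error:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```

The decision from the critical value should agree with the decision from the two-tail p-value in the Excel output, since both describe the same test.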
Figure 1: t-Test: Two-Sample Assuming Unequal Variances

                               Commuting time   Dummy
Mean                           26.3231          25
Variance                       7.7202           0
Observations                   100              2
Hypothesized Mean Difference   0
df                             99
t Stat                         4.7618
P(T<=t) one-tail               0.0000
t Critical one-tail            1.6604
P(T<=t) two-tail               0.0001
t Critical two-tail            1.9842

Question 4 (14 Marks)

The University of Newcastle Marketing Services are interested in mobile usage technology. A study was undertaken in which a random sample of students enrolled at the University of Newcastle in 2019 and 2020 were invited to participate in a project about the daily usage of smartphones. There were 50 randomly selected students in 2019 and 50 randomly selected students in 2020 who participated in the study. Data is provided in the Excel file Question4.xlsx. Column A contains student smartphone daily usage in 2019 and Column B contains student smartphone daily usage in 2020 (minutes per day).

(i) Use Excel to construct a histogram of student smartphone daily usage in 2019. How would you describe the distribution, including the shape? Include a histogram with your response. [2 Marks]
(ii) Use Excel to construct a histogram of student smartphone daily usage in 2020. How would you describe the shape? Include a histogram with your response. [2 Marks]
(iii) Use Excel to find the mean, median, standard deviation and interquartile ranges of student smartphone daily usage in 2019. Repeat for the smartphone daily usage in 2020. [2 Marks]
(iv) Using the information from parts (i)-(iii), give a brief statement comparing smartphone daily usage in 2019 with smartphone daily usage in 2020. [2 Marks]
(v) Explain, using support from a hypothesis test, if the average smartphone daily usage in 2019 differs from the average smartphone daily usage in 2020. Perform the hypothesis test at the 5% significance level. Assume that data in 2019 and 2020 were taken from different students. Be sure to include the following in your answer:
- The null and alternative hypotheses
- The p-value
- Conclusion
(Hint): Research question: Was the average smartphone usage the same in 2019 and 2020? [6 Marks]

Question 5 (6 Marks)

A JB Hi-Fi franchise in Newcastle set a discount pricing strategy with the aim to increase the sales of smartphones. The difference* (with promotion and without promotion) in sales of new smartphones is defined as:

Difference* = Sales on days with promotion – Sales on matched days without promotion

Test the hypothesis of whether there is a difference, at the 5% significance level, between the mean difference in sales of new smartphones (between days with promotion and days without promotion). Comment on the result. Use Figure 2 to answer the research question.

Note: Both sets of days’ measurements were taken from the same JB Hi-Fi franchise.

Figure 2: t-Test: Two-Sample Assuming Unequal Variances

                               Difference   Dummy
Mean                           14.4873      0
Variance                       102.1922     0
Observations                   50           2
Hypothesized Mean Difference   0
df                             49
t Stat                         10.1336
P(T<=t) one-tail               0.0000
t Critical one-tail            1.6766
P(T<=t) two-tail               0.0001
t Critical two-tail            2.0096

Question 6 (10 Marks)

A sample of 450 students was selected from the University of Newcastle to determine if there is a relationship between smoking status and student’s diet (vegetarian and non-vegetarian). Use Figure 3 to answer the following probability questions (i) – (iv).

Figure 3: Observed frequencies

              Vegetarian   Non-vegetarian   Total
Smoker        18           25               43
Non-smoker    97           310              407
Total         115          335              450

(i) What is the probability that a randomly selected student is vegetarian? [1 Mark]
(ii) What is the probability of a randomly selected student being “vegetarian and a non-smoker”? [1 Mark]
(iii) What is the probability of a student being “vegetarian or a smoker”? [1 Mark]
(iv) What is the probability that a randomly selected student is a non-smoker given that the student is non-vegetarian? [1 Mark]

Figure 4: Multiple bar chart for smoking status and diet of students

Figure 5: Expected frequencies

              Vegetarian   Non-vegetarian   Total
Smoker        10.99        32.01            43
Non-smoker    104.01       ***              407
Total         115          335              450

(v) In Figure 5, calculate the expected count for the empty cell corresponding to the non-vegetarians who are non-smokers. Note that this cell corresponds to the observed count of 310 in Figure 3. [1 Mark]
(vi) Conduct an appropriate hypothesis test at the 5% significance level to determine if there is a statistically significant relationship between smoking status and student’s diet. Be sure to report at least the p-value and use this to answer the research question. The p-value for the test statistic is given as 0.010.
[5 Marks]
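The arithmetic behind Question 6 can be sketched directly from the observed frequencies in Figure 3. The block below is an illustration, not the course's prescribed method: it assumes a Pearson chi-square test of independence with no continuity correction, and it uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)), which lets the p-value be computed with the Python standard library alone. All variable names are illustrative.

```python
import math

# Observed frequencies from Figure 3.
# Rows: smoker, non-smoker. Columns: vegetarian, non-vegetarian.
observed = [[18, 25],
            [97, 310]]

row_totals = [sum(row) for row in observed]          # [43, 407]
col_totals = [sum(col) for col in zip(*observed)]    # [115, 335]
n = sum(row_totals)                                  # 450

# Probabilities of the kind asked for in parts (i)-(iv).
p_vegetarian = col_totals[0] / n
p_veg_and_nonsmoker = observed[1][0] / n
# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_veg_or_smoker = (col_totals[0] + row_totals[0] - observed[0][0]) / n
# Conditional probability: restrict attention to the non-vegetarian column.
p_nonsmoker_given_nonveg = observed[1][1] / col_totals[1]

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (assumption: no continuity correction).
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

# df = (rows - 1) * (columns - 1) = 1, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected non-smoker / non-vegetarian count: {expected[1][1]:.2f}")
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.3f}")
```

The first three expected counts produced this way can be compared against the values printed in Figure 5, and the p-value against the 0.010 quoted in part (vi), as a check that the table has been read correctly.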