Statistics and Probability Dataset 1 Assignment: Hypothesis Testing Type the last three digits...

Question

Statistics and Probability
Assignment: Hypothesis Testing
Type the last three digits of your student number in the green cell: 18

Dataset 1
Groups        Frequencies
300 to 305    6
305 to 310    10
310 to 315    35
315 to 320    81
320 to 325    82
325 to 330    38
330 to 335    14
335 to 340    10

Dataset 2
Part (a)
195.41    206.38    202.46    192.86    199.80    199.28
193.39    204.68    201.22    193.10    200.95    197.85
193.09    203.03    202.54    195.15    203.07    196.72
194.93    205.17    200.41    194.62    205.25    198.16
196.98    207.46    198.35    195.17    205.79    198.81
198.43    205.31    198.27    195.28    205.74    201.03
200.59    203.61    196.06    197.41    203.93    200.54
Part (b)
202.75    201.64    195.85    199.48    201.73    202.58
204.12    203.89    195.08    201.49    201.47    201.61
202.64    196.85    200.48    202.73    203.58    205.12

Dataset 3
List (a)
1454.09    1563.76    1524.64    1428.59    1498.02    1492.81    1493.65
1433.88    1546.85    1512.24    1431.04    1509.46    1478.55    1482.82
1430.94    1530.29    1525.35    1451.53    1530.67    1467.24    1489.34
1449.28    1551.74    1504.14    1446.23    1552.51    1481.63
List (b)
1469.02    1573.89    1482.78    1450.94    1557.18    1487.38    1494.57
1483.50    1552.39    1481.97    1452.02    1556.66    1509.51    1506.01
1505.14    1535.33    1459.84    1473.31    1538.53    1504.68    1494.09
1526.73    1515.69    1457.80    1494.04    1516.54    1525.06    1505.98
1540.48    1538.11    1450.04    1514.20    1513.90    1515.39

Dataset 4
Resistance:
Motor running    Motor not running
10.34            11.14
10.08            10.77
10.04            10.59
10.28            10.99
10.56            11.17
10.76            11.28
11.06            11.67
11.36            11.83
11.54            11.87
11.86            12.18
11.62            11.79
11.40            11.56
11.68            11.78
12.00            11.95
11.70            11.67
11.48            11.59
11.20            11.27
11.52            11.63
10.44            10.56
10.44            10.70

Dataset 5
Additive    Yield
65          60.17
61.87       61.38
58.72       62.73
56.52       62.55
52.3        67.72
56.43       66.52
52.69       68.34
50          70.5
52.43       67.15
51.54       68.5
54.77       64.46
53.91       67.41
56.21       64.03
60.19       63.54

Dataset 6
School    G1    G2    G3
A         7     16    9
B         5     21    16
C         11    25    14
D         5     18    18
E         9     17    11
1. All submissions must be in the form of PDF documents. Spreadsheets exported to PDF will be accepted, but calculations must be annotated or explained.
2. It is up to you how you do the calculations in each question, but you must explain how you arrived at your answer for any given calculation. This can be done with a written explanation and by using the relevant equations, along with showing the results of intermediate stages of the calculations. In other words, you need to show that you know how to do a calculation for a statistic other than using spreadsheet functions.
3. Each one of the questions involves a statistical test. Marks within each question will generally be awarded for:
• Deciding which statistical test to use,
• Framing your Hypotheses and proper conclusions,
• Identifying the parameters for the test and
• Showing a reasonable level of clarity, detail and explanation in the calculations needed to carry out the test.
4. The data you have been given is in the worksheets of an Excel spreadsheet. This spreadsheet is locked against editing. Please do not try to circumvent this; if you wish to use a spreadsheet to do your calculations, you should copy and paste your data into your own spreadsheet and work with that.

Note: All calculations should use the datasets in the Excel spreadsheet. Graphs should be done in Excel too.

Question 1
The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the frequency distribution given in ‘Dataset1’.
1. Draw a frequency polygon, histogram and cumulative frequency polygon for the data.
2. Calculate the frequency mean, the frequency standard deviation, the median and the first and third quartiles for this grouped data.
3. Compare the median and the mean and state what this indicates about the distribution. Comment on how the answer to this question relates to your frequency polygon and histogram.
4. Explain the logic behind the equations for the mean and standard deviation for grouped data, starting from the original equations for a simple list of data values. (This does not just mean ‘explain how the equations are used’.)
5. Carry out an appropriate statistical test to determine whether the data is normally distributed.

Question 2
A manufacturer of metal plates makes two claims concerning the thickness of the plates they produce. They are stated here:
• Statement A: The mean is 200 mm
• Statement B: The variance is 1.5 mm^2.
To investigate Statement A, the thickness of a sample of metal plates produced in a given shift was measured. The values found are listed in Part (a) of worksheet ‘Dataset2’, with millimetres (mm) as unit.
1. Calculate the sample mean and sample standard deviation for the data in Part (a) of ‘Dataset2’. Explain why we are using the phrase ‘sample’ mean or ‘sample’ standard deviation.
2. Set up the framework of an appropriate statistical test on Statement A. Explain how knowing the sample mean before carrying out the test will influence the structure of your test.
3. Carry out the statistical test and state your conclusions.
To investigate the second claim, the thickness of a second sample of metal sheets was measured. The values found are listed in Part (b) of worksheet ‘Dataset2’, with millimetres (mm) as unit.
1. Calculate the sample mean and then the sample variance and standard deviation for the data in Part (b).
2. Set up the framework of an appropriate statistical test on Statement B. Explain how knowing the sample variance before carrying out the test would influence the structure of your test.
3. Carry out the statistical test and state your conclusions.

Question 3
A manager of an inter-county hurling team is concerned that his team lose matches because they ‘fade away’ in the last ten minutes. He has measured GPS data showing how much ground particular players cover within a given time period; this is the data in list (a) in worksheet ‘Dataset3’. He has acquired the corresponding data from an opposing, more successful team, which is given in list (b).
1. Calculate the sample mean and sample standard deviation for the two sets of data.
2. Set up the framework of an appropriate statistical test to determine whether there is a difference in the distances covered by the two groups of players.
3. Explain how having the results of the calculations above in advance of doing your statistical test will influence the structure of that test.
4. Carry out the statistical test and state your conclusions.

Question 4
A study was carried out to determine whether the resistance of the control circuits in a machine is lower when the machine motor is running. To investigate this question, a set of the control circuits was tested as follows. Their resistance was measured while the machine motor was not running for a certain period of time and then again while the motor was running. The values found are listed in worksheet ‘Dataset4’, with kilo-Ohms as the unit of measurement.
1. Set up the structure of an appropriate statistical test to determine whether the resistance of the control circuits in a machine is lower when the machine motor is running.
2. Explain how the order of subtraction chosen to calculate the differences will influence the structure of the test.
3. Give a reason why the data is measured with the engine not running first and then with the engine running.
4. Explain how knowing the mean of the differences in advance will influence the structure of your statistical test.
5. Carry out the statistical test and state your conclusions.

Question 5
A study was carried out to determine the influence of a trace element found in soil on the yield of potato plants grown in that soil, defined as the weight of potatoes produced at the end of the season. A large field was divided up into 14 smaller sections for this experiment. For each section, the experimenter recorded the amount of the trace element found (in milligrams per metre squared) and the corresponding weight of the potatoes produced (in kilograms). This information is presented in the worksheet ‘Dataset5’ in the Excel document. Define X as the trace element amount and Y as the yield.
1. Draw a scatterplot of your data set.
2. Calculate the coefficients of a linear equation to predict the yield Y as a function of X.
3. Calculate the correlation coefficient for the paired data values.
4. Set up the framework for an appropriate statistical test to establish if there is a correlation between the amount of the trace element and the yield. Explain how having the scatterplot referred to above and having the value of r in advance will influence the structure of your statistical test.
5. Carry out and state the conclusion of your test on the correlation.
6. Comment on how well the regression equation will perform based on the results above.

Question 6
A multinational corporation is conducting a study to see how its employees in five different countries respond to three gifts in an incentive scheme. The numbers of employees who choose each of the three gifts (G1 to G3) in each of the five countries (A to E) are given in the table in ‘Dataset6’ in the Excel document.
1. Set up the structure of an appropriate statistical test to determine whether the data supports a link between choice of gift and country, including the statistic to be used.
2. Carry out this test, showing clearly in your work how the expected values are calculated for your test statistic.

Atul · Accepted Answer

Question 1 
Groups Frequencies 
300 to 305 6 
305 to 310 10 
310 to 315 35 
315 to 320 81 
320 to 325 82 
325 to 330 38 
330 to 335 14 
335 to 340 10
The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the 
frequency distribution given in ‘Dataset1’.
 1. Draw a frequency polygon, histogram and cumulative frequency polygon for the 
data.
To draw the frequency polygon, we first need to calculate the midpoints of each group:
Intervals    Frequencies   Midpoint    Cumulative Frequency 
300-305         6                 302.5            6 
305-310         10               307.5           16 
310-315         35               312.5           51 
315-320         81               317.5           132 
320-325         82               322.5           214 
325-330         38               327.5           252 
330-335         14               332.5           266 
335-340         10               337.5           276 
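The midpoint and cumulative frequency columns can be reproduced with a few lines of Python. This is only a sketch for checking the table; the variable names are illustrative and the data is typed in by hand from Dataset 1.

```python
from itertools import accumulate

# Class intervals (lower, upper) and frequencies from Dataset 1.
intervals = [(300, 305), (305, 310), (310, 315), (315, 320),
             (320, 325), (325, 330), (330, 335), (335, 340)]
frequencies = [6, 10, 35, 81, 82, 38, 14, 10]

# Midpoint of each class: halfway between its lower and upper limit.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

# Running total of the frequencies; the last entry is the total frequency.
cumulative = list(accumulate(frequencies))

for (lower, upper), f, m, c in zip(intervals, frequencies, midpoints, cumulative):
    print(f"{lower}-{upper}  {f:>3}  {m:>6.1f}  {c:>4}")
```

The frequency polygon plots each frequency against its midpoint, while the cumulative frequency polygon plots each cumulative total against the upper limit of its class.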

Histogram
Finally, to draw the cumulative frequency polygon, we plot the cumulative frequencies from 
the table above against the upper limit of each interval.
2. Calculate the frequency mean, the frequency standard deviation, the median and the 
first and third quartiles for this grouped data.
To calculate the frequency mean, we multiply each interval midpoint by its corresponding 
frequency, sum the products, and divide by the total frequency, n = 276 (the last entry in the 
cumulative frequency column above).
Frequency Mean = (6*302.5 + 10*307.5 + 35*312.5 + 81*317.5 + 82*322.5 + 38*327.5 + 
14*332.5 + 10*337.5) / (6+10+35+81+82+38+14+10) = 88465 / 276 ≈ 320.53 
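As a cross-check on the arithmetic, the frequency mean can be computed as a weighted average in Python. This is a minimal sketch with illustrative variable names, using the midpoints and frequencies from Dataset 1.

```python
midpoints = [302.5, 307.5, 312.5, 317.5, 322.5, 327.5, 332.5, 337.5]
frequencies = [6, 10, 35, 81, 82, 38, 14, 10]

# Total frequency and the sum of (frequency * midpoint) over all classes.
n = sum(frequencies)
weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))

# Grouped (frequency) mean: weighted average of the midpoints.
grouped_mean = weighted_sum / n
print(n, weighted_sum, round(grouped_mean, 2))
```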
The frequency standard deviation can be calculated using the following formula: 
σ = sqrt[Σ f * (x - mean)^2 / n] 
where x is the midpoint of each interval, f is the frequency of that interval, mean is the 
frequency mean (88465 / 276 ≈ 320.53), and n = 276 is the total frequency. 
f           midpoint        deviation        (deviation)^2      f*(deviation)^2 
6           302.5           -18.03                        324.91           1949.48 
10          307.5           -13.03                        169.66           1696.60 
35          312.5           -8.03                         64.41            2254.23 
81          317.5           -3.03                         9.15             741.38 
82          322.5           1.97                          3.90             319.73 
38          327.5           6.97                          48.65            1848.53 
14          332.5           11.97                         143.39           2007.49 
10          337.5           16.97                         288.14           2881.38 
(Deviations are shown rounded to two decimals; the squares are computed from the unrounded 
mean.) 
σ = sqrt[Σ f * (x - mean)^2 / n] = sqrt[(1949.48 + 1696.60 + 2254.23 + 741.38 + 319.73 + 
1848.53 + 2007.49 + 2881.38) / 276] = sqrt(13698.82 / 276) ≈ 7.05 
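The standard deviation table can be checked the same way. The sketch below uses illustrative variable names and computes both the population form (dividing by n) and, for comparison, the sample form (dividing by n - 1); which one the assignment expects is an assumption the reader should confirm against the course notes.

```python
import math

midpoints = [302.5, 307.5, 312.5, 317.5, 322.5, 327.5, 332.5, 337.5]
frequencies = [6, 10, 35, 81, 82, 38, 14, 10]

n = sum(frequencies)
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Sum of frequency-weighted squared deviations of the midpoints from the mean.
sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))

population_sd = math.sqrt(sum_sq / n)      # divides by n
sample_sd = math.sqrt(sum_sq / (n - 1))    # divides by n - 1

print(round(sum_sq, 2), round(population_sd, 2), round(sample_sd, 2))
```

With 276 observations the two forms differ only in the second decimal place.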
To find the median, we need to find the interval that contains the 138th value (276 / 2 = 138, 
the halfway point of the 276 observations). The cumulative frequency column tells us that 
132 values lie below 320 and 214 values lie below 325, so the 138th value falls within the 
320-325 interval, which has a frequency of 82. The interval width is 325-320 = 5, and we 
interpolate to find how far into this interval the 138th value lies: 
p = (138 - 132) / 82 = 0.0732 
Median = lower limit of the interval + (p * interval width) = 320 + (0.0732 * 5) ≈ 320.37 
The quartiles are found with the same interpolation: 
quartile = lower limit of the interval + (p * interval width) 
where p = (position - cumulative frequency below the interval) / frequency of the interval, 
position = (n * quartile number) / 4, and n is the total frequency.
For the first quartile (Q1), we need to find the interval that contains the 69th value (which is 
(276 * 1) / 4). The cumulative frequency column tells us that 51 values lie below 315 and 
132 values lie below 320, so the 69th value falls within the 315-320 interval, which has a 
frequency of 81. The interval width is 320-315 = 5, so we calculate: 
p = (69 - 51) / 81 = 0.2222 
Q1 = lower limit of the interval + (p * interval width) = 315 + (0.2222 * 5) ≈ 316.11 
For the third quartile (Q3), we need to find the interval that contains the 207th value (which 
is (276 * 3) / 4). The cumulative frequency column tells us that 132 values lie below 320 and 
214 values lie below 325, so the 207th value falls within the 320-325 interval, which has a 
frequency of 82. The interval width is 325-320 = 5, so we calculate: 
p = (207 - 132) / 82 = 0.9146 
Q3 = lower limit of the interval + (p * interval width) = 320 + (0.9146 * 5) ≈ 324.57 
Therefore, the first quartile (Q1) is approximately 316.11 and the third quartile (Q3) is 
approximately 324.57.
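The interpolation used for the median and quartiles can be written as one small function. This is a sketch under the assumption that values are spread evenly within each class; the function name `grouped_quantile` and its parameters are made up for illustration.

```python
def grouped_quantile(position, lower_limits, frequencies, width):
    """Interpolate the value at `position` (e.g. n/2 for the median)."""
    cumulative_before = 0
    for lower, f in zip(lower_limits, frequencies):
        # The class containing `position` is the first one whose
        # cumulative frequency reaches it.
        if f > 0 and cumulative_before + f >= position:
            fraction = (position - cumulative_before) / f
            return lower + fraction * width
        cumulative_before += f
    raise ValueError("position exceeds the total frequency")


lower_limits = [300, 305, 310, 315, 320, 325, 330, 335]
frequencies = [6, 10, 35, 81, 82, 38, 14, 10]
n = sum(frequencies)

q1 = grouped_quantile(n / 4, lower_limits, frequencies, width=5)
median = grouped_quantile(n / 2, lower_limits, frequencies, width=5)
q3 = grouped_quantile(3 * n / 4, lower_limits, frequencies, width=5)
print(round(q1, 2), round(median, 2), round(q3, 2))
```

Passing the position rather than a quartile number keeps the function usable for any percentile, for example `0.9 * n` for the 90th.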
3. Compare the median and the mean and state what this indicates about the 
distribution. Comment on how the answer to this question relates to your frequency 
polygon and histogram. 
The median for this grouped data is approximately 320.37, and the mean is approximately 
320.53.
Since the mean and the median are relatively close in value, this suggests that the data is 
fairly symmetrically distributed. This is also evident from the frequency polygon and 
histogram, where we see that the distribution is somewhat bell-shaped, with the highest 
frequencies occurring in the middle of the data range and decreasing as we move towards the 
extremes.
However, there is a slight right skew in the distribution: the three highest intervals hold 
more observations than the three lowest (38, 14 and 10 against 35, 10 and 6), so the right tail 
of the frequency polygon and histogram is slightly heavier than the left. This skewness is 
also reflected in the fact that the mean is slightly larger than the median, indicating that the 
right tail of the distribution is pulling the mean towards it.
Overall, we can conclude that the distribution is roughly symmetric but slightly skewed to the 
right.
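One way to put a number on "slightly skewed" is Pearson's median skewness coefficient, 3 * (mean - median) / standard deviation. This coefficient is not part of the original answer; it is offered here only as an optional check, computed from the Dataset 1 frequencies with illustrative variable names.

```python
import math

lower_limits = [300, 305, 310, 315, 320, 325, 330, 335]
frequencies = [6, 10, 35, 81, 82, 38, 14, 10]
width = 5

n = sum(frequencies)
midpoints = [lower + width / 2 for lower in lower_limits]
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
sd = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / n)

# Median by linear interpolation inside the class holding the (n/2)th value.
position = n / 2
below = 0
for lower, f in zip(lower_limits, frequencies):
    if below + f >= position:
        median = lower + (position - below) / f * width
        break
    below += f

# Positive values indicate right skew; values near zero indicate symmetry.
skew = 3 * (mean - median) / sd
print(round(mean, 2), round(median, 2), round(skew, 3))
```

A small positive coefficient is consistent with the conclusion above: roughly symmetric, with a mild pull to the right.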
4. Explain the logic behind the equations for the mean and standard deviation for 
grouped data, starting from the original equations for a simple list of data values. (This 
does not just mean ’explain how the equations are used’.)
The equations for the mean and standard deviation for grouped data are modifications of the 
equations for the mean and standard deviation for a simple list of data values. The main 
difference is that the grouped data is divided into intervals, and the frequency of each interval 
is used to determine the weight of each interval in the calculation of the mean and standard 
deviation.
For the mean, the equation for grouped data is:
mean = Σ (midpoint * frequency) / Σ frequency 
where midpoint is the midpoint of each interval, and frequency is the frequency of each 
interval. The numerator represents the sum of the products of the midpoint and frequency of 
each interval, while the denominator represents the total frequency of all intervals. This 
equation is used to calculate the weighted average of the midpoints of the intervals, where the 
weight of each interval is its frequency.
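The step from the simple-list formula to the grouped formula can be sketched as follows. The symbols x_i (individual values), m_j (midpoint of class j), f_j (frequency of class j) and k (number of classes) are notation introduced here for illustration. Each of the f_j values in class j is approximated by the midpoint m_j, so the sum over all n values collapses into a sum over the k classes:

```latex
\bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i
\;\approx\; \frac{1}{n}\sum_{j=1}^{k}
  \underbrace{\left(m_j + m_j + \dots + m_j\right)}_{f_j \text{ times}}
\;=\; \frac{\sum_{j=1}^{k} f_j\, m_j}{\sum_{j=1}^{k} f_j},
\qquad n = \sum_{j=1}^{k} f_j .
```

The approximation sign marks the only loss of information: the individual values inside each class are no longer known, so the midpoint stands in for all of them.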
For the standard deviation, the sample form of the equation for grouped data is (dividing by Σ frequency instead of Σ frequency - 1 gives the population form):
standard deviation = sqrt(Σ [(x - mean)^2 * frequency] / (Σ frequency - 1)) 
where x is the midpoint of each interval, mean is the mean of the data set, and frequency is 
the frequency of each interval. The numerator represents the sum of the products of the 
squared differences between the midpoint and the mean and the frequency of each interval,
