STATS6900
Quantitative Methods for Business
Assignment –
Semester 1, 2013
Date Due: Refer to Course Description
Total Marks: 45
Worth: 20% of final assessment
This assignment requires a considerable amount of computer work and written comment. You may need to seek guidance from your tutor along the way.
Do not leave things until too late. Each question carefully describes what you are required to do, so please follow the instructions carefully. Your answer to each question should begin with the number of the question.
In this assignment you will examine statistical data consisting of records of car accidents on Victorian roads and freeways in 2012. The Victorian department of transportation, Police and VIC Roads are interested in determining the main factors affecting car accidents in Victoria. They will use this information to develop strategies aimed at reducing the number of fatalities on Victorian roads.
Alcohol is a major factor in road deaths in Victoria. Each year, about one quarter of drivers killed in road crashes had a blood alcohol concentration (BAC) of .05 or greater.
The Victorian road rules relating to the use of alcohol & drugs by drivers may be briefly summarized as follows:
Blood alcohol concentration
Professional drivers such as truck, bus or taxi drivers must have a blood alcohol concentration (BAC) under 0.02; P-platers and learner drivers, as well as visitors from interstate or overseas, must have a zero blood alcohol concentration (BAC). All other drivers must stay under .05 BAC. You must be under .05 BAC while supervising a learner driver. This rule applies to public roads and also private property.
Drug impairment
It is illegal to drive while impaired by any drug whether that drug is legal or illegal. You must not be affected by illegal drugs while accompanying a learner driver. This rule applies to public roads and also private property. (VIC Roads, 2012).
Data was collected and is contained in a file called ‘Car Accident Data.xls’, and the columns of the file contain the following information:
Column | Name | Description
A | Driver ID Number | Number to identify each driver
B | Driver Gender | 1= Male 2= Female
C | Driver Age | Driver Age (Years)
D | No of Fatalities | No of people killed in the accident
E | Cause of the accident | 1= Alcohol & Drug 2= Speed 3= Fatigue 4= Safety Belt 5= Mobile Phone
F | Car Speed | km/hr
F | Blood Alcohol Level | Percentage of alcohol (ethanol) in the blood
G | Licence Type | 1= Full 2= P Plate
H | Location of the Accident | 1= Freeways 2= City Roads
For your analysis you must use your random sample of 130 records from the 345 provided in the file Car Accident Data.xls, available on Ubonline. If you check your email I have already made a sample for each student in the file “random sample find your name.xls”. Use the Random Sample Generator, available on Moodle in the Lab Bundle, to do this. Your answers to the assignment tasks below are to be based on your sample of 130 records. Make sure you keep a safe copy of your sample, since you cannot use the Random Sample Generator to reproduce the first sample.
Assignment Tasks
For each task below, you must answer all the questions in sequential order and submit all of the required printouts, graphs, tables and summaries.
NB: Each graph and table should have a heading and each axis should have a label!
Introduction and Variable List: Give a brief introduction to your report. Describe the nature of the data. Read questions 3 to 8 and briefly describe the specific data and relationships which will be examined here.
[1 mark]
Data:
Provide a printout of the data in your sample, sorted in ascending order based on Driver ID number.
[1 mark]
- Produce a histogram showing the age distribution of the drivers involved in car accidents. Provide your comments on the graph shape, and the most suitable measures of centre and spread for this data.
[3 marks]
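For anyone who wants to cross-check their Excel figures, the measures of centre and spread mentioned in this task can be sketched in Python as below. This is only an illustration: the ages are made-up values, not records from Car Accident Data.xls, and the function name `describe_ages` is hypothetical. A common rule of thumb (an assumption here, not a requirement of the task) is that the mean and standard deviation suit a roughly symmetric histogram, while the median and interquartile range suit a skewed one.

```python
import statistics

def describe_ages(ages):
    """Return centre and spread measures for a list of driver ages."""
    # Quartiles using Python's default ("exclusive") convention.
    q1, median, q3 = statistics.quantiles(ages, n=4)
    return {
        "mean": statistics.mean(ages),
        "median": median,
        "std_dev": statistics.stdev(ages),  # sample standard deviation
        "iqr": q3 - q1,                     # interquartile range
    }

# Hypothetical ages for illustration only (not taken from the data file).
example_ages = [18, 19, 21, 22, 25, 31, 47]
summary = describe_ages(example_ages)
print(summary)
```

In this made-up example the mean sits above the median, which is the pattern a right-skewed histogram would produce. Quartile conventions differ between tools, so small differences from Excel's quartile figures are possible.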
Pivot Tables:
There is a general feeling that male drivers commit more driving offences causing more accidents and more fatalities compared to female drivers. For each of the following prepare a contingency table (cross-classification table), using the Pivot Table option in Excel. Make sure to comment on similarities or differences and any conclusions you can draw in each case.
- The average car speed in an accident for each cause of accident, split on gender.
- The average driver blood alcohol level in a road accident for each licence type, split on gender.
- The number of road accident fatalities for each licence type, split on gender.
- The number of road accidents for each licence type, split on gender.
[8 marks]
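The logic behind these pivot tables can be sketched in Python as follows. This is a rough illustration rather than a replacement for the Excel Pivot Table option: the helper `pivot`, the field names and the four records are all hypothetical, and only the category codes follow the column table in this handout. The sketch also shows why the last two tables differ: one sums the fatalities in each cell, the other counts the accident records.

```python
from collections import defaultdict

def pivot(records, row_key, col_key, value_key=None, agg="mean"):
    """Cross-classify records by two categorical fields.

    agg="mean"  -> average of value_key in each cell
    agg="sum"   -> total of value_key in each cell
    agg="count" -> number of records in each cell
    """
    cells = defaultdict(list)
    for rec in records:
        value = rec[value_key] if value_key is not None else 1
        cells[(rec[row_key], rec[col_key])].append(value)

    table = {}
    for key, values in cells.items():
        if agg == "mean":
            table[key] = sum(values) / len(values)
        elif agg == "sum":
            table[key] = sum(values)
        elif agg == "count":
            table[key] = len(values)
        else:
            raise ValueError(f"unknown aggregation: {agg}")
    return table

# Hypothetical records for illustration only.
# Codes: gender 1 = Male, 2 = Female; licence 1 = Full, 2 = P Plate.
records = [
    {"gender": 1, "licence": 1, "speed": 110, "fatalities": 1},
    {"gender": 1, "licence": 2, "speed": 90, "fatalities": 0},
    {"gender": 2, "licence": 1, "speed": 80, "fatalities": 2},
    {"gender": 1, "licence": 1, "speed": 100, "fatalities": 0},
]

# Keys of each result are (licence code, gender code) pairs.
mean_speed = pivot(records, "licence", "gender", "speed", agg="mean")
fatalities = pivot(records, "licence", "gender", "fatalities", agg="sum")
accidents = pivot(records, "licence", "gender", agg="count")
print(mean_speed, fatalities, accidents)
```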
T-Test and CI:
The Victorian police believe that the average speed of vehicles involved in accidents on Victorian highways is significantly higher than 100 km/hr.
- Obtain appropriate descriptive statistics and use them to calculate a 95% confidence interval for the mean car speed in accidents on Victorian highways.
- Conduct a statistical test to determine if the average car speed in accidents on Victorian highways is significantly higher than 100 km/hr.
Mention any assumptions, include relevant hypotheses and report the results and conclusion in the conventional manner.
[6 marks]
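The two calculations in this task can be sketched in Python as below, as a cross-check on hand or Excel working. Several things here are assumptions: the ten speeds are made-up values, the function names are hypothetical, and the critical values 2.262 and 1.833 are taken from standard t tables for 9 degrees of freedom, so they apply only to this ten-value example. With your own sample you would first keep only the records for the relevant location (presumably code 1 = Freeways, though the handout does not say so explicitly) and look up the critical values for your own degrees of freedom.

```python
import math
import statistics

def mean_ci(values, t_crit):
    """Two-sided confidence interval for the mean: mean +/- t* x s/sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean - t_crit * std_err, mean + t_crit * std_err

def t_statistic(values, mu0):
    """One-sample t statistic for H0: mean = mu0."""
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / std_err

# Hypothetical speeds (km/hr) for illustration only.
speeds = [98, 105, 112, 101, 95, 118, 107, 103, 110, 99]

# n = 10, so df = 9: t(0.975, 9) = 2.262 and t(0.95, 9) = 1.833 from t tables.
low, high = mean_ci(speeds, t_crit=2.262)
t_stat = t_statistic(speeds, mu0=100)

# Upper-tailed test of H0: mean = 100 against H1: mean > 100 at the 5% level.
reject_h0 = t_stat > 1.833
print(low, high, t_stat, reject_h0)
```

The test is upper-tailed because the claim is that the mean speed is higher than 100 km/hr, so the critical value differs from the two-sided one used for the confidence interval.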
Stratified scatter plot with trend lines:
Obtain a stratified scatterplot comparing the relationship between driver age and car speed (km/hr) in accidents for each gender.
Think carefully about which variable should go on the vertical axis. Remember, it is the independent variable that goes on the horizontal axis (i.e. the x-axis). Include trend lines, their equations and R-squared values on the graph. Make sure you label axes properly and that your graph has an appropriate title.
Briefly compare the nature of the relationship between these two variables, driver age (years) and car speed (km/hr), for males and females. (Hint: mention any differences in the slope of the trend line and/or the R-squared values and what this indicates.)
[6 marks]
- Create a table of correlation coefficients for the following variables: Age, Number of Fatalities, Car Speed and Blood Alcohol Level. Discuss your results and state which pairs of variables have the highest and the lowest correlation coefficients.
[4 marks]
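As a cross-check on the Excel output, a table of Pearson correlation coefficients can be sketched in Python as follows. The function names and the four-record data set are hypothetical and chosen only to keep the arithmetic easy to follow; they are not drawn from the data file.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_table(columns):
    """Return {(name_a, name_b): r} for every pair of columns."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b])
            for a in names for b in names}

# Hypothetical values for illustration only.
columns = {
    "age": [20, 30, 40, 50],
    "fatalities": [0, 1, 0, 2],
    "speed": [120, 110, 100, 90],
}
table = correlation_table(columns)
for pair, r in table.items():
    print(pair, round(r, 3))
```

The table is symmetric and has 1 on the diagonal, so only the pairs of different variables need to be discussed. When ranking pairs from highest to lowest, it is worth stating whether you compare the coefficients by size alone or take the sign into account.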
Use Excel to carry out a regression analysis on the two variables: driver age (years) (independent variable) and car speed (km/hr) (dependent variable).
- Copy the output into your assignment and use it to determine the answers to the following questions.
Write down the regression equation.
State the R-squared value and the standard error and explain what they mean with respect to the data.
Write down the value of the gradient of the regression line and explain what it means for this data.
Are the values for the constant and the gradient (slope) significant (i.e. significantly different from zero) in this case? Justify your answer.
Do you think this regression model is a good model? Justify your answer using the regression output.
[13 marks]
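The quantities asked for in this task can be sketched in Python as below, which may help when interpreting the Excel regression output. The function name `simple_regression` and the six data points are hypothetical, and "standard error" is taken here to mean the standard error of the estimate, sqrt(SSE / (n - 2)), which is an assumption about which figure the question intends.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with basic diagnostics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)

    std_error = math.sqrt(sse / (n - 2))    # standard error of the estimate
    slope_se = std_error / math.sqrt(sxx)   # standard error of the slope
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,
        "std_error": std_error,
        "slope_t": slope / slope_se,        # t statistic for H0: slope = 0
    }

# Hypothetical data for illustration only.
ages = [20, 25, 30, 35, 40, 45]             # driver age (years), independent
speeds = [118, 112, 109, 101, 99, 91]       # car speed (km/hr), dependent

fit = simple_regression(ages, speeds)
print(fit)
```

In this made-up example the slope is negative, meaning the fitted speed falls as age rises by one year. Whether the slope is significantly different from zero is judged by comparing its t statistic (or the p-value in the Excel output) with the chosen significance level.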
- Using the information obtained from your analyses, write a short conclusion about what you found from the study above.
[3 marks]