STATS1900 Business Statistics
Major Assignment
Date Due: Refer to Course Description
Total Marks: 40
Worth: 20% of final assessment
This assignment requires a considerable amount of computer work and written comment. You may need to seek guidance from your tutor along the way. Do not leave things until too late. Each question carefully describes what you are required to do, so please follow the instructions carefully.
In this assignment, you will again examine data from a fruit and vegetable market that supplies fresh produce to supermarket outlets in NSW and Victoria. The market is conducting an evaluation of customers in Sydney and Melbourne. It wants to examine the % of produce damage at each of the outlets and is particularly interested in the relationship between % produce damage and each of: the type of produce; the customer; and the quantity (sold per day).
The data collected is contained in a file called ‘Fruit and Veggie Market.xls’, and the columns of the file contain the following information:
Column | Name | Description
A | ID Number | Customer Number
B | Customer | 1 = Coles, 2 = Private, 3 = Safeway
C | City | 1 = Sydney, 2 = Melbourne
D | Produce | 1 = Apple, 2 = Orange, 3 = Potato, 4 = Tomato
E | Quantity (sold/day) | Total quantity of produce sold per day
F | Price$/kg | Price in dollars per kilogram
G | % of produce damage | Percentage of produce damaged
Use your random sample of 200 customers from the minor assignment. Your answers to the assignment tasks below are to be based on your sample of 200 customers. Make sure you keep a safe copy of your sample, since you cannot use Random Sample Generator to reproduce the first sample. (If necessary, talk to your tutor about generating a new sample.)
Assignment Tasks
For each task below, you must submit all of the required printouts, graphs, tables and summaries.
NB: Each graph and table should have a heading and each axis should have a label!
Introduction and Variable List:
Give a brief introduction to your report, setting out the nature of the data.
[1 mark]
Data: Provide a printout of the data in your sample, sorted in ascending order based on customer number.
[1 mark]
Pivot Table:
Prepare a contingency table (cross-classification) using the Pivot Table option in Excel that compares types of produce and customers. You may use counts or percentages as the basis of comparison. Compare any similarities or differences in the types of produce sold, between Coles and Safeway.
[2 marks]
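If you want to cross-check the counts or percentages in your Excel pivot table, the same cross-classification can be sketched in a few lines of Python. This is only an illustration: the `rows` list is made-up data, and the function names `contingency` and `row_percentages` are hypothetical, not part of the assignment. The customer and produce codes follow the column description table.

```python
from collections import Counter

# Made-up (customer code, produce code) pairs for illustration only;
# replace with columns B and D of your own sample.
rows = [(1, 1), (1, 2), (3, 1), (3, 4), (1, 1), (2, 3), (3, 4), (1, 4)]

CUSTOMERS = {1: "Coles", 2: "Private", 3: "Safeway"}
PRODUCE = {1: "Apple", 2: "Orange", 3: "Potato", 4: "Tomato"}


def contingency(rows):
    """Count rows for every (customer, produce) pair, including empty cells."""
    counts = Counter(rows)
    return {
        CUSTOMERS[c]: {PRODUCE[p]: counts[(c, p)] for p in PRODUCE}
        for c in CUSTOMERS
    }


def row_percentages(table):
    """Express each customer's counts as a percentage of that customer's total."""
    result = {}
    for customer, cells in table.items():
        total = sum(cells.values())
        result[customer] = {
            name: (100 * n / total if total else 0.0) for name, n in cells.items()
        }
    return result


table = contingency(rows)
percent = row_percentages(table)
for customer, cells in table.items():
    print(customer, cells)
```

Row percentages make Coles and Safeway easier to compare when the two customers appear a different number of times in your sample.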
Pivot Table:
Prepare another contingency table that shows the average (mean) price$/kg and average (mean) % of produce damaged, classified by customer and the type of produce. Compare any similarities or differences among the customers with regard to prices, and damaged apples, oranges and tomatoes:
- Which customer has the highest average (mean) price$/kg for apples, and which for oranges?
- Which customer has the highest mean % of produce damaged for tomatoes, and which for apples?
[4 marks]
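A table of group means like this one can also be sketched outside Excel. The snippet below is a minimal illustration using made-up rows; `group_means` and `highest` are hypothetical helper names, and the values are not taken from the data file.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration: (customer, produce, price per kg, % damaged).
rows = [
    ("Coles", "Apple", 4.00, 6.0),
    ("Coles", "Apple", 5.00, 4.0),
    ("Safeway", "Apple", 3.50, 7.0),
    ("Coles", "Tomato", 6.00, 9.0),
    ("Safeway", "Tomato", 5.50, 11.0),
    ("Safeway", "Tomato", 6.50, 13.0),
]


def group_means(rows):
    """Return {(customer, produce): (mean price, mean % damaged)}."""
    groups = defaultdict(list)
    for customer, produce, price, damage in rows:
        groups[(customer, produce)].append((price, damage))
    return {
        key: (mean(p for p, _ in values), mean(d for _, d in values))
        for key, values in groups.items()
    }


def highest(means, produce, index):
    """Customer with the largest mean for one produce type.

    index 0 selects mean price, index 1 selects mean % damaged.
    """
    candidates = {c: m[index] for (c, p), m in means.items() if p == produce}
    return max(candidates, key=candidates.get)


means = group_means(rows)
print(highest(means, "Apple", 0), highest(means, "Tomato", 1))
```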
T-Test:
The market wants to investigate the % of produce damaged at Coles’ stores in Sydney and Melbourne so they can make a comparison with other states. The current data will be used as a typical representation of the % of produce damaged.
- Determine the mean, standard deviation and standard error for the % of produce damaged at Coles’ stores in your sample.
[2 marks]
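The three summary statistics are related: the standard error is the sample standard deviation divided by the square root of the sample size. A minimal Python sketch, using made-up values in place of the Coles rows from your sample:

```python
from math import sqrt
from statistics import mean, stdev

# Made-up % damage values for illustration; use the Coles rows of your sample.
damage = [4.0, 5.0, 6.0, 3.0, 7.0]

n = len(damage)
x_bar = mean(damage)   # sample mean
s = stdev(damage)      # sample standard deviation (divides by n - 1)
se = s / sqrt(n)       # standard error of the mean

print(n, x_bar, round(s, 4), round(se, 4))
```

Note that `statistics.stdev` uses the n - 1 divisor, which matches Excel's `STDEV.S` rather than `STDEV.P`.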
- The market has found that, on average, there is no more than 5% produce damage at Coles’ stores in other states. Based on this information carry out a t-test to see if the average % of produce damaged at Coles stores in Sydney and Melbourne is less than 5%. Mention any assumptions, include relevant hypotheses and report the result and conclusion in the conventional manner.
[6 marks]
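One common way to set up this test, assuming a one-sample, lower-tailed t-test is intended, is sketched below. Here $\bar{x}$, $s$ and $n$ are the mean, standard deviation and number of Coles customers in your sample, and $\mu$ is the true mean % of produce damaged; check the exact form of the hypotheses against your course notes.

```latex
\[
H_0:\ \mu = 5 \qquad \text{versus} \qquad H_1:\ \mu < 5
\]
\[
t = \frac{\bar{x} - 5}{s / \sqrt{n}}, \qquad \text{df} = n - 1
\]
```

Under this setup, the null hypothesis is rejected at significance level $\alpha$ when the one-tailed p-value is below $\alpha$, or equivalently when $t < -t_{\alpha,\,n-1}$.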
- Obtain a scatterplot of the % of produce damaged and price$/kg for all customers. Think carefully about which variable should go on the vertical axis. Remember, it is the independent variable that goes on the horizontal axis (i.e. the x-axis). Make sure you label your axes properly and that your graph has an appropriate title.
[3 marks]
- Briefly describe the nature of the relationship between these two variables (Hint: mention strength and direction).
[2 marks]
- Use Excel to carry out a regression analysis on two variables: % of produce damaged (independent variable) and price$/kg (dependent variable).
- Copy the output into your assignment and then use it to determine the answer to the following questions.
[2 marks]
- Write down the regression equation.
[1 mark]
- State the R-squared value and the standard error and explain what they mean with respect to the data.
[4 marks]
- Write down the value of the gradient of the regression line and explain what it means in the context of this question.
[3 marks]
- Are the values for the constant (intercept) and the slope coefficient significant in this case? Justify your answer using values from the output.
[3 marks]
- Do you think this regression model is a good model? Justify your answer using the regression output.
[3 marks]
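The main quantities in the Excel regression output (slope, intercept, R-squared and standard error) can be reproduced by hand as a check. The sketch below uses made-up data and a hypothetical function name, `simple_regression`; it follows the assignment's choice of % of produce damaged as x and price$/kg as y.

```python
from math import sqrt

# Made-up (% damaged, price per kg) values for illustration only.
x = [2.0, 4.0, 6.0, 8.0, 10.0]   # % of produce damaged (independent)
y = [6.0, 5.5, 5.0, 4.0, 3.5]    # price$/kg (dependent)


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual and total sums of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - sse / sst          # share of variation in y explained by x
    std_error = sqrt(sse / (n - 2))    # standard error of the estimate
    return slope, intercept, r_squared, std_error


slope, intercept, r_squared, std_error = simple_regression(x, y)
print(f"price = {intercept:.3f} + ({slope:.3f}) * damage")
print(f"R-squared = {r_squared:.4f}, standard error = {std_error:.4f}")
```

In this made-up example the slope is negative, meaning each additional percentage point of damage is associated with a lower predicted price per kilogram. Your own sample may show a different pattern.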
- Using the information obtained from your analyses, write a short conclusion about what you found out about the relationship between the % of produce damaged and price$/kg, and how this relates to customer and the type of produce.
[3 marks]