Business Data Analysis MIS771 - Descriptive Analytics and Visualisation Trimester 1, 2019 Page 1 of 10 MIS771 Descriptive Analytics and Visualisation Assignment Two Background This is an individual...

1 answer below »

View more »
Answered Same DayMay 02, 2021MIS771Deakin University

Answer To: Business Data Analysis MIS771 - Descriptive Analytics and Visualisation Trimester 1, 2019 Page 1 of...

Pooja answered on May 05 2021
147 Votes
Table of Contents
Introduction    2
Analysis    3
Task 1    3
Task 2    3
Task 2.1    3
Task 2.2    4
Task 2.3    6
Task 3    7
Task 3.1    7
Task 3.2    8
Task 3.3    10
Task 4    10
Conclusion    12
References    14
Introduction
The dependent variable is quantity ordered and recommendation. The independent variables are Loyalty, Customer Type, Region, Distribution Channel, Quality, SM Presence, Advert, and Brand Image, comp Pricing, Order Fulfilment, Flex Price, Shipping Speed, Shipping Cost, Order Quantity, and Recommend. The technique of descriptive statistics is used to analyse the order quantity and recommendation. The technique of correlation analysis can be helpful to identify the factors which affect the order quantity. A regression equation is constructed to predict the quantity ordered on the basis of identified factors. A logistic regression equation is created to predict the probability of recommendation.
The dependent variable Pale Ale production (litres) is predicted for 2nd quarter 2019, 3rd quarter 2019, 4th quarter 2019, and 1st quarter 2020. The technique of regression analysis is used for prediction.
Analysis
Task 1
The average number of bottles ordered by the customer is 7665 with a low standard deviation of 0.89 units. The distribution of bottles order by the customer is approximately normally distributed as skewness is equal to -0.2.
    Order_Qty
    
    
    Mean
    7.665
    Standard Error
    0.063161
    Median
    7.6
    Mode
    7.2
    Standard Deviation
    0.893233
    Sample Variance
    0.797864
    Kurtosis
    0.584038
    Skewness
    -0.20635
    Range
    5.6
    Minimum
    4.3
    Maximum
    9.9
    Sum
    1533
    Count
    200
    Row Labels
    Count of Recommend
    0
    99
    1
    101
    Grand T
otal
    200
There is not much difference in the proportion of customers who recommend Mad Dog Craft Beer (as a supplier) to others. There are 101 customers who would recommend Mad Dog Craft Beer (as a supplier) to others from a sample of 200 customers.
Task 2
Task 2.1
The technique of correlation analysis can be helpful to identify the factors which affect the order quantity. The correlation matrix is given below.
     
    Loyalty
    Quality
    SM_Presence
    Advert
    Brand_Image
    Comp_Pricing
    Order_Fulfillment
    Flex_Price
    Shipping_Speed
    Shipping_Cost
    Order_Qty
    Loyalty
    1
    
    
    
    
    
    
    
    
    
    
    Quality
    0.084
    1.000
    
    
    
    
    
    
    
    
    
    SM_Presence
    0.190
    -0.034
    1.000
    
    
    
    
    
    
    
    
    Advert
    0.259
    -0.054
    0.505
    1.000
    
    
    
    
    
    
    
    Brand_Image
    0.258
    -0.116
    0.788
    0.627
    1.000
    
    
    
    
    
    
    Comp_Pricing
    0.076
    -0.448
    0.177
    0.099
    0.200
    1.000
    
    
    
    
    
    Order_Fulfillment
    0.139
    0.083
    0.217
    0.230
    0.284
    -0.060
    1.000
    
    
    
    
    Flex_Price
    0.058
    -0.487
    0.186
    0.260
    0.272
    0.470
    0.419
    1.000
    
    
    
    Shipping_Speed
    0.196
    0.067
    0.241
    0.323
    0.299
    -0.055
    0.773
    0.513
    1.000
    
    
    Shipping_Cost
    0.175
    0.141
    0.215
    0.247
    0.296
    -0.094
    0.696
    0.358
    0.840
    1.000
    
    Order_Qty
    0.405
    0.433
    0.235
    0.237
    0.338
    -0.218
    0.315
    -0.003
    0.425
    0.504
    1
Consider the cut-off value as |r|>0.4. I expect a moderate positive linear relationship for order quantity with loyalty, quality, shipping speed, and shipping cost. Schroeder, L.D., Sjoquist, D.L. and Stephan, P.E., 2016. 
The scatterplot for order quantity with each independent variable is given below.
I include variables loyalty, quality, shipping speed, and shipping cost as they have a moderate linear relationship with order quantity. This is evident from the scatterplots above. Scatterplot for these variables shown an upward trend with points moderately close to each other.
But there is problem of multi-co-linearity as shipping speed, and shipping cost have a strong positive linear relationship between them. I decide to include shipping cost for the regression analysis as it has stronger linear relationship with order quantity in comparison to shipping speed.
Task 2.2
The regression equation for predicting the order quantity on the basis of Quality, Shipping Cost, and Loyalty is: order quantity = 3.7039 + 0.22676 * quality + 0.3019*shipping cost + 0.06562 * loyalty
The coefficient of determination is 0.478. There is 47.8% variation in order quantity which is explained by Quality, Shipping Cost, and Loyalty. This model is not a good fit for the data as the percentage is less than 70%.
Null hypothesis, model is not significant. Alternative hypothesis, model is significant. With F=59.95, p<5%, the null hypothesis is rejected at 5% level of significance. There is sufficient evidence to conclude that the model is significant. Schroeder, L.D., Sjoquist, D.L. and Stephan, P.E., 2016. 
The null hypothesis, the coefficient of Xi (independent variable) is not significant, beta_i = 0. Versus the alternative hypothesis, the coefficient of Xi (independent variable) is significant, beta_i =/= 0. With p-value < 5%, the null hypothesis is rejected at 5% level of significance. There is sufficient evidence to prove that the position of independent variables quality, shipping cost, and loyalty are significant at 5% level of significance.
I am 95% confident that estimated value of coefficient for quality, shipping cost, and loyalty lie in the interval (0.160279, 0.29325), (0.22359, 0.38), and (0.04339, 0.0878) respectively.
Model diagnosis
The assumption of normality is satisfied as the PP plot is S shaped. There is equality of error variances as points are randomly distributed in the residual plot. There is no problem of multi-co-linearity in the data as value of |r| is less than the cut off 0.8.
Task 2.3
The interaction term is created for quality and brand image. The dependent variable is the order quantity. The independent variables considered are quality, brand image, and the interaction between quality and brand image.
The regression equation is given by order quantity = 0.50108 + 0.6911* quality + 0.8643* brand image - 0.0685*Quality*Brand_image
The coefficient of determination is 35%. There is 35% variation in order quantity which is explained by quality, brand image, and the interaction between quality and brand image. This is a bad fit for the data.
The null hypothesis, the coefficient of Xi (independent variable) is not significant, beta_i = 0. Versus the alternative hypothesis, the coefficient of Xi (independent variable) is significant, beta_i =/= 0. With p-value < 5%, the null hypothesis is rejected at 5% level of significance. There is sufficient evidence to prove that the position of independent variables quality, brand image, and the interaction between quality and brand image are significant at 5% level of significance.
The two lines in the interaction plot intersect each other. Hence I can say that interaction between quality and brand image are significant.
With 1 unit increase in quality, the order quantity of beers is increased by 501 bottles. With 1 unit increase in brand image, the order quantity of beers is increased by 691 bottles. When the brand image is increased by 1 level of positivity along with the increase in quality by one Level, the quantity order is decreased by 68 bottles.
Task 3
Task 3.1
The dependent variable is recommended. The independent variables are quality, brand image, shipping speed, and distance travel. I want to predict the probability of recommendation on the basis of quality, brand image, shipping speed, and distance travel.
The method of GRG nonlinear solver is used to obtain the parameters of intercept and 4 coefficients of the independent variable. The value of 14 parameters and maximizing the likelihood is given below.
    b0
    b1
    b2
    b3
    b4
    L
    -13.278
    0.654
    0.621
    1.159
    0.968
    -95.377
The regression equation is given by: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel))
With 1 unit increase in the quality, the likelihood of recommendation is increased by 0.654. With one unit increase in brand image, the probability of recommendation is increased by 0.621. With one Level increment in the shipping speed, the probability of recommendation is increased by 1.159. For the direct distribution channel, the probability of recommendation is 0.968 units more in comparison to the distribution channel through a sales representative. Fox, J., 2015. 
Task 3.2
For the possible values of quality ranging from 1 to 10, brand image levels as 1, 5, 10, shipping speed as neutral with value of 5, and 2 types of distance channel (0 = Through a Sales Representative; 1 = Directly) the predicted probability of recommendation is given below.
The prediction for recommendation is calculated with the help of logistic regression equation: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel)).
The various considered values of independent variables quality, brand image, shipping speed, and distance channel and their corresponding predicted probability of recommendation is given below.
    Quality
    Brand_Image
    Shipping_Speed
    Dist_Channel
    Recommend_predicted_probability
    1
    1
    5
    1
    0.005272
    2
    1
    5
    1
    0.010091
    3
    1
    5
    1
    0.019227
    4
    1
    5
    1
    0.036334
    5
    1
    5
    1
    0.067610
    6
    1
    5
    1
    0.122389
    7
    1
    5
    1
    0.211485
    8
    1
    5
    1
    0.340290
    9
    1
    5
    1
    0.498000
    10
    1
    5
    1
    0.656109
    1
    5
    5
    1
    0.059749
    2
    5
    5
    1
    0.108903
    3
    5
    5
    1
    0.190310
    4
    5
    5
    1
    0.311310
    5
    5
    5
    1
    0.465057
    6
    5
    5
    1
    0.625744
    7
    5
    5
    1
    0.762783
    8
    5
    5
    1
    0.860806
    9
    5
    5
    1
    0.922442
    10
    5
    5
    1
    0.958113
    1
    10
    5
    1
    0.586375
    2
    10
    5
    1
    0.731648
    3
    10
    5
    1
    0.839835
    4
    10
    5
    1
    0.909784
    5
    10
    5
    1
    0.950968
    6
    10
    5
    1
    0.973891
    7
    10
    5
    1
    0.986252
    8
    10
    5
    1
    0.992804
    9
    10
    5
    1
    0.996245
    10
    10
    5
    1
    0.998044
    1
    1
    5
    0
    0.002009
    2
    1
    5
    0
    0.003857
    3
    1
    5
    0
    0.007392
    4
    1
    5
    0
    0.014119
    5
    1
    5
    0
    0.026805
    6
    1
    5
    0
    0.050307
    7
    1
    5
    0
    0.092457
    8
    1
    5
    0
    0.163830
    9
    1
    5
    0
    0.273686
    10
    1
    5
    0
    0.420188
    1
    5
    5
    0
    0.023568
    2
    5
    5
    0
    0.044362
    3
    5
    5
    0
    0.081961
    4
    5
    5
    0
    0.146540
    5
    5
    5
    0
    0.248244
    6
    5
    5
    0
    0.388410
    7
    5
    5
    0
    0.549834
    8
    5
    5
    0
    0.701406
    9
    5
    5
    0
    0.818765
    10
    5
    5
    0
    0.896785
    1
    10
    5
    0
    0.350009
    2
    10
    5
    0
    0.508749
    3
    10
    5
    0
    0.665744
    4
    10
    5
    0
    0.792983
    5
    10
    5
    0
    0.880482
    6
    10
    5
    0
    0.934073
    7
    10
    5
    0
    0.964600
    8
    10
    5
    0
    0.981275
    9
    10
    5
    0
    0.990175
    10
    10
    5
    0
    0.994867
Task 3.3
The plot of probabilities predicted for the available data is given below.
The probability lie in the range from 0 to 1.
The regression equation is given by: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel))
With 1 unit increase in the quality, the likelihood of recommendation is increased by 0.654. With one unit increase in brand image, the probability of recommendation is increased by 0.621. With one Level increment in the shipping speed, the probability of recommendation is increased by 1.159. For the direct distribution channel, the probability of recommendation is 0.968 units more in comparison to the distribution channel through a sales representative. Chatterjee, S. and Hadi, A.S., 2015. 
Task 4
The regression output for predicting Pale Ale production (litres) on the basis of time (t) corresponding value of 1 with 3rd quarter of 2009.
The regression equation is given by Pale Ale production (litres) = 1111.39 + 16.39*t
For the given time the predicted value of Pale Ale production (litres) is summarized in the table below.
    year
    quarter
    t
    predicted pale ale (litres)
    2019
    Q2
    44
    1832.597747
    2019
    Q3
    45
    1848.988824
    2019
    Q4
    46
    1865.379901
    2020
    Q1
    47
    1881.770977
Conclusion
The average number of bottles ordered by the customer is 7665 with a low standard deviation of 0.89 units. There is not much difference in the proportion of customers who recommend Mad Dog Craft Beer (as a supplier) to others.
I expect a moderate positive linear relationship for order quantity with loyalty, quality, shipping speed, and shipping cost.
The regression equation is given by bye order quantity = 4.3091+ 0.1754892* quality -0.0727shipping speed + 0.28* shipping cost + 0.061* loyalty + 0.438* recommended. There is sufficient evidence to conclude that the model is significant. There is sufficient evidence to prove that the position of independent variables quality, shipping speed for my shipping cost from Novelty, and peppermint are significant at 5% level of significance.
The regression equation is given by order quantity = 0.50108 + 0.6911* quality + 0.8643* brand image - 0.0685*Quality*Brand_image. When the brand image is increased by 1 level of positivity along with the increase in quality by one Level, the quantity order is decreased by 68 bottles.
The regression equation is given by: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel)). With 1 unit increase in the quality, the likelihood of recommendation is increased by 0.654. With one unit increase in brand image, the probability of recommendation is increased by 0.621. With one Level increment in the shipping speed, the probability of recommendation is increased by 1.159. For the direct distribution channel, the probability of recommendation is 0.968 units more in comparison to the distribution channel through a sales representative.
The regression equation is given by Pale Ale production (litres) = 1111.39 + 16.39*t. the predicted Pale Ale production for 2nd quarter 2019, 3rd quarter 2019, 4th quarter 2019, and 1st quarter 2020 is 1832.59, 1848.98, 1865.37, and 1881.77 respectively.
References
Chatterjee, S. and Hadi, A.S., 2015. Regression analysis by example. John Wiley & Sons.
Draper, N.R. and Smith, H., 2014. Applied regression analysis(Vol. 326). John Wiley & Sons.
Fox, J., 2015. Applied regression analysis and generalized linear models. Sage Publications.
Schroeder, L.D., Sjoquist, D.L. and Stephan, P.E., 2016. Understanding regression analysis: An introductory guide (Vol. 57). Sage Publications
Appendix
Task 2.2
    SUMMARY OUTPUT
    
    
    Regression Statistics
    Multiple R
    0.722861
    R Square
    0.522528
    Adjusted R Square
    0.510222
    Standard Error
    0.625121
    Observations
    200
    ANOVA
    
    
    
    
    
     
    df
    SS
    MS
    F
    Significance F
    Regression
    5
    82.96445
    16.59289
    42.46138
    2.02E-29
    Residual
    194
    75.81055
    0.390776
    
    
    Total
    199
    158.775
     
     
     
    
    Coefficients
    Standard Error
    t Stat
    P-value
    Lower 95%
    Upper 95%
    Lower 95.0%
    Upper 95.0%
    Intercept
    4.309159
    0.364355
    11.82683
    1.18E-24
    3.590555
    5.027764
    3.590555
    5.027764
    Quality
    0.175489
    0.03483
    5.038511
    1.07E-06
    0.106796
    0.244183
    0.106796
    0.244183
    Shipping_Speed
    -0.07279
    0.112097
    -0.64933
    0.516896
    -0.29387
    0.148298
    -0.29387
    0.148298
    Shipping_Cost
    0.281479
    0.069614
    4.043402
    7.6E-05
    0.144181
    0.418778
    0.144181
    0.418778
    Loyalty
    0.061506
    0.010935
    5.624941
    6.41E-08
    0.03994
    0.083072
    0.03994
    0.083072
    Recommend
    0.438802
    0.103864
    4.224798
    3.68E-05
    0.233956
    0.643649
    0.233956
    0.643649
Task 2.3
    SUMMARY OUTPUT
    
    
    Regression Statistics
    Multiple R
    0.595757
    R Square
    0.354926
    Adjusted R Square
    0.345052
    Standard Error
    0.722883
    Observations
    200
    ANOVA
    
    
    
    
    
     
    df
    SS
    MS
    F
    Significance F
    Regression
    3
    56.35338
    18.78446
    35.94704
    1.48E-18
    Residual
    196
    102.4216
    0.522559
    
    
    Total
    199
    158.775
     
     
     
     
    Coefficients
    Standard Error
    t Stat
    P-value
    Lower 95%
    Upper 95%
    Lower 95.0%
    Upper 95.0%
    Intercept
    0.501087
    1.53677
    0.326065
    0.744723
    -2.52964
    3.531814
    -2.52964
    3.531814
    Quality
    0.691114
    0.187122
    3.69338
    0.000287
    0.322082
    1.060146
    0.322082
    1.060146
    Brand_Image
    0.864327
    0.269459
    3.207644
    0.001563
    0.332917
    1.395737
    0.332917
    1.395737
    Quality*Brand_image
    -0.06856
    0.032934
    -2.08163
    0.038675
    -0.1335
    -0.00361
    -0.1335
    -0.00361
Task...
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here
April
January
February
March
April
May
June
July
August
September
October
November
December
2025
2025
2026
2027
SunMonTueWedThuFriSat
30
31
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
1
2
3
00:00
00:30
01:00
01:30
02:00
02:30
03:00
03:30
04:00
04:30
05:00
05:30
06:00
06:30
07:00
07:30
08:00
08:30
09:00
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30
18:00
18:30
19:00
19:30
20:00
20:30
21:00
21:30
22:00
22:30
23:00
23:30