Answer To: Business Data Analysis MIS771 - Descriptive Analytics and Visualisation Trimester 1, 2019 Page 1 of...
Pooja answered on May 05 2021
Table of Contents
Introduction 2
Analysis 3
Task 1 3
Task 2 3
Task 2.1 3
Task 2.2 4
Task 2.3 6
Task 3 7
Task 3.1 7
Task 3.2 8
Task 3.3 10
Task 4 10
Conclusion 12
References 14
Introduction
The dependent variable is quantity ordered and recommendation. The independent variables are Loyalty, Customer Type, Region, Distribution Channel, Quality, SM Presence, Advert, and Brand Image, comp Pricing, Order Fulfilment, Flex Price, Shipping Speed, Shipping Cost, Order Quantity, and Recommend. The technique of descriptive statistics is used to analyse the order quantity and recommendation. The technique of correlation analysis can be helpful to identify the factors which affect the order quantity. A regression equation is constructed to predict the quantity ordered on the basis of identified factors. A logistic regression equation is created to predict the probability of recommendation.
The dependent variable Pale Ale production (litres) is predicted for 2nd quarter 2019, 3rd quarter 2019, 4th quarter 2019, and 1st quarter 2020. The technique of regression analysis is used for prediction.
Analysis
Task 1
The average number of bottles ordered by the customer is 7665 with a low standard deviation of 0.89 units. The distribution of bottles order by the customer is approximately normally distributed as skewness is equal to -0.2.
Order_Qty
Mean
7.665
Standard Error
0.063161
Median
7.6
Mode
7.2
Standard Deviation
0.893233
Sample Variance
0.797864
Kurtosis
0.584038
Skewness
-0.20635
Range
5.6
Minimum
4.3
Maximum
9.9
Sum
1533
Count
200
Row Labels
Count of Recommend
0
99
1
101
Grand Total
200
There is not much difference in the proportion of customers who recommend Mad Dog Craft Beer (as a supplier) to others. There are 101 customers who would recommend Mad Dog Craft Beer (as a supplier) to others from a sample of 200 customers.
Task 2
Task 2.1
The technique of correlation analysis can be helpful to identify the factors which affect the order quantity. The correlation matrix is given below.
Loyalty
Quality
SM_Presence
Advert
Brand_Image
Comp_Pricing
Order_Fulfillment
Flex_Price
Shipping_Speed
Shipping_Cost
Order_Qty
Loyalty
1
Quality
0.084
1.000
SM_Presence
0.190
-0.034
1.000
Advert
0.259
-0.054
0.505
1.000
Brand_Image
0.258
-0.116
0.788
0.627
1.000
Comp_Pricing
0.076
-0.448
0.177
0.099
0.200
1.000
Order_Fulfillment
0.139
0.083
0.217
0.230
0.284
-0.060
1.000
Flex_Price
0.058
-0.487
0.186
0.260
0.272
0.470
0.419
1.000
Shipping_Speed
0.196
0.067
0.241
0.323
0.299
-0.055
0.773
0.513
1.000
Shipping_Cost
0.175
0.141
0.215
0.247
0.296
-0.094
0.696
0.358
0.840
1.000
Order_Qty
0.405
0.433
0.235
0.237
0.338
-0.218
0.315
-0.003
0.425
0.504
1
Consider the cut-off value as |r|>0.4. I expect a moderate positive linear relationship for order quantity with loyalty, quality, shipping speed, and shipping cost. Schroeder, L.D., Sjoquist, D.L. and Stephan, P.E., 2016.
The scatterplot for order quantity with each independent variable is given below.
I include variables loyalty, quality, shipping speed, and shipping cost as they have a moderate linear relationship with order quantity. This is evident from the scatterplots above. Scatterplot for these variables shown an upward trend with points moderately close to each other.
But there is problem of multi-co-linearity as shipping speed, and shipping cost have a strong positive linear relationship between them. I decide to include shipping cost for the regression analysis as it has stronger linear relationship with order quantity in comparison to shipping speed.
Task 2.2
The regression equation for predicting the order quantity on the basis of Quality, Shipping Cost, and Loyalty is: order quantity = 3.7039 + 0.22676 * quality + 0.3019*shipping cost + 0.06562 * loyalty
The coefficient of determination is 0.478. There is 47.8% variation in order quantity which is explained by Quality, Shipping Cost, and Loyalty. This model is not a good fit for the data as the percentage is less than 70%.
Null hypothesis, model is not significant. Alternative hypothesis, model is significant. With F=59.95, p<5%, the null hypothesis is rejected at 5% level of significance. There is sufficient evidence to conclude that the model is significant. Schroeder, L.D., Sjoquist, D.L. and Stephan, P.E., 2016.
The null hypothesis, the coefficient of Xi (independent variable) is not significant, beta_i = 0. Versus the alternative hypothesis, the coefficient of Xi (independent variable) is significant, beta_i =/= 0. With p-value < 5%, the null hypothesis is rejected at 5% level of significance. There is sufficient evidence to prove that the position of independent variables quality, shipping cost, and loyalty are significant at 5% level of significance.
I am 95% confident that estimated value of coefficient for quality, shipping cost, and loyalty lie in the interval (0.160279, 0.29325), (0.22359, 0.38), and (0.04339, 0.0878) respectively.
Model diagnosis
The assumption of normality is satisfied as the PP plot is S shaped. There is equality of error variances as points are randomly distributed in the residual plot. There is no problem of multi-co-linearity in the data as value of |r| is less than the cut off 0.8.
Task 2.3
The interaction term is created for quality and brand image. The dependent variable is the order quantity. The independent variables considered are quality, brand image, and the interaction between quality and brand image.
The regression equation is given by order quantity = 0.50108 + 0.6911* quality + 0.8643* brand image - 0.0685*Quality*Brand_image
The coefficient of determination is 35%. There is 35% variation in order quantity which is explained by quality, brand image, and the interaction between quality and brand image. This is a bad fit for the data.
The null hypothesis, the coefficient of Xi (independent variable) is not significant, beta_i = 0. Versus the alternative hypothesis, the coefficient of Xi (independent variable) is significant, beta_i =/= 0. With p-value < 5%, the null hypothesis is rejected at 5% level of significance. There is sufficient evidence to prove that the position of independent variables quality, brand image, and the interaction between quality and brand image are significant at 5% level of significance.
The two lines in the interaction plot intersect each other. Hence I can say that interaction between quality and brand image are significant.
With 1 unit increase in quality, the order quantity of beers is increased by 501 bottles. With 1 unit increase in brand image, the order quantity of beers is increased by 691 bottles. When the brand image is increased by 1 level of positivity along with the increase in quality by one Level, the quantity order is decreased by 68 bottles.
Task 3
Task 3.1
The dependent variable is recommended. The independent variables are quality, brand image, shipping speed, and distance travel. I want to predict the probability of recommendation on the basis of quality, brand image, shipping speed, and distance travel.
The method of GRG nonlinear solver is used to obtain the parameters of intercept and 4 coefficients of the independent variable. The value of 14 parameters and maximizing the likelihood is given below.
b0
b1
b2
b3
b4
L
-13.278
0.654
0.621
1.159
0.968
-95.377
The regression equation is given by: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel))
With 1 unit increase in the quality, the likelihood of recommendation is increased by 0.654. With one unit increase in brand image, the probability of recommendation is increased by 0.621. With one Level increment in the shipping speed, the probability of recommendation is increased by 1.159. For the direct distribution channel, the probability of recommendation is 0.968 units more in comparison to the distribution channel through a sales representative. Fox, J., 2015.
Task 3.2
For the possible values of quality ranging from 1 to 10, brand image levels as 1, 5, 10, shipping speed as neutral with value of 5, and 2 types of distance channel (0 = Through a Sales Representative; 1 = Directly) the predicted probability of recommendation is given below.
The prediction for recommendation is calculated with the help of logistic regression equation: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel)).
The various considered values of independent variables quality, brand image, shipping speed, and distance channel and their corresponding predicted probability of recommendation is given below.
Quality
Brand_Image
Shipping_Speed
Dist_Channel
Recommend_predicted_probability
1
1
5
1
0.005272
2
1
5
1
0.010091
3
1
5
1
0.019227
4
1
5
1
0.036334
5
1
5
1
0.067610
6
1
5
1
0.122389
7
1
5
1
0.211485
8
1
5
1
0.340290
9
1
5
1
0.498000
10
1
5
1
0.656109
1
5
5
1
0.059749
2
5
5
1
0.108903
3
5
5
1
0.190310
4
5
5
1
0.311310
5
5
5
1
0.465057
6
5
5
1
0.625744
7
5
5
1
0.762783
8
5
5
1
0.860806
9
5
5
1
0.922442
10
5
5
1
0.958113
1
10
5
1
0.586375
2
10
5
1
0.731648
3
10
5
1
0.839835
4
10
5
1
0.909784
5
10
5
1
0.950968
6
10
5
1
0.973891
7
10
5
1
0.986252
8
10
5
1
0.992804
9
10
5
1
0.996245
10
10
5
1
0.998044
1
1
5
0
0.002009
2
1
5
0
0.003857
3
1
5
0
0.007392
4
1
5
0
0.014119
5
1
5
0
0.026805
6
1
5
0
0.050307
7
1
5
0
0.092457
8
1
5
0
0.163830
9
1
5
0
0.273686
10
1
5
0
0.420188
1
5
5
0
0.023568
2
5
5
0
0.044362
3
5
5
0
0.081961
4
5
5
0
0.146540
5
5
5
0
0.248244
6
5
5
0
0.388410
7
5
5
0
0.549834
8
5
5
0
0.701406
9
5
5
0
0.818765
10
5
5
0
0.896785
1
10
5
0
0.350009
2
10
5
0
0.508749
3
10
5
0
0.665744
4
10
5
0
0.792983
5
10
5
0
0.880482
6
10
5
0
0.934073
7
10
5
0
0.964600
8
10
5
0
0.981275
9
10
5
0
0.990175
10
10
5
0
0.994867
Task 3.3
The plot of probabilities predicted for the available data is given below.
The probability lie in the range from 0 to 1.
The regression equation is given by: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel))
With 1 unit increase in the quality, the likelihood of recommendation is increased by 0.654. With one unit increase in brand image, the probability of recommendation is increased by 0.621. With one Level increment in the shipping speed, the probability of recommendation is increased by 1.159. For the direct distribution channel, the probability of recommendation is 0.968 units more in comparison to the distribution channel through a sales representative. Chatterjee, S. and Hadi, A.S., 2015.
Task 4
The regression output for predicting Pale Ale production (litres) on the basis of time (t) corresponding value of 1 with 3rd quarter of 2009.
The regression equation is given by Pale Ale production (litres) = 1111.39 + 16.39*t
For the given time the predicted value of Pale Ale production (litres) is summarized in the table below.
year
quarter
t
predicted pale ale (litres)
2019
Q2
44
1832.597747
2019
Q3
45
1848.988824
2019
Q4
46
1865.379901
2020
Q1
47
1881.770977
Conclusion
The average number of bottles ordered by the customer is 7665 with a low standard deviation of 0.89 units. There is not much difference in the proportion of customers who recommend Mad Dog Craft Beer (as a supplier) to others.
I expect a moderate positive linear relationship for order quantity with loyalty, quality, shipping speed, and shipping cost.
The regression equation is given by bye order quantity = 4.3091+ 0.1754892* quality -0.0727shipping speed + 0.28* shipping cost + 0.061* loyalty + 0.438* recommended. There is sufficient evidence to conclude that the model is significant. There is sufficient evidence to prove that the position of independent variables quality, shipping speed for my shipping cost from Novelty, and peppermint are significant at 5% level of significance.
The regression equation is given by order quantity = 0.50108 + 0.6911* quality + 0.8643* brand image - 0.0685*Quality*Brand_image. When the brand image is increased by 1 level of positivity along with the increase in quality by one Level, the quantity order is decreased by 68 bottles.
The regression equation is given by: P = exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel) / (1+exp(-13.278 + 0.654*Quality + 0.621*Brand Image + 1.159*Shipping Speed + 0.968*Distance channel)). With 1 unit increase in the quality, the likelihood of recommendation is increased by 0.654. With one unit increase in brand image, the probability of recommendation is increased by 0.621. With one Level increment in the shipping speed, the probability of recommendation is increased by 1.159. For the direct distribution channel, the probability of recommendation is 0.968 units more in comparison to the distribution channel through a sales representative.
The regression equation is given by Pale Ale production (litres) = 1111.39 + 16.39*t. the predicted Pale Ale production for 2nd quarter 2019, 3rd quarter 2019, 4th quarter 2019, and 1st quarter 2020 is 1832.59, 1848.98, 1865.37, and 1881.77 respectively.
References
Chatterjee, S. and Hadi, A.S., 2015. Regression analysis by example. John Wiley & Sons.
Draper, N.R. and Smith, H., 2014. Applied regression analysis(Vol. 326). John Wiley & Sons.
Fox, J., 2015. Applied regression analysis and generalized linear models. Sage Publications.
Schroeder, L.D., Sjoquist, D.L. and Stephan, P.E., 2016. Understanding regression analysis: An introductory guide (Vol. 57). Sage Publications
Appendix
Task 2.2
SUMMARY OUTPUT
Regression Statistics
Multiple R
0.722861
R Square
0.522528
Adjusted R Square
0.510222
Standard Error
0.625121
Observations
200
ANOVA
df
SS
MS
F
Significance F
Regression
5
82.96445
16.59289
42.46138
2.02E-29
Residual
194
75.81055
0.390776
Total
199
158.775
Coefficients
Standard Error
t Stat
P-value
Lower 95%
Upper 95%
Lower 95.0%
Upper 95.0%
Intercept
4.309159
0.364355
11.82683
1.18E-24
3.590555
5.027764
3.590555
5.027764
Quality
0.175489
0.03483
5.038511
1.07E-06
0.106796
0.244183
0.106796
0.244183
Shipping_Speed
-0.07279
0.112097
-0.64933
0.516896
-0.29387
0.148298
-0.29387
0.148298
Shipping_Cost
0.281479
0.069614
4.043402
7.6E-05
0.144181
0.418778
0.144181
0.418778
Loyalty
0.061506
0.010935
5.624941
6.41E-08
0.03994
0.083072
0.03994
0.083072
Recommend
0.438802
0.103864
4.224798
3.68E-05
0.233956
0.643649
0.233956
0.643649
Task 2.3
SUMMARY OUTPUT
Regression Statistics
Multiple R
0.595757
R Square
0.354926
Adjusted R Square
0.345052
Standard Error
0.722883
Observations
200
ANOVA
df
SS
MS
F
Significance F
Regression
3
56.35338
18.78446
35.94704
1.48E-18
Residual
196
102.4216
0.522559
Total
199
158.775
Coefficients
Standard Error
t Stat
P-value
Lower 95%
Upper 95%
Lower 95.0%
Upper 95.0%
Intercept
0.501087
1.53677
0.326065
0.744723
-2.52964
3.531814
-2.52964
3.531814
Quality
0.691114
0.187122
3.69338
0.000287
0.322082
1.060146
0.322082
1.060146
Brand_Image
0.864327
0.269459
3.207644
0.001563
0.332917
1.395737
0.332917
1.395737
Quality*Brand_image
-0.06856
0.032934
-2.08163
0.038675
-0.1335
-0.00361
-0.1335
-0.00361
Task...