Assignment-1 MIS771 Descriptive Analytics and Visualisations Page 1 of 9 MIS771 Descriptive Analytics and Visualisation DEPARTMENT OF INFORMATION SYSTEMS AND BUSINESS ANALYTICS DEAKIN BUSINESS SCHOOL...

Answered 6 days after May 11, 2021, MIS771, Deakin University


Subhanbasha answered on May 18 2021
Technical Report
Introduction:
    The main aim of this analysis is to find the patterns and trends of a supermarket chain in Australia that originated as a family-based general store. The chain has 500 stores, and the head office wants to understand the current situation of its supermarkets. Since Covid-19, people's lifestyles have changed entirely, so we need to understand the new pattern of customer behaviour in order to make the stores profitable.
    Here we apply statistical analysis to address the problem facing the head office, and use the results to recommend further steps. The analysis consists of regression analysis, visualization and forecasting techniques to estimate the sales trend for the coming days, weeks or months. We can also advise whether new stores should be opened in particular locations, or whether existing stores should trade 24x7, because during Covid-19 many governments are imposing lockdowns and other restrictions on markets.
    We carry out the analysis step by step, building each model with the variables that are significant in it. This helps us to find the trends of the markets and to make decisions about new or existing stores. The next part of the analysis is forecasting, which tells us the expected future performance of the stores by quarter. The forecast sales give an overview on which the head office can act.
The steps taken in the analysis are as follows:
1. Regression analysis with default parameters
2. Regression analysis with appropriate parameters
3. Visualization of the results
4. Visualization of the probabilities of the regression (final model)
5. Time series analysis and forecasting
Based on all of the above analyses, we can make recommendations about the supermarkets to the head office.
Analysis:
1. Regression model to estimate Store Sales:
    Using the given sample of supermarkets, we first run a regression with the default specification, that is, with all the independent variables and sales as the dependent variable. The output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.928556709
    R Square             0.862217562
    Adjusted R Square    0.846794155
    Standard Error       1.397739139
    Observations         150
The above output summarises the performance of the whole model on the data.
Here the R square value is 0.8622, which means that the independent variables used in the model explain 86.22% of the variation in the dependent variable, sales. This is a fairly good fit. The Multiple R value is 0.9285. It is the square root of R square and measures the correlation between the observed sales and the sales predicted by the model. It is not a probability, so it does not mean that sales will increase with a chance of 92.85% when an independent variable increases.
    The Adjusted R square is 0.8467. It is interpreted in the same way as R square, with one difference: R square never decreases when a variable is added to the model, even an unrelated one, whereas Adjusted R square penalises every additional variable and increases only when the added variable improves the fit enough to offset that penalty. Adjusted R square is therefore the more suitable measure for comparing models with different numbers of variables. The standard error of 1.3977 is also somewhat high.
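A quick way to check these summary figures is to recompute them. The sketch below, in Python, recomputes Multiple R as the square root of R square, and Adjusted R square from the usual formula 1 - (1 - R^2)(n - 1)/(n - k - 1). The number of predictors k = 15 is an assumption: the report does not state it, and 15 is simply the value that reproduces the reported Adjusted R square.

```python
import math


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R square for a model with k predictors and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2 = 0.862217562  # R Square from the summary output
n = 150           # Observations from the summary output
k = 15            # ASSUMPTION: number of predictors, not stated in the report

multiple_r = math.sqrt(r2)   # Multiple R is the square root of R square
adj = adjusted_r2(r2, n, k)

print(f"Multiple R        = {multiple_r:.6f}")
print(f"Adjusted R square = {adj:.6f}")
```

The same function can be applied to the later models to see how much of a drop in R square is offset by using fewer variables.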
The next part of the output is the ANOVA table of the regression model. The measures above are derived from it, so it is not discussed separately here.
The following part of the output describes the behaviour of each independent variable and its contribution to the model. From it we can identify which variables are useful for explaining the variation in sales.
    We select variables using the p value. The rule of thumb is that variables with a p value below 0.05 are significant in the model, and we apply that rule here.
The variables wage, Number of Competitors, Gender Manager and Age Manager are significant, so we keep only those variables in the model.
Using only the above variables, the model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.846659199
    R Square             0.7168318
    Adjusted R Square    0.711013275
    Standard Error       1.919673658
    Observations         150
Here the R square value is 0.7168 and the Adjusted R square value is 0.7110. Compared with the default model above, both measures are clearly lower.
The main reason is that a model with more relevant variables explains more of the variation in sales, whereas here we used only three variables. The p-value rule applied above is correct in itself, but there may be interaction effects between the variables, or multicollinearity may be present.
This suggests that other variables are also useful for explaining the variation in sales, so we identify those variables and include them in the regression to get a better result.
Here we included the variables wage, Advertising_Expense_...000., Number_of_Staff, Age_of_the_Store..Yrs., Number_Of_Competitors., Hours_Trading., Parking_.Spaces, Membership_Union.., Open_24X7.
Using these variables, the model output is as below:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.891968804
    R Square             0.795608348
    Adjusted R Square    0.782468885
    Standard Error       1.665517314
    Observations         150
This model performs better than the previous model, where we used only three variables, but not better than the default model, where we used all the variables.
The output also shows that the variables Hours_Trading., Parking_.Spaces, Membership_Union.. and Open_24X7 are not significant in the model. However, if we drop these variables, the fit gets worse.
In the next step we add another set of variables to the model and check its performance.
After adding Experience_Manager and eStore to the above model, the overall performance of the model is:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.915194156
    R Square             0.837580344
    Adjusted R Square    0.824633849
    Standard Error       1.495413659
    Observations         150
From the above output, the fit of this model is somewhat better than that of the previous model. Here the R square value is 0.8376 and the Adjusted R square is 0.8246, and the standard error is also lower than in the previous model.
The normal probability plot also shows that the errors are approximately normally distributed.
Although some of the variables are not significant, we keep this model because of its better performance, which helps us to explain the variation in the dependent variable, sales.
2. Regression model by using only two independent variables:
In the next step we develop a regression model using the variables discussed in the team meeting, Number_Of_Competitors and Open_24X7. To find out whether there is an interaction effect on sales, we create an interaction variable from these two variables.
    We now run the regression model that includes the interaction term column to test its significance.
The regression output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.599219472
    R Square             0.359063976
    Adjusted R Square    0.345894057
    Standard Error       2.888101859
    Observations         150
This model has a low fit: it explains only about 36% of the variation in sales.
The coefficient table is as follows:
                             Coefficients   Standard Error   t Stat      P-value
    Intercept                8.154192       0.853889         9.549475    4.34E-17
    Number_Of_Competitors    1.247082       0.317329         3.929934    0.000131
    Open_24X7                7.857261       1.036783         7.578502    3.71E-12
    Interaction term         -2.71651       0.366981         -7.40232    9.78E-12
The p value indicates the significance of each variable. The interaction term has a p value below 0.05, so we can say that the interaction term is statistically significant in the regression model.
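The columns of the coefficient table are linked: each t Stat is the coefficient divided by its standard error, and the p value is then compared with the 0.05 threshold. The following Python sketch recomputes the t statistics from the reported coefficients and standard errors and applies the rule of thumb to the reported p values. The dictionary layout and variable names are illustrative only.

```python
# (coefficient, standard error, reported p value), copied from the table above.
table = {
    "Intercept":             (8.154192, 0.853889, 4.34e-17),
    "Number_Of_Competitors": (1.247082, 0.317329, 0.000131),
    "Open_24X7":             (7.857261, 1.036783, 3.71e-12),
    "Interaction term":      (-2.71651, 0.366981, 9.78e-12),
}

ALPHA = 0.05  # significance threshold used in the report

# t Stat = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _) in table.items()}

# Rule of thumb: a variable is significant if its p value is below ALPHA.
significant = [name for name, (_, _, p) in table.items() if p < ALPHA]

for name, t in t_stats.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:<22} t = {t:9.4f}  ({flag})")
```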
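Because the model contains an interaction term, the coefficients have to be read together. Assuming the interaction variable is the product of the two variables and Open_24X7 is coded 1 for stores that are open 24x7, the Open_24X7 coefficient of 7.857 is the estimated difference in sales between a store that is open 24x7 and one that is not when there are no competitors. The negative interaction coefficient of -2.717 reduces that difference for every additional competitor. A store that is open 24x7 therefore has higher predicted sales than one that is not only when the number of competitors is small.
The errors again follow an approximately normal distribution, which is one of the assumptions of regression.
So there is sufficient evidence that the interaction term is significant in the model.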
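To see what the interaction term implies, the fitted equation can be evaluated directly. The Python sketch below uses the coefficients from the table and assumes that the interaction variable is the product Number_Of_Competitors x Open_24X7 and that Open_24X7 is coded 0/1. Neither assumption is stated in the report, and the function and variable names are illustrative.

```python
# Coefficients copied from the coefficient table.
B0 = 8.154192       # Intercept
B_COMP = 1.247082   # Number_Of_Competitors
B_OPEN = 7.857261   # Open_24X7
B_INT = -2.71651    # Interaction term (ASSUMED to be competitors * open_24x7)


def predicted_sales(competitors: float, open_24x7: int) -> float:
    """Fitted sales for a store; open_24x7 is assumed to be coded 0 or 1."""
    return (B0 + B_COMP * competitors + B_OPEN * open_24x7
            + B_INT * competitors * open_24x7)


# Effect of one extra competitor for each type of store.
slope_not_open = B_COMP
slope_open = B_COMP + B_INT

# Number of competitors at which both types of store have equal fitted sales.
break_even = -B_OPEN / B_INT

for c in range(6):
    print(c, round(predicted_sales(c, 0), 3), round(predicted_sales(c, 1), 3))
print("slope (not 24x7):", round(slope_not_open, 3))
print("slope (24x7):    ", round(slope_open, 3))
print("break-even competitors:", round(break_even, 2))
```

Under these assumptions the fitted sales of a 24x7 store fall as competitors are added, while those of other stores rise, so the two lines cross at a little under three competitors.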
3. Model to predict the likelihood of a store opening 24x7:
We now develop a regression model to predict the likelihood of a store opening 24x7. For this we use the variable Open_24X7 as the dependent variable and all the other variables as independent variables. The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.414756081
    R Square             0.172022607
    Adjusted R Square    0.07933857
    Standard Error       0.462108234
    Observations         150
The above output clearly shows that the model performs very poorly: the R square value is 0.1720 and the Adjusted R square value is 0.0793, which means that the independent variables explain only about 7.9% of the variation in the dependent variable, Open_24X7, after adjusting for the number of variables.
    From the coefficient table we can see the significance of each variable and remove the non-significant variables from the model.
Here the variables Gross_Profit ($m), Number_Of_Competitors, Age_Manager and eStore are the significant variables. So we use these variables in the model and check its performance.
The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.286999302
    R Square             0.082368599
    Adjusted R Square    0.057054629
    Standard Error       0.467667294
    Observations         150
This model performs worse than the previous model.
Next we add some other variables to the existing model, namely Number_of_Staff, Sales ($m) and Age_Manager, and check the performance of the model.
The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.304552311
    R Square             0.09275211
    Adjusted R Square    0.061250447
    Standard Error       0.466625646
    Observations         150
Compared with the two models above, the latest model is still not a good model. So we use the default model, that is, the one with all the variables as independent variables, which has the highest Adjusted R square of the three (0.0793).
4. Model for predicting store launching an eStore:
    Here we use the manager's age, experience and gender as the independent variables, and the dependent variable is eStore.
This calls for logistic regression, because the dependent variable is categorical with two levels, 0 and 1, indicating whether or not the store has an eStore.
    The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.600514715
    R Square             0.360617923
    Adjusted R Square    0.347479935
    Standard Error       0.399112521
    Observations         150
The above output shows that the model explains only about 36% of the variation in eStore: the R square value is 0.3606 and the Adjusted R square value is 0.3475. These are measures of fit rather than classification accuracy, and the fit is modest.
The coefficient table is as follows:
                          Coefficients    Standard Error   t Stat          P-value
    Intercept             0.519507258     0.193683553      2.682247671     0.008156005
    Age_Manager           -0.018718302    0.004577348      -4.089333082    7.12001E-05
    Experience_Manager    0.057377217     0.008610923      6.663306419     5.10648E-10
    Gender_Manager        0.166663835     0.068771589      2.423440228     0.01659799
The above table gives the coefficient of each independent variable and its significance. The p value column shows that all the independent variables used in the model are significant at the 0.05 level, which means these variables are useful for predicting the likelihood of an eStore.
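The summary output for this model reports R Square and t Stat values, which is the layout of a linear regression output, so the coefficients appear to describe a linear probability model rather than a fitted logistic curve. The Python sketch below, under that reading, evaluates the fitted equation for a few hypothetical manager profiles. The profiles and the 0/1 coding of Gender_Manager are assumptions made for illustration only.

```python
# Coefficients copied from the coefficient table.
B0 = 0.519507258
B_AGE = -0.018718302
B_EXP = 0.057377217
B_GENDER = 0.166663835


def linear_score(age: float, experience: float, gender: int) -> float:
    """Fitted value of eStore; gender is ASSUMED to be coded 0 or 1."""
    return B0 + B_AGE * age + B_EXP * experience + B_GENDER * gender


def clipped_probability(score: float) -> float:
    """Force a fitted value into [0, 1] so it can be read as a probability."""
    return min(1.0, max(0.0, score))


# Hypothetical profiles: (age, years of experience, gender code).
profiles = [(30, 5, 1), (60, 0, 0), (45, 25, 1)]

for age, exp, gender in profiles:
    score = linear_score(age, exp, gender)
    print(age, exp, gender, round(score, 3), clipped_probability(score))
```

Some profiles give fitted values below 0 or above 1, which cannot be probabilities. A logistic regression, as the text above calls for, avoids this by passing the linear score through the logistic function.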
The regression plots follow; they help us to understand the effect of each variable on the dependent variable, eStore.
The plot above shows the relationship between the age of the manager and eStore. We can clearly see that most stores whose manager is older than 30 do not have an eStore, although a smaller number of them do.
This plot shows the relationship between the gender of the manager and the presence of an eStore. Stores with a male manager are more likely to have an eStore than stores with a female manager.
The above plot shows the relationship between the manager's experience and the presence of an eStore. The plot clearly shows that a store is likely to have an eStore where the manager's experience is high. That is, a store whose manager has more than 10 years of experience is likely to have the estore for...