Assignment-1 MIS771 Descriptive Analytics and Visualisations Page 1 of 9 MIS771 Descriptive Analytics and Visualisation DEPARTMENT OF INFORMATION SYSTEMS AND BUSINESS ANALYTICS DEAKIN BUSINESS SCHOOL...

Refer to the pdf attached


Assignment-1 MIS771 Descriptive Analytics and Visualisations Page 1 of 9

MIS771 Descriptive Analytics and Visualisation
DEPARTMENT OF INFORMATION SYSTEMS AND BUSINESS ANALYTICS
DEAKIN BUSINESS SCHOOL
FACULTY OF BUSINESS AND LAW, DEAKIN UNIVERSITY
Assignment Two

Background
This is an individual assignment. You need to analyse the given dataset and then interpret and draw conclusions from your analysis. You then need to convey your findings in a written report to an expert in Business Analytics.

Percentage of the final grade: 35%
The Due Date and Time: 8 pm Thursday 20th May 2021

Submission instructions
The assignment must be submitted by the due date, electronically in CloudDeakin. When submitting electronically, you must check that you have submitted the work correctly by following the instructions provided in CloudDeakin. Please note that we will NOT accept any paper or email copies or part of the assignment submitted after the due date.

Information for students seeking an extension BEFORE the due date
If you wish to seek an extension for this assignment before the due date, you need to apply directly to the Unit Chair by completing the Assignment and Online Test Extension Application Form before Thursday 5 pm 20th May 2021. Please make sure you attach all supporting documentation and a draft of your assignment. The request for an extension needs to occur as soon as you become aware that you will have difficulty meeting the due date.
Please note: Unit Chairs can only grant extensions up to two weeks beyond the original due date. If you require more than two weeks or have already been provided with an extension by the Unit Chair and require additional time, you must apply for Special Consideration via StudentConnect within three business days of the due date.
Conditions under which an extension will usually be considered include:
• Medical – to cover medical conditions of a severe nature, e.g. hospitalisation, severe injury or chronic illness.
Note: temporary minor ailments such as headaches, colds, and minor gastric upsets are not severe medical conditions and are unlikely to be accepted. However, severe cases of these may be considered.
• Compassionate – e.g. death of a close family member, significant family and relationship problems.
• Hardship/Trauma – e.g. sudden loss or gain of employment, severe disruption to domestic arrangements, a victim of crime.
Note: misreading the due date, assignment anxiety, or multiple assignments will not be accepted as grounds for consideration.
https://www.deakin.edu.au/students/faculties/buslaw/student-support/assignment-extensions

Information for students seeking an extension AFTER the due date
If the due date has passed, you require more than two weeks' extension, or you have already been provided with an extension and require additional time, you must apply for Special Consideration via StudentConnect. Please be aware that applications are governed by University procedures and must be submitted within three business days of the due date or extension due date. Please be aware that in most instances, the maximum amount of time that can be granted for an assignment extension is three weeks after the due date, as Unit Chairs are required to have all assignments submitted before results/feedback can be released back to students.

Penalties for late submission
The following marking penalties will apply if you submit an assessment task after the due date without an approved extension:
• 5% will be deducted from available marks for each day, or part thereof, up to five days.
• Work submitted more than five days after the due date will not be marked; you will receive 0% for the task.
Note: 'Day' means calendar day. The Unit Chair may refuse to accept a late submission where it is unreasonable or impracticable to assess the task after the due date.
Additional information: For advice regarding academic misconduct, special consideration, extensions, and assessment feedback, please refer to the document "Rights and responsibilities as a student" in the "Unit Guide and Information" folder under the "Resources" section in the MIS771 CloudDeakin site. The assignment uses the dataset file A2T12021.xlsx, which can be downloaded from CloudDeakin. Analysis of the data requires the use of techniques studied in Module-2.

Assurance of Learning
This assignment assesses the following Graduate Learning Outcomes and related Unit Learning Outcomes:
Graduate Learning Outcomes (GLO):
• GLO1: Discipline-specific knowledge and capabilities - appropriate to the level of study related to a discipline or profession.
• GLO2: Communication - using oral, written and interpersonal communication to inform, motivate and effect change.
• GLO5: Problem Solving - creating solutions to authentic (real world and ill-defined) problems.
• GLO6: Self-Management - working and learning independently, and taking responsibility for personal actions.
Unit Learning Outcomes (ULO):
• ULO 1: Apply quantitative reasoning skills to solve complex problems.
• ULO 2: Plan, monitor, and evaluate own learning as a data analyst.
• ULO 3: Deduce clear and unambiguous solutions in a form that is useful for decision making and research purposes and for communication to the wider public.

Feedback before submission
You can seek assistance from the teaching staff to ascertain whether the assignment conforms to submission guidelines.

Feedback after submission
An overall mark, together with feedback, will be released via CloudDeakin, usually within 15 working days. You are expected to refer to the feedback and compare your answers with it to understand any areas of improvement.

The Case Study
RogerLake is a leading Australian supermarket chain with 500 stores.
Originating from a family-based general store, RogerLake now has stores all over Australia, with the first one being established in 1974. Individual store managers of RogerLake have wide-ranging powers over the day-to-day operations of their stores. However, RogerLake's strategic planning and direction take place in the company Head Office in Adelaide.
RogerLake is anticipating a shift in the business climate within the next five years. The Head Office team is keen to implement the changes introduced during COVID-19 across the supermarket chain. They are confused about the store managers' lack of enthusiasm to open their stores 24x7 or launch an accompanying eStore, given that the Head Office has invested heavily in a digital platform, self-checkout machines and staff. Subsequently, the Head Office management team has approached ANALYTICS7 and asked them to conduct a study to understand the characteristics of RogerLake stores and their business performance.

The Data
For this study, ANALYTICS7 has collected two sets of data:
1. The first dataset is a random sample of 150 stores extracted from the company's data mart. A complete listing of variables, definitions, and an explanation of their coding are provided in Working Sheet "Variable Description."
2. The second dataset is about quarterly sales of RogerLake stores. The details of the Time-Series data are available on Working Sheet "Quarterly Sales."

Your Role in ANALYTICS7
You are a modeller at ANALYTICS7. The team leader (Hugo Barra – MBA and MSc in Data Science) has asked you to lead the modelling component for the RogerLake project. You need to review and complete the modelling activities as per the document. The minutes of the team meeting are below.
Form 210-3: ANALYTICS7 Team Meeting
ANALYTICS7, 727 Collins St, Docklands VIC 3008
Phone: (+61 3 212 66 000) [email protected]
Reference: AP-210 RogerLake Project
Revised: 24th April 2021
Level: Expert Analysis
Meeting Chair: Hugo Barra
Date: 24 April 2021
Time: 11:00 AM
Location: ANALYTICS7 L4.340
Topic: RogerLake Research Project – Analytics Details
Meeting Purpose: Specifying and Allocating Data Analytics Tasks
Discussion items:
• Modelling Store Sales.
• Modelling the likelihood of a store opening 24x7.
• Modelling the likelihood of a store launching an accompanying eStore.
• Forecasting Quarterly Sales for the upcoming four quarters.
• Producing a technical report.
Detailed Action Items
Who: Modeller
What:
1. Build a regression model to estimate Store Sales.
2. Hugo has performed a separate regression analysis and found that the number of competitors is a significant predictor of Store Sales. He believes that the relationship between Store Sales and the number of competitors should be weaker for those stores that are open 24x7. Model the interaction between the variables to test Hugo's assumption and comment on whether there is sufficient evidence to conclude that the interaction term is statistically significant in the model.
3. Build a model to predict the likelihood of a store opening 24x7.
4. Finalise Hugo's model to predict the likelihood of a store launching an eStore.
4.1. Hugo has completed the initial analysis for this task. He has narrowed down the key predictors of the likelihood of a store launching an eStore to "Manager's Age, Experience and Gender". Your task is to continue his work and develop a model to ascertain the "likelihood of a store launching an eStore".
4.2. Hugo is specifically interested in understanding the probability of stores that meet the following criteria to launch an eStore: Those stores with managers,
a) in their mid-thirties;
b) with varying levels of managerial experience (i.e. 2-16 years?);
c) and across both male and female store managers.
He believes that the store manager's age, managerial experience, and gender may influence the decision to launch an eStore. RogerLake wishes to know whether to recruit tech-savvy young store managers for their stores. Accordingly, your job is to visualise the predicted probability of launching an eStore with the attributes described earlier.
5. Develop a time-series model to forecast RogerLake's Sales for the next four quarters.
6. Write a report detailing all aspects of the analysis above (items 1-5). The report should be as
Answered 6 days after May 11, 2021 | MIS771 | Deakin University


Subhanbasha answered on May 18 2021
Technical Report
Introduction:
    The main aim of this analysis is to identify the patterns and trends in the performance of RogerLake, an Australian supermarket chain that grew out of a family-based general store and now operates 500 stores. People's lifestyles changed substantially after COVID-19, so the chain needs to understand these new patterns to keep its stores profitable.
    We use statistical analysis to address the problem facing the supermarket chain's Head Office and to recommend next steps. The analysis covers regression, visualisation and forecasting techniques, the last of which estimates the trend in sales for the coming quarters. It can also inform whether to open new stores in particular locations or to open existing supermarkets 24x7, given that during COVID-19 many governments imposed lockdowns and other restrictions on markets.
    We carry out the analysis step by step, including in each model the variables that are significant. This helps us identify market trends and make decisions about new and existing stores. The final part of the analysis is forecasting, which projects the stores' quarterly performance. The forecast sales give an overview of future performance, so that action can be taken accordingly.
The analysis proceeds in the following steps:
1. Regression analysis with default parameters
2. Regression analysis with appropriate parameters
3. Visualization of the results
4. Visualization of the probabilities of the regression (final model)
5. Time series analysis and forecasting
Together, these analyses support the recommendations we make to the Head Office of the supermarket chain.
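For step 5, the sketch below shows one common approach. It is an assumption, not necessarily the method used in this report: classical multiplicative decomposition, where a least-squares linear trend is multiplied by an average seasonal index for each quarter. The history list is hypothetical. In practice the series would come from the "Quarterly Sales" sheet.

```python
def fit_linear_trend(y):
    """Least-squares fit of y = a + b*t for t = 1..n; returns (a, b)."""
    n = len(y)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y, start=1))
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def seasonal_indices(y, a, b, period=4):
    """Average actual/trend ratio per quarter, rescaled so the indices average 1."""
    ratios = [[] for _ in range(period)]
    for t, yt in enumerate(y, start=1):
        ratios[(t - 1) % period].append(yt / (a + b * t))
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)
    return [r * scale for r in raw]


def forecast_quarters(y, horizon=4, period=4):
    """Forecast the next `horizon` quarters as trend x seasonal index.

    Assumes len(y) >= period so that every quarter has at least one ratio.
    """
    a, b = fit_linear_trend(y)
    idx = seasonal_indices(y, a, b, period)
    n = len(y)
    return [(a + b * t) * idx[(t - 1) % period] for t in range(n + 1, n + horizon + 1)]


# Hypothetical series (not RogerLake data): three years of quarterly sales.
history = [20.0, 24.0, 27.0, 21.0, 22.0, 26.5, 29.0, 23.0, 24.5, 28.0, 31.5, 24.5]
print([round(v, 2) for v in forecast_quarters(history)])
```

Holt-Winters exponential smoothing is a common alternative when the seasonal pattern or trend changes over time.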
Analysis:
1. Regression model to estimate Store Sales:
    Using the sample of 150 stores, we first fit a regression with default parameters, that is, with all the independent variables as predictors.
Sales is the dependent variable. The output is as follows:
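The regressions in this report come from a statistics tool. The dependency-free sketch below shows what such a fit computes: ordinary least squares via the normal equations (X'X)b = X'y, plus R Square. The data in the usage line are hypothetical. For real work a statistics package is preferable, because solving the normal equations directly can be numerically unstable when predictors are nearly collinear.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares fit of y on the predictor rows plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_mean = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    return beta, 1 - sse / sst


# Hypothetical data generated from y = 2 + 3*x1 - x2 (no noise).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 7)]
y = [3, 7, 6, 11, 10]
print(ols(rows, y))
```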
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.928556709
    R Square              0.862217562
    Adjusted R Square     0.846794155
    Standard Error        1.397739139
    Observations          150
The output above summarises the overall fit of the model to the data.
Here R Square is 0.8622, which means that the independent variables in the model explain 86.22% of the variation in the dependent variable, Sales. This is a good fit. Multiple R is 0.9286. It is the correlation between the observed and the fitted values of Sales, equal to the square root of R Square. It is not the probability that sales will increase when the independent variables increase.
    The Adjusted R Square is 0.8468. It is interpreted like R Square but penalises the number of predictors. R Square never decreases when a variable is added, even an unrelated one. Adjusted R Square increases only when the added variable's t statistic exceeds 1 in absolute value. Adjusted R Square is therefore the more appropriate measure for comparing models with different numbers of variables. The standard error of the regression is 1.3977, in the units of Sales.
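The sketch below reproduces the reported figures from R Square alone, which shows how Multiple R and Adjusted R Square relate to it. The predictor count k = 15 is an assumption: the report does not state it, but it is the value that reproduces the reported Adjusted R Square for n = 150.

```python
import math


def multiple_r(r_squared):
    """Multiple R: correlation between observed and fitted y, i.e. sqrt(R^2)."""
    return math.sqrt(r_squared)


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


r2 = 0.862217562  # R Square from the full-model output
print(round(multiple_r(r2), 4))                   # 0.9286
print(round(adjusted_r_squared(r2, 150, 15), 4))  # 0.8468 (k = 15 is assumed)
```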
The next part of the output is the ANOVA table, which provides the overall F-test of the regression. It is not discussed further here.
The final part of the output, the coefficient table, describes each independent variable and its contribution to the model. From it we can identify which variables help explain the variation in sales.
    We select variables using their p-values. The rule of thumb is that variables with p-values below 0.05 are statistically significant in the model, and we apply this rule here.
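The report drops all non-significant variables in a single step. A common alternative, named here plainly as backward elimination, removes only the least significant variable at a time and refits after each removal. This matters because p-values shift when a correlated variable leaves the model. In the sketch, `fit_pvalues` is a hypothetical callable standing in for a real refit. The toy p-values are invented purely to show the mechanics.

```python
def backward_eliminate(predictors, fit_pvalues, alpha=0.05):
    """Backward elimination using the p < alpha rule of thumb.

    `fit_pvalues` is a hypothetical callable: given a list of predictor names,
    it refits the regression and returns a dict {name: p_value}.
    """
    current = list(predictors)
    while current:
        pvalues = fit_pvalues(current)
        worst = max(current, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining predictor is significant
        current.remove(worst)  # drop only the least significant one, then refit
    return current


# Toy stand-in for a real refit: x3's p-value changes once x4 is removed.
def toy_fit(names):
    base = {"x1": 0.01, "x2": 0.30, "x3": 0.07, "x4": 0.60}
    p = {name: base[name] for name in names}
    if "x4" not in names and "x3" in p:
        p["x3"] = 0.03  # hypothetical: x3 becomes significant after x4 is dropped
    return p


print(backward_eliminate(["x1", "x2", "x3", "x4"], toy_fit))  # ['x1', 'x3']
```

Dropping every variable with p > 0.05 at once would have discarded x3 as well.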
The variables wage, Number of Competitors, Gender Manager and Age Manager are significant, so we refit the model using only these variables.
The output of this reduced model is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.846659199
    R Square              0.7168318
    Adjusted R Square     0.711013275
    Standard Error        1.919673658
    Observations          150
Here R Square is 0.7168 and Adjusted R Square is 0.7110, both noticeably lower than in the full model.
The main reason is that this model uses only three variables and therefore leaves out other relevant information. The p-value screening above is valid as far as it goes. However, interaction effects between variables, or multicollinearity, may have affected which variables appeared significant.
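Multicollinearity is usually checked with the variance inflation factor, VIF_j = 1 / (1 - R_j²). Here R_j² comes from regressing predictor j on all the other predictors. The sketch covers only the two-predictor special case, where R_j² is simply the squared correlation. The data are hypothetical. A common rule of thumb treats VIF values above 5 or 10 as a warning sign.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """With exactly two predictors, R^2 of x1 on x2 is r^2, so VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical, strongly related predictors.
vif = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])
print(round(vif, 2))  # 29.17
```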
This suggests that other variables are also useful for explaining the variation in sales, so we identify them and include them in the regression to obtain a better result.
We included the variables wage, Advertising_Expense_...000., Number_of_Staff, Age_of_the_Store..Yrs., Number_Of_Competitors., Hours_Trading., Parking_.Spaces, Membership_Union.. and Open_24X7.
The output of this model is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.891968804
    R Square              0.795608348
    Adjusted R Square     0.782468885
    Standard Error        1.665517314
    Observations          150
This model fits better than the three-variable model but not as well as the full model that uses all the variables.
The output also shows that Hours_Trading., Parking_.Spaces, Membership_Union.. and Open_24X7 are not significant in the model. However, removing these variables lowers the model's fit.
Next, we add further variables to the model and examine its performance.
After adding Experience_Manager and eStore to the model above, the output is:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.915194156
    R Square              0.837580344
    Adjusted R Square     0.824633849
    Standard Error        1.495413659
    Observations          150
This model fits somewhat better than the previous one. R Square is 0.8376, Adjusted R Square is 0.8246, and the standard error (1.4954) is lower than in the previous model (1.6655).
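To judge whether "somewhat better" is statistically meaningful, a partial F-test compares nested models. This is a standard technique that the report itself does not use. Here the 11-predictor model adds Experience_Manager and eStore (q = 2) to the 9 predictors listed earlier. The sketch uses only the reported R Square values and n = 150.

```python
def partial_f(r2_full, r2_reduced, n, k_full, q):
    """F statistic for adding q predictors to reach a nested model with k_full predictors."""
    numerator = (r2_full - r2_reduced) / q
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


f = partial_f(0.837580344, 0.795608348, n=150, k_full=11, q=2)
print(round(f, 2))  # 17.83
```

With 2 and 138 degrees of freedom, the 5% critical value of F is about 3.06. An F of roughly 17.8 therefore suggests that the two added variables jointly improve the fit.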
The normal probability plot of the residuals also shows that the errors are approximately normally distributed.
Although some variables in this model are not significant, we retain it because of its better fit, which helps explain the variation in the dependent variable, Sales.
2. Regression model using only two independent variables:
Next, we develop a regression model using the variables discussed in the team meeting, Number_Of_Competitors and Open_24X7. To test whether they have an interaction effect on sales, we create an interaction variable as the product of the two variables.
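The sketch below shows the interaction step: the product column Competitors × Open_24X7 is appended to the two original columns and the model is fitted by least squares. The data are hypothetical and noise-free, generated from Sales = 5 + 1·C + 4·O - 2·C·O, so the fit recovers exactly those coefficients. In the real analysis the columns would be Number_Of_Competitors and Open_24X7 from the store sample.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical stores: 0-5 competitors, half not open 24x7 (0) and half open 24x7 (1).
competitors = [0, 1, 2, 3, 4, 5] * 2
open_24x7 = [0] * 6 + [1] * 6
sales = [5 + 1 * c + 4 * o - 2 * c * o for c, o in zip(competitors, open_24x7)]

# Interaction column = product of the two predictors.
rows = [(c, o, c * o) for c, o in zip(competitors, open_24x7)]
beta = ols(rows, sales)
print([round(b, 6) for b in beta])  # intercept, competitors, open_24x7, interaction
```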
    We then run the regression including the interaction term to test its significance.
The regression output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.599219472
    R Square              0.359063976
    Adjusted R Square     0.345894057
    Standard Error        2.888101859
    Observations          150
The model's explanatory power is low: R Square is 0.3591 and Adjusted R Square is 0.3459.
The coefficient table is as follows:
                              Coefficients   Standard Error   t Stat      P-value
    Intercept                     8.154192         0.853889     9.549475   4.34E-17
    Number_Of_Competitors:        1.247082         0.317329     3.929934   0.000131
    Open_24X7                     7.857261         1.036783     7.578502   3.71E-12
    Interaction term             -2.71651          0.366981    -7.40232    9.78E-12
The p-values indicate the significance of each variable. The interaction term has a p-value of 9.78E-12, far below 0.05, so it is statistically significant in the regression model.
Because the model contains an interaction, the Open_24X7 coefficient (7.857) is the estimated difference in sales between 24x7 stores and other stores when the number of competitors is zero. The negative interaction coefficient (-2.717) means that this difference shrinks with each additional competitor. Stores that open 24x7 are therefore predicted to have higher sales than other stores mainly when they face few competitors.
The residuals again appear approximately normally distributed, which is one of the assumptions of regression.
There is therefore sufficient evidence to conclude that the interaction term is statistically significant in the model.
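The reported coefficients can be combined into the competitors slope for each group. The slope is b1 for stores not open 24x7 and b1 + b3 for stores open 24x7. The sketch does that arithmetic using only the numbers in the coefficient table. Taken at face value, the slope is not merely weaker for 24x7 stores: it changes sign, from about +1.25 to about -1.47 per competitor. The predicted 24x7 advantage falls to zero at about 2.9 competitors. Sales are in the dataset's units.

```python
# Coefficients from the interaction model's coefficient table.
b0, b_comp, b_open, b_inter = 8.154192, 1.247082, 7.857261, -2.71651

slope_not_24x7 = b_comp        # competitors slope when Open_24X7 = 0
slope_24x7 = b_comp + b_inter  # competitors slope when Open_24X7 = 1


def predicted_sales(competitors, open_24x7):
    """Fitted Sales from the interaction model."""
    return (b0 + b_comp * competitors + b_open * open_24x7
            + b_inter * competitors * open_24x7)


def sales_gap(competitors):
    """Predicted sales of a 24x7 store minus a non-24x7 store at the same competition level."""
    return predicted_sales(competitors, 1) - predicted_sales(competitors, 0)


break_even = -b_open / b_inter  # number of competitors at which the gap is zero
print(round(slope_24x7, 4), round(break_even, 2))  # -1.4694 2.89
```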
3. Model to predict the likelihood of a store opening 24x7:
Next, we develop a regression model to predict the likelihood of a store opening 24x7. Open_24X7 is the dependent variable and all other variables are independent variables. Because Open_24X7 is binary (0/1), this least-squares fit is a linear probability model. The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.414756081
    R Square              0.172022607
    Adjusted R Square     0.07933857
    Standard Error        0.462108234
    Observations          150
The output shows that the model performs very poorly. R Square is 0.1720 and Adjusted R Square is only 0.0793. After adjusting for the number of predictors, the independent variables explain only about 7.9% of the variation in Open_24X7.
    The coefficient table shows the significance of each variable, so we can remove the non-significant variables from the model.
The variables Gross_Profit ($m), Number_Of_Competitors, Age_Manager and eStore are significant, so we refit the model with only these variables.
The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.286999302
    R Square              0.082368599
    Adjusted R Square     0.057054629
    Standard Error        0.467667294
    Observations          150
This model performs worse than the previous one (Adjusted R Square 0.0571 versus 0.0793).
Next, we add Number_of_Staff, Sales ($m) and Age_Manager to this model and examine its performance.
The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.304552311
    R Square              0.09275211
    Adjusted R Square     0.061250447
    Standard Error        0.466625646
    Observations          150
Compared with the two models above, this model is still no better than the full model (Adjusted R Square 0.0613 versus 0.0793). We therefore retain the full model, which uses all variables as independent variables.
4. Model for predicting store launching an eStore:
    Here we use the manager's age, experience and gender as independent variables, with eStore as the dependent variable.
Because the dependent variable is binary, with two levels (0 and 1) indicating whether or not a store has an eStore, logistic regression is the appropriate technique. However, the output below (R Square, t Stat) comes from an ordinary least-squares fit, that is, a linear probability model.
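The sketch below shows an actual logistic fit by maximum likelihood using Newton-Raphson (equivalently, iteratively reweighted least squares). It is a generic illustration, not the report's model. The data are hypothetical and synthetic, with one predictor and true coefficients b0 = -1 and b1 = 2. In the real model the rows would hold Age_Manager, Experience_Manager and Gender_Manager, and eStore would be the response.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, y, iterations=25):
    """Logistic regression P(y=1) = sigmoid(b0 + b1*x1 + ...) by Newton-Raphson."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        probs = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in X]
        # Score vector X'(y - p) and information matrix X'WX with W = p(1 - p).
        score = [sum((yi - pi) * r[j] for r, yi, pi in zip(X, y, probs)) for j in range(p)]
        info = [[sum(pi * (1 - pi) * r[i] * r[j] for r, pi in zip(X, probs))
                 for j in range(p)] for i in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical synthetic data (not RogerLake data).
random.seed(771)
xs = [random.uniform(-2, 2) for _ in range(2000)]
ys = [1 if random.random() < sigmoid(-1 + 2 * x) else 0 for x in xs]
beta = fit_logistic([(x,) for x in xs], ys)
print([round(b, 2) for b in beta])
```

Unlike the least-squares fit, every predicted probability from this model lies strictly between 0 and 1.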
    The model output is as follows:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R            0.600514715
    R Square              0.360617923
    Adjusted R Square     0.347479935
    Standard Error        0.399112521
    Observations          150
The output shows an Adjusted R Square of only 0.3475 and an R Square of 0.3606. The three predictors therefore explain roughly 35-36% of the variation in eStore. These are measures of explained variance, not classification accuracy.
The coefficient table is as follows:
                          Coefficients    Standard Error   t Stat          P-value
    Intercept              0.519507258     0.193683553      2.682247671    0.008156005
    Age_Manager           -0.018718302     0.004577348     -4.089333082    7.12001E-05
    Experience_Manager     0.057377217     0.008610923      6.663306419    5.10648E-10
    Gender_Manager         0.166663835     0.068771589      2.423440228    0.01659799
The table gives each independent variable's coefficient and significance. All three p-values are below 0.05, so all the independent variables are useful for predicting the likelihood of a store having an eStore.
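Item 4.2 of the brief asks for the predicted probability of launching an eStore for managers in their mid-thirties with 2-16 years of experience, for both genders. The sketch below uses age 35 (an assumption for "mid-thirties") and the coefficients reported in the table. It prints a small text chart. Gender_Manager is treated as the 0/1 code defined on the Variable Description sheet; which value is male is not stated in this report. Because these coefficients come from a least-squares fit, the predictions are not guaranteed to stay between 0 and 1.

```python
# Coefficients from the eStore coefficient table (linear probability model).
COEF = {
    "intercept": 0.519507258,
    "age": -0.018718302,
    "experience": 0.057377217,
    "gender": 0.166663835,
}


def predicted_probability(age, experience, gender):
    """Fitted value of eStore from the linear probability model."""
    return (COEF["intercept"] + COEF["age"] * age
            + COEF["experience"] * experience + COEF["gender"] * gender)


AGE = 35  # assumed value for "mid-thirties"
for gender in (0, 1):
    for experience in range(2, 17, 2):
        p = predicted_probability(AGE, experience, gender)
        bar = "#" * round(max(p, 0.0) * 40)  # negative fitted values drawn as empty bars
        print(f"Gender_Manager={gender} experience={experience:2d} p={p:6.3f} {bar}")
```

With these coefficients, a 35-year-old manager with 2 years of experience and Gender_Manager = 0 gets a fitted value of about -0.021. That is not a valid probability, and a logistic fit would avoid this problem.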
The following plots show the effect of each variable on the dependent variable, eStore.
The first plot shows the relationship between the manager's age and eStore. Most stores whose managers are older than 30 do not have an eStore, whereas some of the stores with younger managers do.
The next plot shows the relationship between the manager's gender and eStore presence. Stores with male managers are more likely to have an eStore than stores with female managers.
The last plot shows the relationship between the manager's experience and eStore presence. Stores whose managers have more experience are more likely to have an eStore; in particular, stores whose managers have more than 10 years of experience are likely to have an eStore for...