Programming in R

1. Undertake EDA on this dataset.
  a. Do you need to clean the data in any way? Justify what you decide to do (or not do).
  b. Describe two insights gained just from EDA that would be of interest to the sales manager.

2. Basic model fitting:
  a. Creating the model:
    i. Create an aggregated data set using the fields date, industry and location, with a mean of monthly_amount.
    ii. Create a line plot of the variable monthly_amount for industry = 1 and location = 1. Note the seasonality by month in this time series.
    iii. For industry = 1 and location = 1, train a linear regression model with monthly_amount as the target.
      Note 1: Remember that time is very important in this model, so be sure to include variable(s) for the time sequence. (Hint: on your plot you may see a local trend such as seasonality. Consider how you could craft a variable to capture this. You may also see a global upward or downward slope; could you craft a variable to capture this? There are therefore two simple variables you could create to capture time. Could you craft more complex ones, perhaps with polynomials, to capture local or global trends? Experiment and see!)
      Note 2: Think carefully about how you split your test and train sets. (Hint: a random split is not appropriate!)
    iv. Create a prediction for monthly_amount in December 2016. Comment on how reasonable this prediction is. For example, if you were to plot it on the same plot as 2.a.ii, would it sit somewhere reasonable?
  b. Describe the model:
    i. How well does your model fit the data it is trained on in a statistical sense? Define and describe an appropriate quantitative measure. Justify your choice of measure.
    ii. How well does your model predict out-of-sample? Define and describe an appropriate quantitative measure. Justify your choice of measure.

3. Advanced model fitting:
  a. Apply the modelling process you built for industry 1 and location 1 to all industries and locations programmatically.
  b. Calculate your evaluation measure for the training data and your testing data, for all models. Identify the two industries and two locations for which your method performs worst.
    i. Ensure your models all make a prediction for December 2016.
  c. What might be causing the models on these two industries and locations to perform poorly? (Hint: some plots may help here.) How might you fix this in future?
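As a rough illustration of steps 2.a.i to 2.a.iv, the sketch below uses base R only. It is one possible approach under stated assumptions, not a model answer. The real data set is not shown here, so the `raw` data frame is synthetic. The field names date, industry, location and monthly_amount come from the assignment. The 2013-01 to 2016-11 date range, the `time_index` and `month` features, the six-month hold-out and the choice of RMSE as the measure are all assumptions made for the example.

```r
# Synthetic stand-in for the real transactions (ASSUMPTION: the actual data
# is not available here; only the four field names come from the assignment).
set.seed(1)
months <- seq(as.Date("2013-01-01"), as.Date("2016-11-01"), by = "month")
raw <- expand.grid(idx = seq_along(months), industry = 1:2,
                   location = 1:2, rep = 1:5)
raw$date <- months[raw$idx]
raw$monthly_amount <- 1000 + 10 * raw$idx +        # global upward trend
  100 * sin(2 * pi * raw$idx / 12) +               # yearly seasonality
  rnorm(nrow(raw), sd = 20)                        # noise

# 2.a.i: mean of monthly_amount by date, industry and location.
agg <- aggregate(monthly_amount ~ date + industry + location,
                 data = raw, FUN = mean)

# Keep one series and put it in time order.
one <- agg[agg$industry == 1 & agg$location == 1, ]
one <- one[order(one$date), ]

# Two simple time variables (hypothetical names):
# time_index captures the global slope, month captures seasonality.
one$time_index <- seq_len(nrow(one))
one$month <- factor(format(one$date, "%m"), levels = sprintf("%02d", 1:12))

# Time-ordered split: the last n_test months are held out, never sampled
# at random, so the model is always tested on data later than it was fit on.
n_test <- 6
train <- head(one, -n_test)
test  <- tail(one, n_test)

# 2.a.iii: linear regression on the training window only.
fit <- lm(monthly_amount ~ time_index + month, data = train)

# One candidate measure: root mean squared error, in the target's own units.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
train_rmse <- rmse(train$monthly_amount, fitted(fit))
test_rmse  <- rmse(test$monthly_amount, predict(fit, newdata = test))

# 2.a.iv: December 2016 is one step past the last observed month.
dec <- data.frame(time_index = max(one$time_index) + 1,
                  month = factor("12", levels = levels(one$month)))
dec_pred <- predict(fit, newdata = dec)

# 2.a.ii: line plot of the series with the December prediction overlaid.
plot(one$date, one$monthly_amount, type = "l",
     xlim = c(min(one$date), as.Date("2016-12-01")),
     xlab = "date", ylab = "monthly_amount")
points(as.Date("2016-12-01"), dec_pred, pch = 19)
```

For part 3, wrapping the steps from "Keep one series" onward in a function of industry and location, and calling it for each pair found in `agg`, would give one train and test RMSE per model. Comparing `train_rmse` with `test_rmse` also gives a quick check for overfitting when polynomial terms are added.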
Nov 19, 2021