You have to submit 2 files :Answer Report: In this, you need to submit all the answers to all the...

Question

You have to submit 2 files:

Answer Report: In this, you need to submit the answers to all the questions in sequence. It should include a detailed explanation of the approach used, insights, inferences, and all code outputs such as graphs, tables, etc. Your report should not be filled with code. You will be evaluated based on the business report.

Note: In the business report, there should be a proper interpretation of all the tasks performed along with actionable insights. Merely interpreting the models is not sufficient to earn full marks in each of the criteria mentioned in the rubric. Marks will be deducted wherever inferences are not clearly stated.

Jupyter Notebook file: This is a must and will be used for reference while evaluating.

Any assignment found to be copied or plagiarized from another person will not be graded and will be marked as zero. Please ensure timely submission, as assignments submitted after the deadline will not be accepted.

Problem 1 for the Data Set Shoesales.csv:

You are an analyst in the IJK shoe company and you are expected to forecast the sales of pairs of shoes for the upcoming 12 months from where the data ends. The shoe sales data have been given to you for the period from January 1980 to July 1995.

Problem 2 for the Data Set SoftDrink.csv:

You are an analyst in the RST soft drink company, and you are expected to forecast the production of the soft drink for the upcoming 12 months from where the data ends. The soft drink production data have been given to you for the period from January 1980 to July 1995.

Please answer the following questions on each of these two data sets separately.

Read the data as an appropriate Time Series data and plot the data.

Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.

Split the data into training and testing. The test data should start in 1991.

Build various exponential smoothing models on the training data and evaluate the model using RMSE on the test data.

Other models such as regression, naïve forecast models, simple average models, etc. should also be built on the training data, and their performance checked on the test data using RMSE.

Check the stationarity of the data on which the model is being built using appropriate statistical tests, and state the hypotheses of the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment.

Note: Stationarity should be checked at alpha = 0.05.

Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criterion (AIC) on the training data and evaluate this model on the test data using RMSE.

Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data.

Based on the model-building exercise, build the optimal model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.

Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales.

Extended Project - Time Series Forecasting Project

Criteria	Pts
1. Read the data as an appropriate Time Series data and plot the data.	2.0
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.	9.0
3. Split the data into training and test. The test data should start in 1991.	2.0
4. Build various exponential smoothing models on the training data and evaluate the model using RMSE on the test data. Other models such as regression, naïve forecast models, simple average models, etc. should also be built on the training data, and their performance checked on the test data using RMSE.	16.0
5. Check the stationarity of the data on which the model is being built using appropriate statistical tests, and state the hypotheses of the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.	4.0
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criterion (AIC) on the training data and evaluate this model on the test data using RMSE.	11.0
7. Build a table (create a data frame) with all the models built along with their corresponding parameters and the respective RMSE values on the test data.	2.0
8. Based on the model-building exercise, build the optimal model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.	3.0
9. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales.	5.0
Please reflect on all that you learnt and fill this reflection report. You have to copy the link and paste it on the URL bar of your respective browser. https://docs.google.com/forms/d/e/1FAIpQLSeBxE1cfP7ugyx8sa1JFGg_Nkv-jlEztsszbc9US911oWo2KQ/viewform	0.0
Quality of Business Report (Please refer to the Evaluation Guidelines for Business report checklist. Marks in this criterion are at the moderator's discretion)	6.0

shoe-sales-sslpxiim-jzyk5rtn.csv softdrink-rea5n3wx-iteikfrx.csv

Answered 4 days after Apr 14, 2023

Mohd · Accepted Answer

Introduction:
We have two time series datasets on which to build and test different time series forecasting models, with the aim of producing accurate forecasts. We used exponential smoothing, the Holt-Winters method, simple linear regression, and ARIMA/SARIMA models. To test for stationarity we used the Augmented Dickey-Fuller test, and we applied n-th order differencing to transform non-stationary series into stationary ones. Root mean squared error (RMSE) is our performance evaluation metric.
On the test set for soft drink production, simple linear regression had the lowest RMSE. To obtain stationary series we applied differencing, and first-order differencing was sufficient in both cases.
We used the Augmented Dickey-Fuller test to verify the stationarity of our data. On our second dataset, shoe sales, simple exponential smoothing with default parameters produced the lowest RMSE, whereas the Holt-Winters model produced the highest RMSE.
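Since every model is compared on RMSE, it may help to see the metric written out. The following is a minimal sketch in plain Python, not the notebook's actual code: the function names and the small made-up series are assumptions for illustration, and the two baselines shown are the naïve and simple average forecasts named in the assignment.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def naive_forecast(train, horizon):
    """Naive baseline: repeat the last training value for every future step."""
    return [train[-1]] * horizon


def simple_average_forecast(train, horizon):
    """Simple average baseline: repeat the training mean for every future step."""
    mean = sum(train) / len(train)
    return [mean] * horizon


# Made-up values for illustration only (not the shoe or soft drink data).
train = [10.0, 12.0, 14.0, 16.0]
test = [18.0, 20.0]

naive_rmse = rmse(test, naive_forecast(train, len(test)))
average_rmse = rmse(test, simple_average_forecast(train, len(test)))
print(f"naive RMSE: {naive_rmse:.3f}, simple average RMSE: {average_rmse:.3f}")
```

A lower RMSE on the test period indicates a better fit to the held-out data, which is how the models in this report are ranked.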
Methods:
Exponential Smoothing:
Exponential smoothing is a forecasting technique for univariate time series data. With this approach, predictions are weighted averages of historical observations, with the weights of older observations decreasing exponentially. Extended forms of exponential smoothing can also model data with trend and seasonal components. The technique is popular among analysts as a quick way to produce accurate forecasts in a variety of disciplines, especially in business. It is also used in signal processing to filter high-frequency noise and smooth signals.
Simple exponential smoothing estimates only the level component. Think of the level component as the average or typical value. The procedure updates the level with each observation. Because it models only one component, it uses a single weighting parameter, alpha (α). This parameter controls the amount of smoothing by determining how quickly the level component catches up with the most recent data.
Alpha can take values from 0 to 1, inclusive. Lower values give greater weight to historical observations and produce smoother fitted lines, because they average out changes over time. Higher values place more emphasis on the most recent data and limit the averaging effect of earlier data, which results in a more jagged line.
You usually want to smooth the data enough to eliminate the irregular fluctuations (noise) and capture the underlying pattern, but not so much that important details are lost. When selecting alpha, apply your subject-matter expertise and professional standards. α = 0.2 is a typical default value.
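The level update described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the code used for the report: the function name is hypothetical, the data are made up, and initialising the level to the first observation is an assumption (other initialisations are common).

```python
def simple_exponential_smoothing(series, alpha=0.2):
    """Return the smoothed level after each observation.

    Update rule: level_t = alpha * y_t + (1 - alpha) * level_{t-1}.
    The first level is set to the first observation (an assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    levels = [float(series[0])]
    for y in series[1:]:
        levels.append(alpha * y + (1.0 - alpha) * levels[-1])
    return levels


# Made-up values for illustration only.
series = [10.0, 20.0, 10.0, 20.0]

smooth = simple_exponential_smoothing(series, alpha=0.2)    # heavy smoothing
reactive = simple_exponential_smoothing(series, alpha=0.9)  # follows recent data

# Simple exponential smoothing forecasts are flat: every future step
# is predicted as the last estimated level.
forecast = smooth[-1]
print(smooth, reactive, forecast)
```

With the low alpha the fitted levels move only slowly away from the starting value, while the high alpha tracks each swing in the data, which matches the trade-off described above.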
Holt-Winters Method:
For univariate time series data, triple exponential smoothing can model the level, trend, and seasonality components. Seasonal cycles are patterns that repeat over a fixed number of observations. Triple exponential smoothing is also known as Holt-Winters exponential smoothing. In addition to alpha (α) for the level and beta (β) for the trend, this approach introduces the gamma (γ) parameter to account for the seasonal component. You must provide the length of the seasonal cycle for this approach, for instance 7 for daily data with a weekly cycle, 12 for monthly data, or 4 for quarterly data. Seasonality in triple exponential smoothing can be multiplicative or additive. With multiplicative seasonality, the magnitude of the seasonal pattern grows as the level of the data grows. With additive seasonality, the seasonal pattern keeps a constant magnitude regardless of the level of the data.
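To make the three components concrete, here is a minimal sketch of the additive variant in plain Python. It is an illustration under stated assumptions, not the implementation used for the reported results: the function name, the heuristic initialisation from the first two seasonal cycles, the parameter values, and the data are all made up for this example.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast with a simple heuristic initialisation."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasonal cycles")

    first_cycle_mean = sum(series[:m]) / m
    second_cycle_mean = sum(series[m:2 * m]) / m

    # Heuristic starting values (an assumption of this sketch).
    level = first_cycle_mean
    trend = (second_cycle_mean - first_cycle_mean) / m
    seasonals = [series[i] - first_cycle_mean for i in range(m)]

    for t in range(m, len(series)):
        y = series[t]
        previous_level, previous_trend = level, trend
        seasonal = seasonals[t % m]  # latest estimate for this season position
        level = alpha * (y - seasonal) + (1 - alpha) * (previous_level + previous_trend)
        trend = beta * (level - previous_level) + (1 - beta) * previous_trend
        seasonals[t % m] = (gamma * (y - previous_level - previous_trend)
                            + (1 - gamma) * seasonal)

    n = len(series)
    # Forecast h steps ahead: level + h * trend + matching seasonal term.
    return [level + h * trend + seasonals[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Made-up series: a constant level of 100 plus a repeating 4-step pattern.
demo_series = [98, 100, 103, 99] * 3
demo_forecast = holt_winters_additive(
    demo_series, season_length=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4
)
print(demo_forecast)
```

Because the demo series has no trend and a perfectly repeating pattern, the forecast reproduces that pattern up to floating-point rounding. For the monthly data in this assignment the season length would be 12.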
Stationarity Test
The most popular statistical tests used to determine whether a time series is stationary are the Kwiatkowski-Phillips-Schmidt-Shin test (KPSS test) and the Augmented Dickey-Fuller test (ADF test). Stationarity is a crucial property of time series. ARIMA models assume a stationary series, hence the first step in ARIMA time series forecasting is to determine the order of differencing needed to make the series stationary. Let us look at this in a little more detail.
A stationary series is one whose statistical properties, such as mean, variance, and autocovariance, do not change over time. To put it another way, a stationary time series has no trend or seasonal component. Stationarity enables us to use forecasting models such as the ARIMA (Auto Regressive Integrated Moving Average) model or the SARIMA (Seasonal ARIMA) model.
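Differencing, the step used to remove trend or seasonality before fitting ARIMA-type models, can be sketched as follows. This is an illustrative helper with a hypothetical name and made-up data; in practice a stationarity test (for example `adfuller` from `statsmodels.tsa.stattools`) would be run again on the differenced series.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    if len(series) <= lag:
        raise ValueError("series must be longer than the lag")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up values for illustration only.
trending = [5, 8, 11, 14, 17]          # steady upward trend
seasonal = [1, 5, 2, 8, 1, 5, 2, 8]    # pattern repeating every 4 steps

first_order = difference(trending)            # removes the linear trend
second_order = difference(first_order)        # differencing applied twice
seasonal_diff = difference(seasonal, lag=4)   # removes the repeating pattern
print(first_order, second_order, seasonal_diff)
```

Each round of differencing shortens the series by `lag` observations, so the number of rounds needed (the d in ARIMA) is usually kept as small as possible.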
Augmented Dickey-Fuller testing
Statistical tests make strong assumptions about the data. They can only indicate the degree to which a null hypothesis can or cannot be rejected, and the result must be interpreted in the context of the problem for it to be meaningful. They do, however, offer a quick check and confirmatory evidence of whether a time series is stationary or non-stationary. A unit root is a characteristic of some stochastic processes (such as random walks) that can cause problems in statistical inference with time series models. The ADF test takes the presence of a unit root, that is, non-stationarity, as its null hypothesis.
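For reference, the regression behind the ADF test and its hypotheses can be written as below. The symbols are chosen for this sketch (c for the constant, λ for the time trend, φ for the coefficient on the lagged level, p for the number of lagged differences) and are unrelated to the smoothing parameters α, β, and γ used earlier.

```latex
\Delta y_t = c + \lambda t + \phi\, y_{t-1}
             + \sum_{i=1}^{p} \delta_i \, \Delta y_{t-i} + \varepsilon_t

H_0 : \phi = 0 \quad \text{(unit root: the series is non-stationary)}
H_1 : \phi < 0 \quad \text{(no unit root: the series is stationary)}
```

At the significance level of 0.05 required in this assignment, a p-value below 0.05 leads us to reject the null hypothesis and treat the series as stationary. Otherwise we fail to reject it, difference the series, and test again.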
