INF30030 – Business Analytics, Semester 2, 2018
Swinburne University of Technology

PREDICTING THE TITANIC SURVIVORS

https://www.youtube.com/watch?v=c5cFoRLeVZw

"I love waking up in the morning not knowing what's gonna happen or, who I'm gonna meet, where I'm gonna wind up." Quote from Titanic.

This is not what we love in a business environment ☹

ASSIGNMENT NAME: Business Analytics Project Report for Predicting the Titanic Survivors
ASSIGNMENT TYPE: Group Assessment (4 members per group. All members of a group have to be from the same tutorial)
DUE DATE: Fri, Oct 19th, 2018, by 11:59 pm

Overview
Titanic is a 1997 American epic romance-disaster film directed, written, co-produced, and co-edited by James Cameron. A fictionalized account of the sinking of the RMS Titanic, it stars Leonardo DiCaprio and Kate Winslet as members of different social classes who fall in love aboard the ship during its ill-fated maiden voyage.

Background
The movie is set against the backdrop of the RMS Titanic, a British passenger liner that sank in the North Atlantic Ocean in the early morning of 15 April 1912, after colliding with an iceberg during her maiden voyage from Southampton to New York City. Of the 2,224 passengers and crew aboard, more than 1,500 died, making it one of the deadliest commercial peacetime maritime disasters in modern history. The largest ship afloat at the time it entered service, the RMS Titanic was the second of three Olympic class ocean liners operated by the White Star Line, and was built by the Harland and Wolff shipyard in Belfast. Thomas Andrews, her architect, died in the disaster.

Cause of the disaster:
One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew.
Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class.

Your task
In this assignment, your task is to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools and techniques that can help you to predict which passengers survived the tragedy. The final deliverable of your assignment task should be a report containing the following sections:

1. Defining Business Objectives
The project report should start with the description of a well-defined business objective. The model is supposed to address a business question. Clearly stating that objective will allow you to define the scope of your project, and will provide you with the exact test to measure its success. As an example, you may consider this to be a project for an insurance company that needs to set travel insurance premiums for passengers travelling on ocean liners similar to the Titanic.

2. Preparing Data
You'll use historical data to train your model (see Full dataset.csv). Data may contain duplicate records and outliers; depending on the analysis and the business objective, you decide whether to keep or remove them. Also, the data could have missing values, may need to undergo some transformation, and may be used to generate derived attributes that have more predictive power for your objective. Overall, the quality of the data determines the quality of the model. You need to provide a data dictionary of all data items used in your analysis and justify the inclusion of each in your model.

3. Exploring data
Once you have addressed the missing values and duplicate data problems, you will need to explore the inherent relationships between the different variables. The focus variable for this study is the 'Survived' column (since you are asked to predict it).
This section should therefore show your efforts to identify which of the remaining columns in the dataset are likely to have high predictive power on the 'Survived' column. You may use basic statistical analyses such as correlations, and present the results as graphs or as tables (raw data).

4. Sampling Your Data
After preparing the data, the next step is data sampling. The data needs to be split into two sets: training and test datasets. While splitting, consider the % split between training and test data. It is always good to have more training data than test data (rule of thumb: 70% training and 30% test data). Also make sure that the splitting process produces a stratified sample rather than a pure random sample. You need to build the model using the training dataset, and the test dataset should be used to verify the accuracy of the model's output. Doing so is absolutely crucial. Otherwise you risk an overfitted model going undetected: a model trained on a limited dataset to the point that it picks up all the characteristics (both the signal and the noise) that are only true for that particular dataset.

5. Building the Model
Sometimes the data or the business objectives lend themselves to a specific algorithm or model. Other times the best approach is not so clear-cut. As you explore the data, run as many algorithms as you can. Make sure you also use techniques such as cross-validation and ensembles to see if your modeling improves.

6. Evaluating the Model
Each model iteration has to be evaluated and improved upon. To compare them, models need to be evaluated on model metrics such as the confusion matrix, accuracy, precision, and recall.
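
The metrics named above can all be derived from a single confusion matrix. The following is a minimal base R sketch, not a prescribed method: the helper name `classification_metrics` is hypothetical, the outcome is assumed to be coded 0/1 with 1 (survived) as the positive class, and the toy labels are invented for illustration rather than taken from the dataset.

```r
# Hypothetical helper: metrics for a binary outcome coded 0/1,
# treating 1 (survived) as the positive class.
classification_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  # Fix both factor levels so the table is always 2 x 2,
  # even if one class never appears in the predictions.
  actual    <- factor(actual, levels = c(0, 1))
  predicted <- factor(predicted, levels = c(0, 1))
  cm <- table(Actual = actual, Predicted = predicted)

  tp <- cm["1", "1"]  # survivors predicted as survivors
  tn <- cm["0", "0"]  # non-survivors predicted as non-survivors
  fp <- cm["0", "1"]  # non-survivors predicted as survivors
  fn <- cm["1", "0"]  # survivors predicted as non-survivors

  list(
    confusion_matrix = cm,
    accuracy  = (tp + tn) / sum(cm),
    precision = tp / (tp + fp),  # NaN if nothing is predicted positive
    recall    = tp / (tp + fn)   # NaN if there are no actual positives
  )
}

# Toy labels, invented purely for illustration.
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)

m <- classification_metrics(actual, predicted)
print(m$confusion_matrix)
print(unlist(m[c("accuracy", "precision", "recall")]))
```

Running the same helper on the test-set predictions of each candidate model gives a like-for-like basis for comparison.
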
The final model should be the one that performs best on these metrics.

Finally, think carefully about how to present your results to the business stakeholders in an understandable and convincing way (such as reports, charts and/or a dashboard) so that they adopt your model.

Datasets:
To assist you with your assignment task, you are provided with a dataset that will help you to build a model. At step 4 you should split your data into training and test sets (train.csv and test.csv). The train.csv should be used to train your model (Step 5), whereas the test.csv should be used to evaluate/test your prediction model (Step 6).

DELIVERABLES
Submit a softcopy of the project report including the six phases of model building (as mentioned above), the R code, the reports/charts and any other relevant inputs. You will also be required to present your project after the submission of your assignment.

See the marking rubric for more details.

Good Luck
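
As an illustration of the stratified 70/30 split asked for in step 4, here is a minimal base R sketch. It is one possible approach under stated assumptions, not the required one: the helper name `stratified_split`, its parameters, and the toy data frame are hypothetical, and the target column is assumed to be named `Survived`.

```r
# Hypothetical helper: split a data frame so that each class of `target`
# keeps roughly the same share in the training and the test set.
stratified_split <- function(df, target, train_frac = 0.7, seed = 42) {
  set.seed(seed)  # fixed seed so the split is reproducible
  # Row indices grouped by the class of the target column.
  idx_by_class <- split(seq_len(nrow(df)), df[[target]])
  # Sample train_frac of the rows within each class separately.
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- round(length(idx) * train_frac)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  test_idx <- setdiff(seq_len(nrow(df)), train_idx)
  list(train = df[train_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

# Toy data, invented for illustration: 40 survivors, 60 non-survivors.
toy <- data.frame(
  PassengerId = 1:100,
  Survived    = rep(c(1, 0), times = c(40, 60))
)

parts <- stratified_split(toy, "Survived")
nrow(parts$train)            # rows in the training set
nrow(parts$test)             # rows in the test set
mean(parts$train$Survived)   # survival rate in the training set
mean(parts$test$Survived)    # survival rate in the test set
```

Applied to the real data, the two parts could then be saved with `write.csv()` as train.csv and test.csv. Because sampling is done within each class, the survival rate in both files stays close to that of the full dataset, which a pure random split does not guarantee.
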