


SIT720 Machine Learning
Assessment Task 4: Problem solving task
©Deakin University

This document supplies detailed information on Assessment Task 4 for this unit.

Key information
• Due: Monday 27 September 2021 by 8.00 pm (AEST)
• Weighting: 25%

Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

Unit Learning Outcomes (ULO)
• ULO3 - Perform linear regression, classification using logistic regression, and linear Support Vector Machines.
• ULO4 - Perform non-linear classification using KNN and SVM with different kernels.
• ULO5 - Perform non-linear classification using Decision trees and Random forests.
• ULO6 - Perform model selection and compute relevant evaluation measures for a given problem.
• ULO7 - Use concepts of machine learning algorithms to design solutions and compare multiple solutions.

Graduate Learning Outcomes (GLO)
• GLO1 - through the assessment of student ability to apply advanced data processing techniques through programming for prediction.
• GLO5 - through assessment of student ability to deal with a defined data set and solve problems.

Purpose
Students will be given a specific data set for analysis and will be required to develop and compare various classification techniques. Each student must demonstrate skills acquired in data representation, classification, and evaluation.

Assessment 4
Total marks = 30

Submission Instructions
a) Submit your solution code in a notebook file with the “.ipynb” extension. Write discussions and explanations, including outputs and figures, in a separate file and submit it as a PDF.
b) Submissions in file formats other than those above will not be assessed, and the entire submission will receive zero.
c) Insert your Python code responses into cells of your submitted “.ipynb” file, each preceded by its question, i.e., copy the question into a cell added before the solution cell. If you need multiple cells for better presentation of the code, add the question only before the first solution cell.
d) Your submitted code should be executable. If your code does not generate the submitted solution, you will get zero for that part of the marks.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific values that can be calculated from the data provided.
g) Use topics covered up to week 10 to answer this assignment.
h) Submit your assignment after running each cell individually.
i) The submitted notebook file name should be of the form “SIT720_A4_studentID.ipynb”. For example, if your student ID is 1234, the submitted file name should be “SIT720_A4_1234.ipynb”.

Questions
1. What is an ensemble classifier? Name some popular ensemble methods (at least three). Which one do you prefer, and why? (2 marks)
2. Assume we have a noisy dataset and want to build a classifier model. Which classifier is appropriate for this dataset, and why? (2 marks)

Background
In the modern world, customer details are very important for suggesting products to buy. Gender, age and education have an impact on the level of consumption of different products, so it is essential for businesses to analyse their customer details to better understand consumer behaviour and its impact on various products.

Dataset filename: Customer relationship marketing (CRM).csv
Dataset description: This dataset includes customer details and their response to offers to buy products. The data contains 20 attributes and 9134 records.
Features and labels: The attribute names are listed below.
I. State
II. Customer Lifetime Value
III. Response
IV. Coverage
V. Education
VI. Effective To Date
VII. EmploymentStatus
VIII. Gender
IX. Income
X. Location Code
XI. Marital Status
XII. Monthly Premium Auto
XIII. Months Since Last Claim
XIV. Number of Open Complaints
XV. Number of Policies
XVI. Policy
XVII. Renew Offer Type
XVIII. Sales Channel
XIX. Total Claim Amount
XX. Vehicle Class

Questions
3. Load and pre-process the dataset if necessary. Explain the steps you have taken. Are there any alternative ways of doing this? Explain. (5 marks)
4. Analyse the importance of the features for predicting customer response using two different approaches. Explain the similarities/differences between the outcomes. (5 marks)
5. Create three supervised machine learning (ML) models, excluding ensemble approaches, for predicting customer response. (10 marks)
a. Report the performance score using a suitable metric. Is it possible that the presented result is overfitted? Justify.
b. Justify the design decisions for each ML model used to answer this question.
c. Have you optimised any hyper-parameters for each ML model? What are they? Why did you do that? Explain.
d. Finally, make a recommendation based on the reported results and justify it.
6. Build three ensemble models for predicting customer response. (6 marks)
a. When would you use ensemble models over other ML models?
b. What are the similarities or differences between these models?
c. Is there any scenario in which a specific model among the set of ensemble models is preferable?
d. Write a report comparing the performance of the models built in questions 5 and 6. Report the best method based on model complexity and performance.
e. Is it possible to build an ensemble model using ML classifiers other than decision trees? If yes, explain with an example.

N.B. This is an HD (High Distinction) level question. Students targeting an HD grade should answer this question (in addition to all the questions above). For others, this question is optional. It aims to demonstrate your expertise in the subject area and your ability to do your own research in the related area.

Submission details
Deakin University has a strict standard on plagiarism as part of Academic Integrity. To avoid any issues with plagiarism, students are strongly encouraged to run a similarity check with the Turnitin system, which is available through Unistart. The similarity score MUST NOT exceed 39% in any case.
The late submission penalty is 5% for each 24 hours after Monday 27 September 2021, 8.00 pm (AEST). No submission will be marked after 5 days (5 × 24 hours from Monday 27 September 2021, 8.00 pm (AEST)).

Extension requests
Requests for extensions should be made to Unit/Campus Chairs well in advance of the assessment due date. If you wish to seek an extension for an assignment, you will need to submit a request using the “Extension Request” link in the “Assessment” menu of the unit site as soon as you become aware that you will have difficulty meeting the scheduled deadline, and at least 3 days before the due date. When you make your request...
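For question 3, the notebook below encodes each categorical column in its own pair of `LabelEncoder` cells and leaves out "Effective To Date". A minimal sketch of an equivalent, more compact step follows, assuming the column names from the attribute list and that `Response` holds 'No'/'Yes'; the helper name `preprocess` is hypothetical. It uses scikit-learn's `OrdinalEncoder` for the features (per column it produces the same alphabetically sorted integer codes as a per-column `LabelEncoder`) and, as one of the alternatives the question asks about, one-hot encoding via `pandas.get_dummies`, which avoids implying an order between categories such as states.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder


def preprocess(df: pd.DataFrame, one_hot: bool = False):
    """Hypothetical helper: split the CRM frame into features X and target y."""
    # Drop the raw date column, as the notebook does; drop() returns a copy,
    # so the caller's frame is left untouched
    work = df.drop(columns=["Effective To Date"])
    # Target: LabelEncoder sorts the classes, so 'No' -> 0 and 'Yes' -> 1
    y = LabelEncoder().fit_transform(work.pop("Response"))
    # Every non-numeric column is treated as categorical
    cat_cols = list(work.select_dtypes(exclude="number").columns)
    if one_hot:
        # Alternative: one 0/1 indicator column per category, no implied ordering
        X = pd.get_dummies(work, columns=cat_cols, dtype=int)
    else:
        # Integer codes per column, sorted alphabetically like a per-column LabelEncoder
        X = work.copy()
        if cat_cols:
            X[cat_cols] = OrdinalEncoder(dtype=int).fit_transform(work[cat_cols])
    return X, y
```

Usage, assuming `data` was loaded as in the notebook: `X, y = preprocess(data)` gives integer codes, and `preprocess(data, one_hot=True)` gives indicator columns. Integer codes are generally fine for tree-based models; for distance- or weight-based models such as KNN, SVM or logistic regression, one-hot encoding (plus scaling of the numeric columns) is usually the safer choice because it does not impose an arbitrary order on the categories.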
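For question 4, one possible pair of approaches (an assumption, not the notebook's method) is a random forest's impurity-based importances (mean decrease in impurity, computed from the training fit) and permutation importance measured on held-out data with scikit-learn's `permutation_importance`. The sketch uses a synthetic stand-in from `make_classification` with hypothetical column names f0 to f5 so that it runs on its own; in the notebook, `X` and `y` would be the encoded CRM features and `Response`. When the two rankings disagree, a common reason is that impurity-based importance tends to favour features with many distinct values (continuous columns such as `Income`, for example), while permutation importance reflects how much held-out performance drops when a single feature is shuffled.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def compare_importances(X: pd.DataFrame, y, random_state: int = 0) -> pd.DataFrame:
    """Rank features by two methods; returns one row per feature."""
    # Stratified split keeps the class ratio similar in train and test
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    rf.fit(X_tr, y_tr)
    # Approach 2: drop in held-out F1 when each column is shuffled (10 repeats)
    perm = permutation_importance(rf, X_te, y_te, scoring="f1",
                                  n_repeats=10, random_state=random_state)
    table = pd.DataFrame({
        "impurity": rf.feature_importances_,  # approach 1: mean decrease in impurity
        "permutation": perm.importances_mean,
    }, index=X.columns)
    return table.sort_values("permutation", ascending=False)


if __name__ == "__main__":
    # Synthetic stand-in for the encoded CRM data (hypothetical columns f0..f5)
    X_arr, y_demo = make_classification(n_samples=600, n_features=6, n_informative=3,
                                        n_redundant=0, random_state=0)
    X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
    print(compare_importances(X_demo, y_demo))
```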
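For question 5, parts (a) and (c) can be approached together: tune each non-ensemble model with cross-validated grid search, then compare the mean training score with the mean validation score of the chosen setting, since a large gap is one sign of overfitting. The sketch below is a minimal version under stated assumptions. The three model families (logistic regression, KNN and an RBF-kernel SVM, all within ULO3 and ULO4), the parameter grids, and F1 as the metric are illustrative choices; F1 is picked on the assumption that 'Yes' responses are the minority class, where accuracy can look high while missing most responders. `CANDIDATES` and `tune_models` are hypothetical names. Features are standardised inside a pipeline so that the scaler is fitted only on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical search spaces. make_pipeline names each step after its lower-cased
# class name, hence parameter prefixes such as "logisticregression__".
CANDIDATES = {
    "logreg": (LogisticRegression(max_iter=1000),
               {"logisticregression__C": [0.01, 0.1, 1, 10]}),
    "knn": (KNeighborsClassifier(),
            {"kneighborsclassifier__n_neighbors": [3, 5, 11, 21]}),
    "svm_rbf": (SVC(kernel="rbf"),
                {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}),
}


def tune_models(X, y, random_state: int = 0) -> dict:
    """Grid-search each candidate; report best parameters with train and validation F1."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    results = {}
    for name, (model, grid) in CANDIDATES.items():
        # The scaler sits inside the pipeline, so it is fitted on training folds only
        search = GridSearchCV(make_pipeline(StandardScaler(), model), grid,
                              scoring="f1", cv=cv, return_train_score=True)
        search.fit(X, y)
        best = search.best_index_
        results[name] = {
            "best_params": search.best_params_,
            "cv_f1": search.cv_results_["mean_test_score"][best],
            # A train score far above cv_f1 suggests the model is overfitting
            "train_f1": search.cv_results_["mean_train_score"][best],
        }
    return results


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=400, random_state=0)  # synthetic stand-in
    for name, res in tune_models(X_demo, y_demo).items():
        print(name, res)
```

Because the best parameters are selected on the same folds that produce `cv_f1`, that score is somewhat optimistic; holding out a final test set (or using nested cross-validation) gives a less biased estimate for the recommendation in part (d).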
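For question 6(e): ensembles do not have to be built from decision trees. A minimal sketch, assuming scikit-learn, shows two such ensembles: a `BaggingClassifier` wrapped around a scaled KNN model (each of the 20 members is fitted on a bootstrap sample, drawn with replacement, whose size is 80% of the training rows), and a soft-voting `VotingClassifier` that averages the predicted probabilities of logistic regression, KNN and Gaussian naive Bayes. The model settings and the helper names `non_tree_ensembles` and `score_ensembles` are hypothetical. Bagging is not guaranteed to improve a fairly stable learner such as KNN; the example only shows that the mechanism accepts non-tree base estimators.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def non_tree_ensembles(random_state: int = 0) -> dict:
    """Two ensembles whose members are not decision trees."""
    # Bagging: 20 KNN pipelines, each fitted on a bootstrap sample of 80% of the rows.
    # The base estimator is passed positionally because its keyword name changed
    # across scikit-learn versions (base_estimator -> estimator).
    bagged_knn = BaggingClassifier(
        make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        n_estimators=20, max_samples=0.8, random_state=random_state)
    # Soft voting: average predict_proba over three different model families
    soft_vote = VotingClassifier(
        estimators=[
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
            ("nb", GaussianNB()),
        ],
        voting="soft")
    return {"bagged_knn": bagged_knn, "soft_voting": soft_vote}


def score_ensembles(X, y, cv: int = 5) -> dict:
    """Mean cross-validated F1 for each ensemble (stratified folds for classifiers)."""
    return {name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
            for name, model in non_tree_ensembles().items()}


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=400, random_state=0)  # synthetic stand-in
    print(score_ensembles(X_demo, y_demo))
```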

Pritam Kumar answered on Sep 18 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data Loading"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State Customer Lifetime Value Response Coverage Education \\\n",
"0 Washington 2763.519279 No Basic Bachelor \n",
"1 Arizona 6979.535903 No Extended Bachelor \n",
"2 Nevada 12887.431650 No Premium Bachelor \n",
"3 California 7645.861827 No Basic Bachelor \n",
"4 Washington 2813.692575 No Basic Bachelor \n",
"\n",
" Effective To Date EmploymentStatus Gender Income Location Code \\\n",
"0 2/24/2011 Employed F 56274 Suburban \n",
"1 1/31/2011 Unemployed F 0 Suburban \n",
"2 2/19/2011 Employed F 48767 Suburban \n",
"3 1/20/2011 Unemployed M 0 Suburban \n",
"4 2/3/2011 Employed M 43836 Rural \n",
"\n",
" Marital Status Monthly Premium Auto Months Since Last Claim \\\n",
"0 Married 69 32 \n",
"1 Single 94 13 \n",
"2 Married 108 18 \n",
"3 Married 106 18 \n",
"4 Single 73 12 \n",
"\n",
" Number of Open Complaints Number of Policies Policy \\\n",
"0 0 1 Corporate L3 \n",
"1 0 8 Personal L3 \n",
"2 0 2 Personal L3 \n",
"3 0 7 Corporate L2 \n",
"4 0 1 Personal L1 \n",
"\n",
" Renew Offer Type Sales Channel Total Claim Amount Vehicle Class \n",
"0 Offer1 Agent 384.811147 Two-Door Car \n",
"1 Offer3 Agent 1131.464935 Four-Door Car \n",
"2 Offer1 Agent 566.472247 Two-Door Car \n",
"3 Offer1 Call Center 529.881344 SUV \n",
"4 Offer1 Agent 138.130879 Four-Door Car "
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
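"# Relative path: keep the CSV next to the notebook (the task names it \"Customer relationship marketing (CRM).csv\")\n",
"data = pd.read_csv(\"CRM_dataset.csv\")\n",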
"data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A first look at the data shows that the categorical variables must be encoded before further analysis. This is a binary classification task with \"Response\" as the target (y) variable. The \"Effective To Date\" column is left out, since a raw date is unlikely to carry useful information for predicting the response here."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"data2 = pd.DataFrame(data['State'])"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"# One LabelEncoder instance is re-fitted on each categorical column in turn\n",
"le = preprocessing.LabelEncoder()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LabelEncoder()"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"le.fit(data2['State'])"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Arizona', 'California', 'Nevada', 'Oregon', 'Washington']"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(le.classes_)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"data2['State'] = le.transform(data2['State'])"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"data2['Customer Lifetime Value'] = data['Customer Lifetime Value']"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"data2['Response'] = data['Response']"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"# Target variable: LabelEncoder sorts the classes, so 'No' -> 0 and 'Yes' -> 1\n",
"le.fit(data2['Response'])\n",
"data2['Response'] = le.transform(data2['Response'])"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"data2['Coverage'] = data['Coverage']"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Coverage'])\n",
"data2['Coverage'] = le.transform(data2['Coverage'])"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"data2['Education'] = data['Education']"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Education'])\n",
"data2['Education'] = le.transform(data2['Education'])"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"data2['EmploymentStatus'] = data['EmploymentStatus']"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['EmploymentStatus'])\n",
"data2['EmploymentStatus'] = le.transform(data2['EmploymentStatus'])"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"data2['Gender'] = data['Gender']"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Gender'])\n",
"data2['Gender'] = le.transform(data2['Gender'])"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"data2['Income'] = data['Income']"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"data2['Location Code'] = data['Location Code']"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Location Code'])\n",
"data2['Location Code'] = le.transform(data2['Location Code'])"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"data2['Marital Status'] = data['Marital Status']"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Marital Status'])\n",
"data2['Marital Status'] = le.transform(data2['Marital Status'])"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"data2['Monthly Premium Auto'] = data['Monthly Premium Auto']\n",
"data2['Months Since Last Claim'] = data['Months Since Last Claim']\n",
"data2['Number of Open Complaints'] = data['Number of Open Complaints']\n",
"data2['Number of Policies'] = data['Number of Policies']"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"data2['Policy'] = data['Policy']"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Policy'])\n",
"data2['Policy'] = le.transform(data2['Policy'])"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"data2['Renew Offer Type'] = data['Renew Offer Type']"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Renew Offer Type'])\n",
"data2['Renew Offer Type'] = le.transform(data2['Renew Offer Type'])"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"data2['Sales Channel'] = data['Sales Channel']"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Sales Channel'])\n",
"data2['Sales Channel'] = le.transform(data2['Sales Channel'])"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"data2['Total Claim Amount'] = data['Total Claim Amount']"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"data2['Vehicle Class'] = data['Vehicle Class']"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"le.fit(data2['Vehicle Class'])\n",
"data2['Vehicle Class'] = le.transform(data2['Vehicle Class'])"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" ...