SIT720 Machine Learning Assessment Task 4: Problem solving task. ©Deakin...

Question

SIT720 Machine Learning
Assessment Task 4: Problem solving task. ©Deakin University

This document supplies detailed information on Assessment Task 4 for this unit.

Key information
• Due: Monday 27 September 2021 by 8.00 pm (AEST)
• Weighting: 25%

Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

Unit Learning Outcome (ULO)
• ULO3 - Perform linear regression, classification using logistic regression and linear Support Vector Machines.
• ULO4 - Perform non-linear classification using KNN and SVM with different kernels.
• ULO5 - Perform non-linear classification using Decision trees and Random forests.
• ULO6 - Perform model selection and compute relevant evaluation measure for a given problem.
• ULO7 - Use concepts of machine learning algorithms to design solution and compare multiple solutions.

Graduate Learning Outcome (GLO)
• GLO1 - through the assessment of student ability to apply advanced data processing techniques through programming for prediction.
• GLO5 - through assessment of student ability to deal with defined data set and solve problems.

Purpose
Students will be given a specific data set for analysis and will be required to develop and compare various classification techniques. Each student must demonstrate skills acquired in data representation, classification, and evaluation.

Assessment 4 (Total marks = 30)

Submission Instructions
a) Submit your solution codes into a notebook file with “.ipynb” extension. Write discussions and explanations including outputs and figures into a separate file and submit as a PDF file.
b) Submission other than the above-mentioned file formats will not be assessed and given zero for the entire submission.
c) Insert your Python code responses into the cell of your submitted “.ipynb” file followed by the question i.e., copy the question by adding a cell before the solution cell. If you need multiple cells for better presentation of the code, add question only before the first solution cell.
d) Your submitted code should be executable. If your code does not generate the submitted solution, then you will get zero for that part of the marks.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific value that can be calculated from the data provided.
g) Use topics covered till week 10 for answering this assignment.
h) Submit your assignment after running each cell individually.
i) The submitted notebook file name should be of this form “SIT720_A4_studentID.ipynb”. For example, if your student ID is 1234, then the submitted file name should be “SIT720_A4_1234.ipynb”.

Questions

1. What is an ensemble classifier? Name some of the popular ensemble methods (at least three) and which one you prefer and why? (2 marks)
2. Let’s assume we have a noisy dataset. You want to build a classifier model. Which classifier is appropriate for your dataset and why? (2 marks)

Background
In the modern world, customer details are very important to suggest any product for buying.
Gender, age and education have impact on level of consumption of different products. So, it is essential for businesses to analyse their customer details to better understand consumer behaviour and their impact on various products.

Dataset filename: Customer relationship marketing (CRM).csv
Dataset description: This dataset includes data on customer details and their response to buy any products. The data contains 20 attributes and 9134 records.
Features and labels: The attribute names are listed below.
I. State
II. Customer Lifetime Value
III. Response
IV. Coverage
V. Education
VI. Effective To Date
VII. EmploymentStatus
VIII. Gender
IX. Income
X. Location Code
XI. Marital Status
XII. Monthly Premium Auto
XIII. Months Since Last Claim
XIV. Number of Open Complaints
XV. Number of Policies * Policy
XVI. Renew Offer Type
XVII. Sales Channel
XVIII. Total Claim Amount
XIX. Vehicle Class

Questions

3. Load and pre-process the dataset if necessary. Explain steps that you have taken. Are there any alternative ways for doing that? Explain. (5 marks)
4. Analyse the importance of the features for predicting customer response using two different approaches. Explain the similarity/difference between outcomes. (5 marks)
5. Create three supervised machine learning (ML) models except any ensemble approach for predicting customer response. (10 marks)
a. Report performance score using a suitable metric. Is it possible that the presented result is an overfitted one? Justify.
b. Justify different design decisions for each ML model used to answer this question.
c. Have you optimised any hyper-parameters for each ML model? What are they? Why have you done that? Explain.
d. Finally, make a recommendation based on the reported results and justify it.
6. Build three ensemble models for predicting customer response. (6 marks)
a. When do you want to use ensemble models over other ML models?
b. What are the similarities or differences between these models?
c. Is there any preferable scenario for using any specific model among set of ensemble models?
d. Write a report comparing performances of models built in question 5 and 6. Report the best method based on model complexity and performance.
e. Is it possible to build ensemble model using ML classifiers other than decision tree? If yes, then explain with an example.
N.B. This is a HD (High Distinction) level question. Those students who target HD grade should answer this question (including answering all the above questions). For others, this question is an option. This question aims to demonstrate your expertise in the subject area and the ability to do your own research in the related area.

Submission details
Deakin University has a strict standard on plagiarism as a part of Academic Integrity. To avoid any issues with plagiarism, students are strongly encouraged to run the similarity check with the Turnitin system, which is available through Unistart. A Similarity score MUST NOT exceed 39% in any case. Late submission penalty is 5% per 24 hours from Monday 27 September 2021, 8.00 pm (AEST). No submission will be marked after 5 days (24 hours x 5 days from Monday 27 September 2021, 8.00 pm (AEST)).

Extension requests
Requests for extensions should be made to Unit/Campus Chairs well in advance of the assessment due date.
If you wish to seek an extension for an assignment, you will need to submit a request using the “Extension Request” link of the “Assessment” menu in the unit site, as soon as you become aware that you will have difficulty in meeting the scheduled deadline, but at least 3 days before the due date. When you make your request

Pritam Kumar · Accepted Answer

Questions 1, 2, 3, and 4
What is an ensemble classifier? Name some of the popular ensemble methods (at least three) and which one you prefer and why?
An ensemble classifier is a machine learning technique that combines the predictions of multiple models instead of relying on a single model, with the aim of improving accuracy. The combined models often give more accurate results than any one of the individual models. Bagging, boosting and random forest (itself a bagging-based method built on decision trees) are three popular ensemble methods. I do not prefer one over the others: all three usually achieve better accuracy than many single machine learning classifiers.
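The idea of combining several models can be illustrated with a minimal sketch of bagging, written with the Python standard library only. Everything in it is an assumption made for illustration and is not part of the assignment or its dataset: the toy one-feature data, the decision-stump base model, and the helper names (train_stump, train_bagging, bagging_predict). In a notebook for this task, scikit-learn classes such as BaggingClassifier, AdaBoostClassifier and RandomForestClassifier would be the more usual choice.

```python
import random
from collections import Counter


def train_stump(points):
    """Fit a one-feature threshold classifier (decision stump).

    points: list of (x, label) pairs with label in {0, 1}.
    Returns (threshold, label_above) with the fewest training errors.
    """
    best = None
    for threshold, _ in points:
        for label_above in (0, 1):
            errors = sum(
                (label_above if x > threshold else 1 - label_above) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]


def stump_predict(stump, x):
    threshold, label_above = stump
    return label_above if x > threshold else 1 - label_above


def train_bagging(points, n_models, rng):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(train_stump(sample))
    return models


def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, points):
    return sum(predict(x) == y for x, y in points) / len(points)


def make_data(n, rng, noise=0.1):
    """Hypothetical toy data: label is 1 when x > 0.5, with some labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


rng = random.Random(0)
train = make_data(60, rng)
test = make_data(500, rng)

single = train_stump(train)
ensemble = train_bagging(train, n_models=25, rng=rng)  # odd count avoids tied votes

single_acc = accuracy(lambda x: stump_predict(single, x), test)
ensemble_acc = accuracy(lambda x: bagging_predict(ensemble, x), test)
print(f"single stump: {single_acc:.3f}, bagged stumps: {ensemble_acc:.3f}")
```

Boosting differs in that the models are trained one after another, each giving more weight to the examples the previous ones misclassified, while a random forest applies bagging to decision trees and also samples a random subset of features at each split.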
Let’s assume we have a noisy dataset. You want to build a classifier model. Which classifier is appropriate for your dataset and why?
