CSE5DMI Data Mining, Assignment 2, Semester 2, 2020

CSE5DMI 2020 Assignment Two [20 marks]
Assignment Due: 11:00 PM, Monday (in Week 13), 19 Oct 2020

GENERAL DESCRIPTION

This INDIVIDUAL assignment consists of TWO PARTS and is worth 20% of the assessment of this subject.

PART I (16 MARKS)

In this part, we are going to build a neural network (NN) classifier for the given dataset. You will be working with a subject/unit dataset taken from a major American university. This university is selective and therefore attracts a student body with relatively high entrance qualifications. The dataset has been partially cleaned but still contains blank or null values. You are to build a NN predictive stream that predicts At_Risk status, which is a logistic/binary/flag classification target. A detailed list of feature descriptions can be found below.

Column  Feature Name          Feature Description
A       GRD_PTS_Per_Unit      Grade Points Per Unit
B       At_Risk               (Class label) At-Risk of Failure (Classification Target Variable)
C       Catalog_NBR           Catalog Number - *Appears to indicate multiple offerings of the same subject, possibly a summer offering and a typical semester offering.
D       GPAO                  Grade Point Average in Other Units/Classes - *This is the student’s overall GPA in all other subjects, excluding the one captured in this spreadsheet.
E       ANON_INSTR_ID         Anonymous Instructor ID - Indicates the academic or tutor who teaches the student and is likely to mark their work.
F       TERM                  Teaching term in which the subject was offered/taught
G       HSGPA                 High School Grade Point Average - The student’s GPA from high school / secondary education.
H       LAST_ACT_ENGL_SCORE   Last ACT English Score
I       “ “ ACT MATH          ACT Mathematics Score
J       “ “ ACT READ          ACT Reading Score
K       “ “ ACT SCIRE         ACT Science Reasoning Score
L       “ “ ACT COMP          ACT Comprehensive Score
M       SEX                   Student sex/gender

1. Is the original data ready to be used by the Orange3 neural network classifier?
   i.   If not, state the reason and write a Python script to perform any necessary pre-processing so that the data becomes suitable to be used. [2 marks]
   ii.  Briefly describe the pre-processing you carried out, with a brief comment in your Python script and Word document. [2 marks]
   iii. Submit the pre-processed data in CSV format. [1 mark]

2. Create a NN classifier. The following parameters need to be defined by you: [2 marks]
   o Number of hidden layers: ?
   o Number of neurons: ?
   o Maximum number of iterations: ?
   Use default settings for other parameters. Perform 10-fold cross-validation to evaluate the performance of the NN classifier with this data. The answer should include the following:
   i.   Python source code that reads the source data, builds the learner, and performs 10-fold cross-validation. [2 marks]
   ii.  Accuracy and area under the receiver operating characteristic (ROC) curve (AUC). Note: no marks will be given without an answer for (i). [1 mark]
   iii. What is your inference? Is NN the best classifier for this data? [2 marks]
   Submit your Python source code in a single Python script file and the pre-processed data (CSV format).

3. Use the same dataset and build a NN classifier using pandas and scikit-learn’s MLPClassifier.
   i.   Read the dataset as a data frame object, pre-process the data, and split it into training and testing datasets (70% training and 30% testing). [1 mark]
   ii.  Create an MLP classifier using the following parameters: [1 mark]
        o Number of hidden layers: 3
        o Number of neurons: 10
        o Maximum number of iterations: 6000
   iii. Evaluate the model, get predictions, and generate a confusion matrix. Explain it in your Word document. [2 marks]
   Submit your Python source code in a single Python script file.

PART II [4 MARKS]

In this part, we are going to apply K-means clustering on a set of signal data from a phased array of 16 high-frequency antennas. This (built-in) Orange3 dataset can be loaded with the Python statement:

    data_tab = Table('ionosphere')

The data is then stored in the Orange3 Table object data_tab, and the class label “y” indicates whether a signal passes through the ionosphere (shown as “b” in y) or presents some type of structure in the ionosphere (shown as “g” in y). All other table columns, whose names start with “a”, are signal readings.

a. Cluster the data using the scikit-learn K-means clustering with K = 2 and specifying random_state = 0. Submit your Python script file for this process. [2 marks]
b. Use the clustering results obtained from Part II (a) and the class label “y” to count and fill in the number of signals for each of the four categories in the table below. Use Python to perform the calculations. [2 marks]

               y=“g”    y=“b”
   Cluster 0
   Cluster 1

Submit your Python source code for Part II (a) and (b) in a single Python script file. No marks will be given to your answers unless the relevant source code is submitted.

IMPORTANT NOTES

1. A penalty of 5% of the marks per day will be imposed on late submissions of assessment up to four (4) working days after the due date. An assignment submitted more than FOUR working days after the due date will NOT be accepted, and a mark of ZERO will be assigned.
2. If you would like to seek an extension for submission, please apply for formal Special Consideration. To do this or to find detailed information, please go to http://www.latrobe.edu.au/special-consideration
3. Academic misconduct includes poor referencing, plagiarism, copying and cheating. Copying, Plagiarism: Plagiarism is the submission of somebody else’s work in a manner that gives the impression that the work is your own. Recall that the University takes academic misconduct very seriously. When it is detected, penalties are strictly imposed. You should familiarise yourself with your responsibilities about Academic Integrity. Detailed information can be found here: http://www.latrobe.edu.au/students/learning/academic-integrity

SUBMISSION GUIDELINE

▪ Submit before 11:00 PM (Australian Eastern Standard Time), Monday, 19 Oct 2020 (Week 13).
▪ Upload a single .zip archive onto LMS before the deadline. The .zip archive needs to be named with your SID, e.g. if your SID is “12345678”, then the archive must be called “12345678.zip”. It should contain:
  o A document (Word or PDF) of your answers to Part I and Part II. The document needs to be named with your SID, e.g. if your SID is “12345678”, then name the document “12345678_report.pdf”, “12345678_report.docx” or “12345678_report.doc”.
    - Explain each step and show your results for all the questions in the document.
  o Python source code to support your answers
  o Pre-processed CSV file
▪ Assignments submitted without Python source code will not be evaluated.
▪ Late submissions will incur a penalty of 5% of the marks per day.

END
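As a rough illustration of the evaluation asked for in Part I, question 2 (10-fold cross-validation reporting accuracy and AUC), the sketch below uses scikit-learn rather than the Orange3 API the question specifies, so it is an analogue and not the requested solution. The data is synthetic (`make_classification`), and the hidden-layer size `(10,)` and `max_iter=2000` are placeholder assumptions, since the assignment leaves those values to the student.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaling inside the pipeline keeps each fold's test data out of the scaler fit.
# The layer size and iteration cap are placeholders, not values from the assignment.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
)

# 10 stratified folds, scored on accuracy and area under the ROC curve.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])

mean_acc = scores["test_accuracy"].mean()
mean_auc = scores["test_roc_auc"].mean()
print(f"mean accuracy: {mean_acc:.3f}, mean AUC: {mean_auc:.3f}")
```

Comparing these two means against the same scores for other learners is one way to approach the "is NN the best classifier" question.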
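The workflow in Part I, question 3 (read into a data frame, pre-process, split 70/30, fit an MLP, produce a confusion matrix) might look roughly like the sketch below. The assignment's data file is not available here, so the frame is filled with made-up values under a few column names borrowed from the feature table. Median imputation and one-hot encoding are only one possible pre-processing choice, and reading "3 hidden layers, 10 neurons" as `hidden_layer_sizes=(10, 10, 10)` is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assignment data; all values are made up.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "GPAO": rng.normal(3.2, 0.4, n),
    "HSGPA": rng.normal(3.6, 0.3, n),
    "SEX": rng.choice(["F", "M"], n),
})
# Toy target: a lower GPAO makes the at-risk flag more likely.
df["At_Risk"] = (df["GPAO"] + rng.normal(0, 0.2, n) < 3.0).astype(int)
# Blank out some cells to mimic the null values the handout mentions.
df.loc[rng.choice(n, 30, replace=False), "HSGPA"] = np.nan

# Pre-process: median-impute the numeric blanks, one-hot encode the category.
df["HSGPA"] = df["HSGPA"].fillna(df["HSGPA"].median())
X = pd.get_dummies(df.drop(columns="At_Risk"), columns=["SEX"], dtype=float)
y = df["At_Risk"]

# 70% training, 30% testing, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

# Fit the scaler on the training part only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=6000, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print(cm)
print("accuracy:", accuracy_score(y_test, y_pred))
```

In the printed matrix, the diagonal holds correctly classified students and the off-diagonal cells hold the two kinds of error, which is the starting point for the written explanation the question asks for.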
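For Part II, the clustering and the four-cell count could be sketched as follows. To stay self-contained, the example clusters synthetic readings instead of loading the Orange3 `ionosphere` table; with the real table, the feature matrix and labels would presumably come from `data_tab.X` and `data_tab.Y`. The array shapes, the group sizes, and `n_init=10` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical signal readings: two loose groups of 4-dimensional points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (60, 4)),
    rng.normal(3.0, 1.0, (40, 4)),
])
# Hypothetical class labels in the same row order as X.
y = np.array(["g"] * 60 + ["b"] * 40)

# K-means with K = 2 and a fixed seed, as Part II (a) specifies.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = km.fit_predict(X)

# Cross-tabulate cluster id against class label to get the four counts.
counts = pd.crosstab(
    pd.Series(clusters, name="cluster"),
    pd.Series(y, name="y"),
)
# Fix the row/column order to match the table in Part II (b).
counts = counts.reindex(index=[0, 1], columns=["g", "b"], fill_value=0)
print(counts)
```

Cluster ids are arbitrary, so cluster 0 is not guaranteed to line up with either class; the table only reports how the two groupings overlap.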
Answered Same Day · Oct 18, 2021 · CSE5DMI · La Trobe University

Vicky answered on Oct 19 2021