CSE5DMI Data Mining Assignment 2

Question

CSE5DMI Data Mining    Assignment 2    Semester 2, 2020

CSE5DMI 2020 Assignment Two [20 marks]
Assignment Due: 11:00 PM, Monday (in Week 13), 19 Oct 2020

GENERAL DESCRIPTION

This INDIVIDUAL assignment consists of TWO PARTS and is worth 20% of the assessment of this subject.

PART I (16 MARKS)

In this part, we are going to build a neural network (NN) classifier for the given dataset. You will be working with a subject/unit dataset taken from a major American university. This University is selective and therefore attracts a student body with relatively high entrance qualifications. The dataset has been partially cleaned but still contains blank or null values. You are to build a NN predictive stream that predicts At_Risk status, which is a logistic/binary/flag classification target. A detailed list of feature descriptions can be found below.

Column | Feature Name | Feature Description
A | GRD_PTS_Per_Unit | Grade Points Per Unit
B | At_Risk (Class label) | At-Risk of Failure (Classification Target Variable)
C | Catalog_NBR | Catalog Number - *Appears to indicate multiple offerings of the same subject, possibly a summer offering and a typical semester offering.
D | GPAO | Grade Point Average in Other Units/Classes - *This is the student’s overall GPA in all other subjects, excluding the one captured in this spreadsheet.
E | ANON_INSTR_ID | Anonymous Instructor ID - Indicates the academic or tutor who teaches the student and is likely to mark their work.
F | TERM | Teaching term in which the subject was offered/taught
G | HSGPA | High School Grade Point Average - The student’s GPA from high school / secondary education.
H | LAST_ACT_ENGL_SCORE | Last ACT English Score
I | “ “ ACT MATH | ACT Mathematics Score
J | “ “ ACT READ | ACT Reading Score
K | “ “ ACT SCIRE | ACT Science Reasoning Score
L | “ “ ACT COMP | ACT Comprehensive Score
M | SEX | Student sex/gender

1. Is the original data ready to be used by the Orange3 neural network classifier?
   i. If not, state the reason and write a Python script to perform any necessary pre-processing so that the data becomes suitable to be used. [2 marks]
   ii. Briefly describe the pre-processing you carried out, with a brief comment in your Python script and in your Word document. [2 marks]
   iii. Submit the pre-processed data in CSV format. [1 mark]

2. Create a NN classifier. The following parameters need to be defined by you: [2 marks]
   o Number of hidden layers: ?
   o Number of neurons: ?
   o Maximum number of iterations: ?
   Use default settings for other parameters. Perform 10-fold cross-validation to evaluate the performance of the NN classifier with this data. The answer should include the following:
   i. Python source code that reads the source data, builds the learner, and performs 10-fold cross-validation. [2 marks]
   ii. Accuracy and area under the receiver operating characteristic (ROC) curve (AUC). Note: no marks will be given without an answer for (i). [1 mark]
   iii. What is your inference? Is NN the best classifier for this data? [2 marks]
   Submit your Python source code in a single Python script file, together with the pre-processed data (CSV format).

3. Use the same dataset and build a NN classifier using PANDAS and SciKitLearn’s MLP Classifier.
   i. Read the dataset as a data frame object, pre-process the data, and split it into training and testing datasets (70% training and 30% testing). [1 mark]
   ii. Create an MLP classifier using the following parameters: [1 mark]
      o Number of hidden layers: 3
      o Number of neurons: 10
      o Maximum number of iterations: 6000
   iii. Evaluate the model, get predictions, and generate a confusion matrix. Explain it in your Word document. [2 marks]
   Submit your Python source code in a single Python script file.

PART II [4 MARKS]

In this part, we are going to apply K-means clustering to a set of signal data from a phased array of 16 high-frequency antennas. This (built-in) Orange3 dataset can be loaded with the Python statement:

data_tab = Table('ionosphere')

The data is then stored in the Orange3 Table object data_tab. The class label “y” indicates whether a signal passes through the ionosphere (shown as “b” in y) or shows some type of structure in the ionosphere (shown as “g” in y). All other table columns, whose names start with “a”, are signal readings.

a. Cluster the data using the scikit-learn K-means clustering with K = 2 and specifying random_state = 0. Submit your Python script file for this process. [2 marks]

b. Use the clustering results obtained from Part II (a) and the class label “y” to count and fill in the number of signals for each of the four categories in the table below. Use Python to perform the calculations. [2 marks]

          | y=“g” | y=“b”
Cluster 0 |       |
Cluster 1 |       |

Submit your Python source code for Part II (a) and (b) in a single Python script file. No marks will be given to your answers unless the relevant source code is submitted.

IMPORTANT NOTES

1. A penalty of 5% of the marks per day will be imposed on late submissions of assessment up to four (4) working days after the due date. An assignment submitted more than FOUR working days after the due date will NOT be accepted, and a mark of ZERO will be assigned.

2. If you would like to seek an extension for submission, please apply for formal Special Consideration. To do this or find detailed information, please go to http://www.latrobe.edu.au/special-consideration

3. Academic misconduct includes poor referencing, plagiarism, copying and cheating. Copying, Plagiarism: Plagiarism is the submission of somebody else’s work in a manner that gives the impression that the work is your own. Recall that the University takes academic misconduct very seriously. When it is detected, penalties are strictly imposed. You should familiarise yourself with your responsibilities about Academic Integrity. Detailed information can be found here: http://www.latrobe.edu.au/students/learning/academic-integrity

SUBMISSION GUIDELINE

▪ Submit before 11:00 PM (Australian Eastern Standard Time), Monday, 19 Oct 2020 (Week 13).
▪ Upload a single .zip archive onto LMS before the deadline. The .zip archive needs to be named with your SID, e.g. if your SID is “12345678”, then the archive must be called “12345678.zip”. It should contain:
   o A document (Word or PDF) of your answers to Part I and Part II. The document needs to be named with your SID, e.g. if your SID is “12345678”, then name the document “12345678_report.pdf”, “12345678_report.docx” or “12345678_report.doc”.
      - Explain each step and show your results for all the questions in the document.
   o Python source code to support your answers.
   o Pre-processed CSV file.
▪ Assignments submitted without Python source code will not be evaluated.
▪ Late submissions will incur a penalty of 5% of the marks per day.

END
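Part I, Question 2 asks for 10-fold cross-validation reporting accuracy and AUC. The brief expects Orange3 for that step; the sketch below swaps in scikit-learn's `cross_validate` purely to illustrate the evaluation pattern. The data is synthetic, and the hidden-layer size and iteration limit are placeholder values, since the brief leaves them for the student to choose.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Placeholder network settings (assumptions, not values from the brief).
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
)

# 10 stratified folds; each fold is scored on accuracy and ROC AUC.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])

mean_accuracy = scores["test_accuracy"].mean()
mean_auc = scores["test_roc_auc"].mean()
print(f"accuracy={mean_accuracy:.3f}  AUC={mean_auc:.3f}")
```

Putting the scaler inside the pipeline means it is refitted on the training folds only, so no information from the held-out fold leaks into pre-processing.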
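The workflow in Part I, Question 3 (data frame, pre-processing, 70/30 split, MLP, confusion matrix) can be sketched as follows. This is a minimal illustration and not a model answer: the data frame is synthetic, borrowing only the column names GPAO, HSGPA and At_Risk from the feature table, the labelling rule is invented so the toy data is learnable, and median imputation is just one possible way to handle nulls. Reading "3 hidden layers, 10 neurons" as three layers of 10 neurons each is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spreadsheet (all values are made up).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "GPAO": rng.normal(3.0, 0.5, n),
    "HSGPA": rng.normal(3.5, 0.4, n),
})
# Invented labelling rule, only so that the toy problem is learnable.
df["At_Risk"] = (df["GPAO"] + 0.5 * df["HSGPA"] < 4.6).astype(int)
# Blank out some values to mimic the nulls mentioned in the brief.
df.loc[rng.choice(n, 20, replace=False), "HSGPA"] = np.nan

# Pre-processing: fill missing numeric values with the column median.
df = df.fillna(df.median(numeric_only=True))

X = df.drop(columns="At_Risk")
y = df["At_Risk"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scale features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)

# Assumed reading of the brief: three hidden layers with 10 neurons each.
clf = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=6000, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"accuracy={acc:.3f}")
```

In the confusion matrix, `cm[0, 0]` and `cm[1, 1]` count correct predictions for classes 0 and 1, while the off-diagonal cells count false positives (`cm[0, 1]`) and false negatives (`cm[1, 0]`).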
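For Part II, the clustering and the 2x2 count table can be sketched as below. Synthetic blobs stand in for the ionosphere readings so that the example runs without Orange3; with the real data, the feature matrix and class labels would come from the `data_tab` object instead. The explicit `n_init=10` is an addition here (the brief specifies only K = 2 and random_state = 0).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the signal readings and the class label y.
X, blob_id = make_blobs(n_samples=200, centers=2, n_features=5, random_state=0)
y = np.where(blob_id == 0, "g", "b")

# K-means with K = 2 and a fixed random_state, as the brief specifies.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)
clusters = kmeans.fit_predict(X)

# Cross-tabulate cluster assignment against the class label.
counts = pd.crosstab(
    pd.Series(clusters, name="cluster"),
    pd.Series(y, name="y"),
).reindex(columns=["g", "b"])  # same column order as the table in the brief
print(counts)
```

Cluster numbers are arbitrary, so cluster 0 is not guaranteed to line up with either class; the table only shows how the two groupings overlap.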

