
  1. Online Discussion forum (5 marks):
    Post your proposed topic and chosen dataset, as well as a short plan for the project. Explain whether it falls into the supervised or unsupervised learning category and whether it is a regression or classification problem. The above is required for approval of the topic. As discussed, students must select unique topics; if any assignments overlap, they will not be accepted. This should be done by the end of week 10. Also, any queries about the assignment deliverables should be made in the discussion forum so that other students can also benefit from the responses.



  2. Oral Presentation (15 marks):
    You will be required to present a brief (10-minute) executive summary of your project in class. This is a mandatory component of the assignment.



  3. Data Mining technical report (80 marks):
    The marks for the report section are split into three areas:


a. Data understanding and preparation (20%)
b. Algorithms/techniques chosen and implemented in the R programming language for data analysis (30%)
c. Presentation, discussion and quality of the results: explanation of interesting patterns found (50%)




Notes:
All work must be submitted in ONE Word document.
No email submissions are allowed unless specific permission has been granted.
Do not explain how to perform the techniques or provide instructions in your report, this is what the books are for. Instead spend your time explaining your findings.



Document Preview:

MURDOCH UNIVERSITY
ICT515 Foundations of Data Science, Semester 1, 2017
ASSIGNMENT 2

Assignment Information
For this assignment, students should work in pairs. You should submit your assignment from the ICT515 LMS site using the Assignment unit tool. Late submissions will be penalised at the rate of 10 marks per day late or part thereof. You must keep a copy of the final version of your assignment as submitted and be prepared to provide it on request. The University treats plagiarism, collusion, theft of other students’ work and other forms of dishonesty in assessment seriously. Any instances of dishonesty in this assessment will be forwarded immediately to the Faculty Dean. For guidelines on honesty in assessment, including avoiding plagiarism, see: http://our.murdoch.edu.au/Educational-technologies/Academic-integrity/

Overview
For this assignment, students will work in pairs. Each group needs to choose a real dataset (two different datasets) that the group members find interesting, in the sense that they believe it contains data which can provide useful information if explored. Students then need to implement, via the R programming language, different techniques that we have covered in this unit to try to find the best way to answer their questions about the dataset and extract the useful information. There are numerous datasets available online, and a link to a good repository will be given in LMS during the semester. You are free, however, to choose any dataset you prefer, on two conditions: the dataset must be freely available online so that I can download it and perform the analysis myself, and students must each choose unique projects, which generally means different datasets entirely. If you have another preferred source of data, you may request to use that instead and I’ll have a look. I can also propose other datasets, if students need additional...



Answered Same Day, Dec 26, 2021


Robert answered on Dec 26 2021
Data Mining Technical Report

PART A: SUPERVISED CLASSIFICATION (using random forest)
a) Data understanding and preparation
The multivariate dataset relates to the direct marketing campaigns of a Portuguese banking institution. It consists of 45,211 observations collected from May 2008 to November 2010. Each record is a phone call rather than a client, so the same client can appear more than once; each record notes whether or not the client subscribed to a term deposit. During the phone campaign, 17 categorical and numerical attributes were recorded for each client. The objective is to predict whether the client will subscribe to a term deposit.
Of the 17 variables, 16 are independent (input) variables and one, y, is the dependent (target) variable. All 17 variables are listed below:
Numerical    Categorical
age          job (12)
balance      marital (4)
day          education (8)
duration     default (3)
campaign     housing (3)
pdays        loan (3)
previous     contact (2)
             month (12)
             poutcome (3)
             y (2)
The number next to each categorical variable is the number of distinct levels (values) that variable takes.
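The level counts in the table can be checked by collecting the distinct values seen in each categorical column. The report itself works in R; the following is only a language-neutral sketch in Python, assuming the rows have already been loaded as dictionaries (for example with `csv.DictReader`). The helper name `count_levels` and the toy rows are hypothetical and are not taken from the real dataset.

```python
from collections import defaultdict


def count_levels(rows, columns):
    """Return the number of distinct values seen in each named column.

    Hypothetical helper for illustration; rows are dicts keyed by column name.
    """
    seen = defaultdict(set)
    for row in rows:
        for col in columns:
            seen[col].add(row[col])
    # Columns never seen (e.g. empty input) report 0 levels.
    return {col: len(seen[col]) for col in columns}


# Hypothetical toy rows, NOT drawn from the bank marketing data.
sample = [
    {"marital": "married", "contact": "cellular", "y": "no"},
    {"marital": "single", "contact": "telephone", "y": "yes"},
    {"marital": "married", "contact": "cellular", "y": "no"},
    {"marital": "divorced", "contact": "cellular", "y": "no"},
]

print(count_levels(sample, ["marital", "contact", "y"]))
# → {'marital': 3, 'contact': 2, 'y': 2}
```

Note that a level such as "unknown" counts as a distinct value, which is one reason the count for a variable can differ between versions of the same dataset.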
For modelling, the data is split into training and test samples: 75% of the observations are chosen at random to train the model, and the remaining 25% are held out for testing.
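A random 75/25 split can be sketched by shuffling the row indices once and cutting the shuffled list at 75% of its length. This is a minimal Python illustration, not the report's R code; the helper name `split_train_test`, the default seed of 42 and the placeholder rows are assumptions for the example.

```python
import random


def split_train_test(records, train_fraction=0.75, seed=42):
    """Randomly partition records into (train, test) lists.

    Hypothetical helper: shuffles the indices with a seeded RNG so the
    split is reproducible, then cuts at train_fraction of the length.
    """
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(records)))
    train = [records[i] for i in indices[:cut]]
    test = [records[i] for i in indices[cut:]]
    return train, test


# Placeholder rows standing in for the 45,211 observations.
rows = list(range(45211))
train, test = split_train_test(rows)
print(len(train), len(test))
# → 33908 11303
```

Every row lands in exactly one of the two samples, and fixing the seed means the same split is reproduced on every run, which keeps model comparisons on the test set fair.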
For...