


Late submissions will be penalised at the rate of 10 marks per day late or part thereof.
You must keep a copy of the final version of your assignment as submitted and be prepared to provide it on request.
The University treats plagiarism, collusion, theft of other students’ work and other forms of dishonesty in assessment seriously. Any instances of dishonesty in this assessment will be forwarded immediately to the Faculty Dean. For guidelines on honesty in assessment including avoiding plagiarism, see: http://our.murdoch.edu.au/Educational-technologies/Academic-integrity/
Overview

For this assignment, students will work in pairs. Each group needs to choose a real dataset that the group members find interesting, in the sense that they believe it contains data which can provide useful information if explored. Students then need to implement, via the R programming language, different techniques that we have covered in this unit to try to find the best way to answer their questions about the dataset and extract the useful information.
There are numerous datasets available online, and a link to a good repository will be given on LMS during the semester. You are free, however, to choose any dataset you prefer, subject to the following conditions:


  1. The dataset must be freely available online so that I can download it and perform the analysis myself.

  2. Students must each choose unique projects – this generally means different datasets entirely.



If you have another preferred source of data, you may request to use it instead and I’ll have a look. I can also propose other datasets if students need additional choices. Having decided on a dataset, you should then post your plans on the discussion forum for other students to view and comment on. This discussion is assessed.
You should then describe and explain to the reader the results of applying the techniques you have learned in this unit to the dataset. The report does not require lengthy text sections; much of the content may be results of analysis, graphs or plots as required.
In conjunction with the submission of the report, students will also present an overview of the findings, as explained below.

Deliverables:




  1. Online Discussion forum (5 marks):
    Post your proposed topic and chosen dataset, as well as a short plan for the project. Explain whether it falls into the supervised or unsupervised learning category and whether it is a regression or classification problem. The above is required for approval of the topic. As discussed, students must select unique topics; any assignments that overlap will not be accepted. This should be done by the end of week 10. Any queries about the assignment deliverables should also be made in the discussion forum so that other students can benefit from the responses.

  2. Oral Presentation (15 marks):
    You will be required to present a brief (10-minute) executive summary of your project in class. This is a mandatory component of the assignment.

  3. Data Mining technical report (80 marks):
    The marks for the report section are split into three areas:

    a. Data understanding and preparation (20%)
    b. Algorithms/techniques chosen and implemented in the R programming language for data analysis (30%)
    c. Presentation, discussion and quality of the results: explanation of interesting patterns found (50%)




Notes:
All work must be submitted in ONE Word document.
No email submissions are allowed unless specific permission has been granted.
Do not explain how to perform the techniques or provide instructions in your report; this is what the books are for. Instead, spend your time explaining your findings.
May 08, 2022