Task
For this assessment, you are required to use Weka 3.6 (or a later version), which you will then use throughout the duration of this subject.
Task 1: Weka data exploration [5 marks]
In the Weka workbench (the workbench option for 3.8, or the explorer option for older versions), load the diabetes.arff dataset to answer the following questions.
(a) How many instances and attributes (including the class attribute) does this dataset have? [1 mark]
(b) How many classes are present in the dataset, and how many instances are there for each class? [2 marks]
(c) Use histograms (with default settings) to show which age group has the highest number of samples. [2 marks]
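As a cross-check on what the workbench reports for questions like (a) and (b), the header and data section of an .arff file can also be counted directly. The following is a minimal sketch only: the function name `summarise_arff` and the toy data are made up for illustration (it is not the real diabetes.arff), and it assumes a simple dense file with unquoted attribute names and the class as the last attribute.

```python
from collections import Counter

# Tiny made-up ARFF text for illustration only; NOT the real diabetes.arff.
SAMPLE_ARFF = """\
% toy example
@relation toy
@attribute age numeric
@attribute mass numeric
@attribute class {neg,pos}
@data
25,22.1,neg
47,31.5,pos
33,28.0,neg
"""

def summarise_arff(text):
    """Return (attribute names, instance count, class counts) for simple dense ARFF text."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            attributes.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
    # Assumption: the class is the last attribute in each row.
    class_counts = Counter(row[-1] for row in rows)
    return attributes, len(rows), class_counts

attributes, n_instances, class_counts = summarise_arff(SAMPLE_ARFF)
print(len(attributes), "attributes:", attributes)
print(n_instances, "instances")
print(dict(class_counts))
```

The counts printed here should agree with the figures shown in the workbench for the same file; attribute names containing spaces or quotes would need a fuller parser.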
Task 2: Working with a new data file in Weka [5 marks]
This task requires you to make your own .arff file for the following dataset and explore its features.
a) Open the iris.arff file from the ~/weka/data/ folder in a text editor, then remove the ‘petal_width’ attribute and save the file as iris.3D.arff. Please make sure that the Attribute-Relation File Format (.arff) is correctly preserved. [4 marks]
Hints:
You may use any of the *.arff files as a template for this conversion. Several *.arff files can be found in the ~/weka/data/ folder in your distribution.
b) Load this file in the workbench and include a screenshot of the histograms (with default settings) for each attribute in this dataset. [1 mark]
Hints:
After loading the file in the workbench, you may use the ‘visualize all’ button to generate histograms for each attribute.
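For part (a), preserving the format means two things must stay in step: the `@attribute` declaration is removed from the header, and the matching value is removed from every row under `@data`. The sketch below illustrates that idea on made-up data. The helper `drop_attribute`, the attribute names, and the values are all hypothetical and are not taken from iris.arff; it assumes a dense file with unquoted names and comma-separated values.

```python
# Made-up three-attribute ARFF text; names and values are illustrative only.
SAMPLE_ARFF = """\
@relation toy
@attribute length numeric
@attribute width numeric
@attribute class {a,b}
@data
5.1,0.2,a
6.3,1.8,b
"""

def drop_attribute(text, name):
    """Remove one attribute declaration and its column from simple dense ARFF text."""
    out, index, seen, in_data = [], None, 0, False
    for raw in text.splitlines():
        line = raw.strip()
        lower = line.lower()
        if in_data and line and not line.startswith("%"):
            values = line.split(",")
            del values[index]  # drop the value in the removed attribute's position
            out.append(",".join(values))
        elif lower.startswith("@attribute"):
            if line.split()[1] == name:
                index = seen  # remember the column position; omit the declaration
            else:
                out.append(raw)
            seen += 1
        else:
            if lower.startswith("@data"):
                if index is None:
                    raise ValueError(f"attribute {name!r} not found in header")
                in_data = True
            out.append(raw)  # relation line, comments, blank lines, @data marker
    return "\n".join(out) + "\n"

reduced = drop_attribute(SAMPLE_ARFF, "width")
print(reduced)
```

If the header and the data rows disagree on the number of attributes, Weka will refuse to load the file, which is why both are edited together here.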
Task 3: Visual analysis [5 marks]
a) Load the file (iris.3D.arff) that you created in the previous task in the workbench and generate scatter plots using the ‘visualize’ menu option to show the data distribution for each pair of attributes in a two-dimensional visualisation. [2 marks]
b) Visually compare the plots for (sepal_length, sepal_width) and (sepal_length, petal_length) and comment on which one shows better class separability in this dataset. Justify your answer with screenshots. [3 marks]
Rationale
This assessment task will assess the following learning outcome/s:
- be able to identify and analyse business requirements for the identification of patterns and trends in data sets.
- be able to appraise the different approaches and categories of data mining problems.
- be able to compare and evaluate output patterns.
- be able to explore and critically analyse data sets and evaluate their data quality, integrity and security requirements.
- be able to compare and evaluate appropriate techniques for detecting and evaluating patterns in a given data set.
- be able to identify and evaluate the security, privacy and ethical implications in data mining.
Marking criteria and standards
| Criteria | | DI (>=75%) | CR (>=65%) | PS (>=50%) | FL ( |
|---|---|---|---|---|---|
| Task 1: Weka data exploration [5 marks] | The answers are correct and complete, demonstrating thorough and comprehensive understanding of the specified dataset and the usage of WEKA, and insightful observations. | The answers are correct and complete, demonstrating good understanding of the specified dataset and the usage of WEKA, and insightful observations. | The answers are correct and complete, demonstrating understanding of the specified dataset and the usage of WEKA and some insightful observations. | The answers are correct, demonstrating understanding of the specified dataset and the usage of WEKA and some observations. | Answers are incorrect/incomplete or partially complete. |
| Task 2: Making a data file for Weka [5 marks] | The file is correctly formatted and can be loaded in WEKA and included comprehensive demonstration of visual analysis. | The file is correctly formatted and can be loaded in WEKA and included a good demonstration of visual analysis. | The file is correctly formatted and can be loaded in WEKA and included some demonstration of visual analysis. | The file is correctly formatted and can be loaded in WEKA and included visual analysis. | The file is incorrectly formatted and cannot be loaded in WEKA. |
| Task 3: Visual Analysis [5 marks] | The answers demonstrated comprehensive visual analysis and insightful observations in the specified dataset. | The answers demonstrated good visual analysis and insightful observations in the specified dataset. | The answers demonstrated some visual analysis and insightful observations in the specified dataset. | The answers demonstrated minimal visual analysis and observations in the specified dataset. | Answers are incorrect/incomplete or partially complete. |
Presentation
It is recommended that you write your answers in a Word document and submit it in either Word format (.doc or .docx) or .pdf format.
All diagrams that are required should be inserted into the document in appropriate positions with descriptive titles. Your answers to the questions should be precise but complete and informative.