Here, the students have to submit a report on the data mining process applied to a real-world scenario. A presentation and Q&A session will then be held based on the written report. The report will consist of the details of every step followed by the students.
Introduction
• Importance of the chosen area
• Why this data set is interesting
• What has been done so far
• What can be done
• Description of the present experiment
1. Data preparation and Feature extraction:
1.1 Select data
o Task: Select data
1.2 Clean data
o Task: Clean data
o Output: Data cleaning report
1.3 Construct data / feature extraction
o Task: Construct data
o Output: Derived attributes
o Activities: Derived attributes
o Add new attributes to the accessed data
o Activities: Single-attribute transformations
o Output: Generated records
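The derived-attribute and single-attribute-transformation activities in 1.3 can be sketched in a few lines of Python. This is only an illustration: the records, the field names (`height_cm`, `weight_kg`, `bmi`) and the choice of min-max scaling are hypothetical and not part of the assignment, and in the lab the same steps would be carried out in KNIME.

```python
# Hypothetical records; the field names are illustrative, not from the assignment.
records = [
    {"height_cm": 170.0, "weight_kg": 65.0},
    {"height_cm": 180.0, "weight_kg": 81.0},
    {"height_cm": 160.0, "weight_kg": 51.2},
]


def add_bmi(record):
    """Derived attribute: build a new field from two existing ones."""
    height_m = record["height_cm"] / 100.0
    return {**record, "bmi": record["weight_kg"] / height_m ** 2}


def min_max_scale(rows, field):
    """Single-attribute transformation: rescale one field to [0, 1]."""
    values = [row[field] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant fields
    return [{**row, field + "_scaled": (row[field] - low) / span} for row in rows]


# Add the derived attribute first, then transform a single attribute.
prepared = min_max_scale([add_bmi(r) for r in records], "weight_kg")
```

Both functions return new dictionaries rather than modifying the input, so the originally selected data stays available for the data cleaning report.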
2 Modeling
2.1 Select modeling technique
o Task: Select modeling technique
2.2 Output: Modeling technique
o Record the actual modeling technique that is used.
2.3 Output: Modeling assumptions
o Activities: Define any built-in assumptions made by the technique about the data (e.g. quality, format, distribution). Compare these assumptions with those in the Data Description Report. Make sure that these assumptions hold and step back to the Data Preparation Phase if necessary. You can explain the data file here, even when it is pre-prepared.
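As a rough illustration of the assumption check in 2.3, the sketch below tests two assumptions that a technique might make about the data: that there are no missing values, and that certain fields are numeric. The rows, the field names and the helper `check_assumptions` are hypothetical; the assumptions that actually matter depend on the technique chosen in 2.1.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "segment": "A"},
    {"age": None, "income": 61000.0, "segment": "B"},
    {"age": 45, "income": 58000.0, "segment": "A"},
]


def check_assumptions(rows, numeric_fields):
    """Return readable violations of two simple assumptions:
    no missing values, and numeric type for the listed fields."""
    problems = []
    for i, row in enumerate(rows):
        for field, value in row.items():
            if value is None:
                problems.append(f"row {i}: '{field}' is missing")
            elif field in numeric_fields and not isinstance(value, (int, float)):
                problems.append(f"row {i}: '{field}' is not numeric")
    return problems


violations = check_assumptions(rows, numeric_fields={"age", "income"})
```

A non-empty `violations` list is the signal to step back to the Data Preparation Phase before building the model.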
3 Generate test design
3.1 Task: Generate test design
o Activities: Check existing test designs for each data mining goal separately. Decide on necessary steps (number of iterations, number of folds etc.). Prepare data required for test. (You can use 66% of records for model building and the rest for testing.)
3.2 Build model
o Task: Build model. Run the modeling tool on the prepared dataset to create one or more models (using the KNIME tool as shown in the lab).
3.3 Output: Parameter settings
o Activities: Set initial parameters. Document reasons for choosing those values.
o Activities: Run the selected technique on the input dataset to produce the model. Post-process data mining results (e.g. editing rules, display trees).
3.4 Output: Model description
o Activities: Describe any characteristics of the current model that may be useful for the future. Give a detailed description of the model and any special features.
o Activities: State conclusions regarding patterns in the data (if any); sometimes the model reveals important facts about the data without a separate Assessment process (e.g. that the output or conclusion is duplicated in one of the inputs).
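The 66% / 34% split suggested in the test design (3.1) can be sketched as follows. This is a minimal holdout split using only the Python standard library. The function name `holdout_split`, the fixed seed and the stand-in records are assumptions for illustration; in the lab the partitioning would be done inside KNIME.

```python
import random


def holdout_split(records, train_fraction=0.66, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 prepared records
train, test = holdout_split(records)
```

Recording the seed and the fraction alongside the parameter settings makes the test design repeatable when the model is rebuilt.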
4 Evaluation and Conclusion
Previous evaluation steps dealt with factors such as the accuracy and generality of the model. This step assesses the degree to which the model meets the business objectives and seeks to determine if there is some business reason why this model is deficient. It compares results with the evaluation criteria defined at the start of the project. A good way of defining the total outputs of a data mining project is to use the equation:
RESULTS = MODELS + FINDINGS
In this equation, the total output of the data mining project is not just the models (although they are, of course, important) but also the findings. Findings are anything, apart from the model, that is important in meeting the objectives of the business, or important in leading to new questions, lines of approach or side effects (e.g. data quality problems uncovered by the data mining exercise).
Note: although the model is directly connected to the business questions, the findings need not be related.
Sep 23, 2021 BISY3001