Project Title: Comparing classification techniques to identify the best classifier
In this project, you will work with the attached "bank.csv" file to compare different classification models. The data file is described in the "DatasetDescription.txt" file, so read that file carefully to understand the dataset. The target class is the 'y' attribute, and the problem is to predict whether or not the client subscribes to a term deposit.
You need to run an experiment that trains and tests a learning model with each of the following four classifiers:
1. CTree
2. J48 Tree
3. Linear Classifier
4. K-Nearest Neighbour (k-NN)
After the experiment is complete, you need to compare the results to identify the best classifier for the given dataset.
Please upload the following to D2L by Tuesday 27 July 2021:
1. Project Report as an MS Word file. (Follow the template given on pages 2-6 of this document precisely.)
2. The .R files
3. Presentation Slides as a PDF file. (Use no more than 10-12 slides. In the presentation, describe the methodology, the dataset, and the experiment results. The presentation will be 15 minutes long.)
Turnitin will be enabled for this project.
Project Title: Comparing classification techniques to identify the best classifier
Abstract: In no more than 160 words, summarize your work, including your findings.
1. Introduction
Describe the problem in detail and write a short description of each of the four classifiers:
CTree:
J48 Tree:
Linear Classifier:
K-Nearest Neighbour (k-NN):
2. Methodology
Describe the steps you followed to complete the experiment. They should include the basic steps of data analytics, such as data exploration, data preprocessing, applying the learning model, and testing the model.
3. Experiment Design
As an introduction, describe the experiment design, i.e., your plan for running the experiment.
3.1 Description of the Given Dataset
Describe the given dataset. First, state its dimensions (number of rows and columns); you can also include summary statistics such as the mean and median of the different variables. Because the dataset has many 'unknown' entries, also include statistics on them: how many rows contain 'unknown' entries and what percentage of the data they represent.
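The statistics requested in 3.1 can be gathered with base R. The sketch below assumes the data has been read into a data frame and that missing values appear as the literal string "unknown"; describe_unknowns() is a hypothetical helper name, and the file name and ";" separator in the usage comments are assumptions to check against DatasetDescription.txt. It reports both the share of rows and the share of individual cells that are 'unknown', since the brief could mean either.

```r
# Sketch for 3.1: dimensions plus counts of "unknown" entries.
# Assumes missing values are stored as the string "unknown";
# describe_unknowns() is a hypothetical helper, not part of any package.
describe_unknowns <- function(df) {
  # Logical matrix: TRUE where a cell equals "unknown" (NA cells count as FALSE)
  flags <- sapply(df, function(col) as.character(col) %in% "unknown")
  flags <- matrix(flags, nrow = nrow(df), dimnames = list(NULL, names(df)))

  per_column <- colSums(flags)
  rows_with_unknown <- sum(rowSums(flags) > 0)

  list(
    dimensions = dim(df),                               # rows, columns
    unknown_per_column = per_column[per_column > 0],    # only affected columns
    rows_with_unknown = rows_with_unknown,
    percent_rows_with_unknown = 100 * rows_with_unknown / nrow(df),
    percent_cells_unknown = 100 * sum(flags) / length(flags)
  )
}

# Usage (file name and ";" separator are assumptions about the attached file):
# bank <- read.csv("bank.csv", sep = ";", stringsAsFactors = TRUE)
# summary(bank)             # mean, median and quartiles of numeric columns
# describe_unknowns(bank)
```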
3.2 Preprocessing the Data
Because the dataset has many 'unknown' entries, it needs to be cleaned. Start the preprocessing step by removing the 'poutcome' column. Then remove all rows that contain 'unknown' entries. After proper cleaning, around 31,000 rows should remain for the experiment.
Explain how you cleaned the data, and then describe the cleaned dataset, following the instructions in 3.1 for what to include.
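A minimal sketch of the two cleaning steps, assuming a column named poutcome and "unknown" as the missing-value marker; clean_bank() is a hypothetical helper name. The order matters: if rows were filtered before poutcome is dropped, every row with an unknown poutcome would be lost as well.

```r
# Sketch for 3.2: drop 'poutcome', then drop every row containing "unknown".
# clean_bank() is a hypothetical helper; column names are assumptions.
clean_bank <- function(df) {
  # Step 1: remove the poutcome column first, as the brief instructs
  df <- df[, names(df) != "poutcome", drop = FALSE]

  # Step 2: flag rows with "unknown" in any remaining column
  has_unknown <- Reduce(`|`, lapply(df, function(col) as.character(col) %in% "unknown"))
  cleaned <- df[!has_unknown, , drop = FALSE]

  # Step 3: drop factor levels that no longer occur (e.g. "unknown" itself)
  cleaned <- droplevels(cleaned)
  rownames(cleaned) <- NULL
  cleaned
}

# Usage ('bank' is your raw data frame):
# cleaned_bank <- clean_bank(bank)
# dim(cleaned_bank)   # the brief expects roughly 31,000 rows
```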
3.3 Preparing the Training and Test Datasets
Describe how you prepared the training and test datasets from the cleaned dataset, and state how many rows you used for training and how many for testing the model. Tip: you can use an 80-20 split to prepare the datasets.
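One way to make the 80-20 split reproducible is to fix the random seed before sampling row indices. This is a sketch: split_80_20() is a hypothetical helper and the seed value 1234 is arbitrary.

```r
# Sketch for 3.3: a reproducible random 80-20 split.
# split_80_20() is a hypothetical helper; the seed value is arbitrary.
split_80_20 <- function(df, train_fraction = 0.8, seed = 1234) {
  set.seed(seed)                                   # same split on every run
  n_train <- floor(train_fraction * nrow(df))
  train_rows <- sample(seq_len(nrow(df)), size = n_train)
  in_train <- seq_len(nrow(df)) %in% train_rows    # logical index avoids df[-integer(0), ]
  list(train = df[in_train, , drop = FALSE],
       test  = df[!in_train, , drop = FALSE])
}

# Usage ('cleaned_bank' is your cleaned data frame from 3.2):
# parts <- split_80_20(cleaned_bank)
# nrow(parts$train); nrow(parts$test)   # report these counts in 3.3
# prop.table(table(parts$train$y))      # class balance in the training set
```

Because 'yes' is usually the minority class in this kind of data, comparing the class proportions of the two subsets is a quick sanity check on the split.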
4. Results
Introduce how you will present the results, and state which metrics you will use to compare the classifiers.
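For reference, these are the standard definitions of the five metrics in terms of the confusion-matrix counts TP, TN, FP and FN. Treating "yes" (the client subscribes) as the positive class is an assumption; whichever class you choose, state it in the report and use it consistently for all four classifiers.

```latex
\begin{aligned}
\text{Accuracy}   &= \frac{TP + TN}{TP + TN + FP + FN} \\[4pt]
\text{Error Rate} &= \frac{FP + FN}{TP + TN + FP + FN} = 1 - \text{Accuracy} \\[4pt]
\text{Precision}  &= \frac{TP}{TP + FP} \\[4pt]
\text{Recall}     &= \frac{TP}{TP + FN} \\[4pt]
F_1               &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```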
4.1 CTree Classification
Insert the following:
· A confusion matrix created in R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
✓ Accuracy
✓ Error Rate
✓ Precision
✓ Recall
✓ F1-Score
· Important: include the calculations here. DO NOT ATTACH HANDWRITTEN CALCULATIONS. Tip: you can use the MS Word equation tool to write out the calculations.
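A sketch of how the CTree confusion matrix might be produced, assuming the party package (which provides ctree()), train and test data frames prepared as in 3.3, and a factor target y. run_ctree() is a hypothetical helper. party does not accept character columns, so read the CSV with stringsAsFactors = TRUE or convert those columns with factor() first.

```r
# Sketch for 4.1, assuming the 'party' package: install.packages("party").
# run_ctree() is a hypothetical helper; 'y' must be a factor and
# categorical predictors must be factors, not character vectors.
library(party)

run_ctree <- function(train, test) {
  model <- ctree(y ~ ., data = train)           # conditional inference tree
  predicted <- predict(model, newdata = test)   # predicted class labels
  # Rows: predicted class; columns: actual class
  table(Predicted = predicted, Actual = test$y)
}

# Usage ('train_data' and 'test_data' are your data frames from 3.3):
# cm_ctree <- run_ctree(train_data, test_data)
# cm_ctree
```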
4.2 J48 Classification
Insert the following:
· A confusion matrix created in R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
✓ Accuracy
✓ Error Rate
✓ Precision
✓ Recall
✓ F1-Score
· Important: include the calculations here. DO NOT ATTACH HANDWRITTEN CALCULATIONS. Tip: you can use the MS Word equation tool to write out the calculations.
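J48 is Weka's implementation of the C4.5 decision tree, and in R it is commonly reached through the RWeka package, which needs a working Java installation. The sketch assumes RWeka is installed, that y is a factor, and that categorical predictors are stored as factors; run_j48() is a hypothetical helper.

```r
# Sketch for 4.2, assuming the 'RWeka' package (install.packages("RWeka")),
# which requires Java. run_j48() is a hypothetical helper; 'y' is a factor.
library(RWeka)

run_j48 <- function(train, test) {
  model <- J48(y ~ ., data = train)             # Weka's C4.5 decision tree
  predicted <- predict(model, newdata = test)   # class labels by default
  # Rows: predicted class; columns: actual class
  table(Predicted = predicted, Actual = test$y)
}

# Usage ('train_data' and 'test_data' are your data frames from 3.3):
# cm_j48 <- run_j48(train_data, test_data)
# cm_j48
```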
4.3 Linear Classification
For the linear classifier, you can use 1.5 as the threshold for converting the model's predicted values into class labels.
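One common reading of the 1.5 threshold is to code the target numerically as 1 ("no") and 2 ("yes"), fit an ordinary least-squares model with lm(), and label a test row "yes" when its predicted value exceeds 1.5, the midpoint of the two codes. This interpretation, the helper names run_linear() and confusion_metrics(), and the assumption that levels(y) is c("no", "yes") are not from the brief. The block also includes a small base-R function that turns a confusion matrix (predicted classes in rows, actual classes in columns) into the five metrics, which can help check the typed-up calculations for any of the four classifiers.

```r
# Sketch for 4.3: linear regression used as a classifier with a 1.5 threshold.
# Assumes levels(y) is c("no", "yes"), so as.numeric() codes them 1 and 2.
# run_linear() and confusion_metrics() are hypothetical helpers.
run_linear <- function(train, test, threshold = 1.5) {
  lv <- levels(train$y)
  train$y_num <- as.numeric(train$y)            # "no" -> 1, "yes" -> 2
  model <- lm(y_num ~ . - y, data = train)      # all predictors except y itself
  score <- predict(model, newdata = test)       # continuous predicted values
  predicted <- factor(ifelse(score > threshold, lv[2], lv[1]), levels = lv)
  # Rows: predicted class; columns: actual class
  table(Predicted = predicted, Actual = test$y)
}

# Five metrics from a confusion matrix with predicted classes in rows and
# actual classes in columns; works for the tables of all four classifiers.
confusion_metrics <- function(cm, positive = "yes") {
  tp <- cm[positive, positive]
  fp <- sum(cm[positive, ]) - tp                # predicted positive, actually negative
  fn <- sum(cm[, positive]) - tp                # actually positive, predicted negative
  tn <- sum(cm) - tp - fp - fn
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  c(accuracy   = (tp + tn) / sum(cm),
    error_rate = (fp + fn) / sum(cm),
    precision  = precision,
    recall     = recall,
    f1         = 2 * precision * recall / (precision + recall))
}

# Usage ('train_data' and 'test_data' are your data frames from 3.3):
# cm_linear <- run_linear(train_data, test_data)
# confusion_metrics(cm_linear)
```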
Insert the following:
· A confusion matrix created in R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
✓ Accuracy
✓ Error Rate
✓ Precision
✓ Recall
✓ F1-Score
· Important: include the calculations here. DO NOT ATTACH HANDWRITTEN CALCULATIONS. Tip: you can use the MS Word equation tool to write out the calculations.
4.4 k-NN Classification
Insert the following:
· A confusion matrix created in R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
✓ Accuracy
✓ Error Rate
✓ Precision
✓ Recall
✓ F1-Score
· Important: include the calculations here. DO NOT ATTACH HANDWRITTEN CALCULATIONS. Tip: you can use the MS Word equation tool to write out the calculations.
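A sketch for 4.4 using knn() from the class package, which ships with most R installations as a recommended package. Assumptions not from the brief: only the numeric predictors are used (categorical ones would need encoding, e.g. with model.matrix()), they are standardised with the training set's means and standard deviations, and k = 5 is a placeholder to be tuned; run_knn() is a hypothetical helper name.

```r
# Sketch for 4.4 using class::knn(). run_knn() is a hypothetical helper;
# only numeric predictors are used and k = 5 is a placeholder value.
library(class)

run_knn <- function(train, test, k = 5) {
  num_cols <- setdiff(names(train)[sapply(train, is.numeric)], "y")
  # Standardise with the training set's statistics (constant columns
  # would give NaN here and should be dropped beforehand)
  train_x <- scale(train[, num_cols, drop = FALSE])
  test_x <- scale(test[, num_cols, drop = FALSE],
                  center = attr(train_x, "scaled:center"),
                  scale = attr(train_x, "scaled:scale"))
  set.seed(1234)  # knn() breaks distance ties at random
  predicted <- knn(train = train_x, test = test_x, cl = train$y, k = k)
  # Rows: predicted class; columns: actual class
  table(Predicted = predicted, Actual = test$y)
}

# Usage ('train_data' and 'test_data' are your data frames from 3.3):
# cm_knn <- run_knn(train_data, test_data, k = 5)
# cm_knn
```

Standardising matters because k-NN relies on Euclidean distance, so unscaled variables with large ranges would otherwise dominate the neighbour search.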
5. Discussion on the Results
Discuss your results based on the following:
5.1 Classifier Comparison Based on Accuracy
Add a bar chart that shows the accuracy of the four classifiers, and discuss the results. In the discussion, you can rank the classifiers by accuracy and state the percentage differences between them. The bar chart must be plotted in R.
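A base-R sketch for the bar charts in 5.1 to 5.5. plot_metric_bars() is a hypothetical helper; it expects a named numeric vector with one value per classifier taken from your own results, sorts the bars so the best classifier comes first, and returns each classifier's gap to the best one in percentage points (assuming the metric is on a 0-1 scale). Set higher_is_better = FALSE for error rate.

```r
# Sketch for 5.1 to 5.5: one bar chart per metric in base R graphics.
# plot_metric_bars() is a hypothetical helper; 'values' is a named numeric
# vector with one entry per classifier, taken from your own results.
plot_metric_bars <- function(values, metric_name, higher_is_better = TRUE) {
  ranked <- sort(values, decreasing = higher_is_better)   # best classifier first
  centers <- barplot(ranked,
                     main = paste(metric_name, "by classifier"),
                     ylab = metric_name,
                     ylim = c(0, 1.15 * max(ranked)),     # headroom for labels
                     col = "steelblue")
  text(centers, ranked, labels = sprintf("%.4f", ranked), pos = 3)
  # Gap to the best classifier, in percentage points (metrics on a 0-1 scale)
  gap_pp <- 100 * abs(ranked - ranked[1])
  invisible(list(ranked = ranked, gap_pp = gap_pp))
}

# Usage (acc_ctree etc. are placeholders for your own computed values):
# acc <- c(CTree = acc_ctree, J48 = acc_j48, Linear = acc_linear, kNN = acc_knn)
# plot_metric_bars(acc, "Accuracy")
# plot_metric_bars(err, "Error Rate", higher_is_better = FALSE)
```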
5.2 Classifier Comparison Based on Error Rate
Follow the instructions given in 5.1 (note that for error rate, a lower value is better).
5.3 Classifier Comparison Based on Precision
Follow the instructions given in 5.1.
5.4 Classifier Comparison Based on Recall
Follow the instructions given in 5.1.
5.5 Classifier Comparison Based on F1-Score
Follow the instructions given in 5.1.
In a new paragraph, add a concluding discussion: based on 5.1 to 5.5, state which classifier is the best for the given dataset and rank the four classifiers.
6. Conclusion
Write your concluding statement here.
APPENDIX
PLEASE COPY AND PASTE ALL OF YOUR R CODE HERE:
Code to compute the statistics of the given dataset
# Add the related R code here
Code for data cleaning
# Add the related R code here
Code to compute the statistics of the cleaned dataset
# Add the related R code here
Code to prepare the training and test datasets
# Add the related R code here
Code for CTree classification and the CTree confusion matrix
# Add the related R code here
Code for J48 classification and the J48 confusion matrix
# Add the related R code here
Code for linear classification and the linear classifier's confusion matrix
# Add the related R code here
Code for k-NN classification and the k-NN confusion matrix
# Add the related R code here
Code to generate the bar charts for 5.1 to 5.5
# Add the related R code here
Please follow this template to complete the project. I will grade strictly against this template, so do not skip any steps.