Project Title: Comparing classification techniques to identify the best classifier

In this project, you will be working with the attached "bank.csv" to compare different classification models. The description of the data file is given in the "DatasetDescription.txt" file, so please read that file carefully and understand the dataset. The target class in the file is the 'y' attribute. The problem is to predict whether the client subscribes to a term deposit or not.

You need to run an experiment to train and test the learning model using the following four classifiers:

1. CTree

2. J48 Tree

3. Linear Classifier

4. K Nearest Neighbour (k-NN)

After the experiment is complete, you need to compare the results to identify the best classifier for the given dataset.

Please upload the following to D2L by Tuesday 27 July 2021:

1. Project Report as an MS WORD file. (Please follow the template given on pages 2-6 of this document precisely.)

2. .R files

3. Presentation Slides as a PDF file. (You can have up to 10-12 slides. During the presentation, you should describe the methodology, dataset, and experiment results. The presentation will last 15 minutes.)

Turnitin will be enabled for this project.

Project Title: Comparing classification techniques to identify the best classifier

Abstract: Here, using no more than 160 words, provide a summary of your work, including your findings.

1. Introduction

Here, you will describe the problem in detail and write short descriptions about the four classifiers:

CTree

J48 Tree

Linear Classifier

K Nearest Neighbour (k-NN)

2. Methodology

Here, you describe the steps that you have followed to complete the experiment. Steps should include the basic steps of data analytics, such as data exploration, preprocessing the data, applying the learning model, and testing the model.

3. Experiment Design

As an introduction, write about the experiment design here, i.e., your plan for running the experiment.

3.1 Description of the given dataset

Here, you describe the given dataset. First, write about the dimensions of the dataset; you can also include the mean, median, etc., for different variables. The dataset has many 'unknown' entries, so include statistics on them, i.e., how many rows have 'unknown' entries and the percentage of 'unknown' entries.
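As a minimal sketch of the 'unknown' statistics described above, the following R snippet counts the 'unknown' entries per column and the rows affected. The small `bank` data frame is a hypothetical stand-in with made-up values; with the real file you would first load the data with `read.csv()` (the separator used by "bank.csv" is not stated here, so check the file before choosing `sep`).

```r
# Hypothetical stand-in for the real data; values are invented for illustration.
bank <- data.frame(
  age       = c(30, 45, 52, 38, 27),
  job       = c("admin.", "unknown", "technician", "services", "unknown"),
  education = c("tertiary", "secondary", "unknown", "primary", "secondary"),
  y         = c("no", "yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Dimensions and basic statistics for a numeric variable.
dim(bank)
summary(bank$age)

# Logical matrix: TRUE wherever a cell holds the string "unknown".
is_unknown <- bank == "unknown"

# Number of 'unknown' entries in each column.
unknown_per_column <- colSums(is_unknown)

# Number and percentage of rows with at least one 'unknown' entry.
rows_with_unknown     <- sum(rowSums(is_unknown) > 0)
pct_rows_with_unknown <- 100 * rows_with_unknown / nrow(bank)

print(unknown_per_column)
print(pct_rows_with_unknown)
```

The same three quantities can be reported again for the cleaned dataset, where all of them should be zero.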

3.2 Preprocessing the data

Since the dataset has many 'unknown' entries, you need to clean them. Start the preprocessing step by removing the column 'poutcome'. Then remove all the rows that have 'unknown' entries. After proper cleaning, you should have around 31000 rows left to run the experiment.

So, here write how you have cleaned the data. Now, describe the cleaned dataset. Please follow the instructions given in 3.1 for what to include here.
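The two cleaning steps in 3.2 could be sketched in R as follows. The `bank` data frame below is again a hypothetical toy example with invented values, and the name `bank_clean` is only an illustrative choice.

```r
# Hypothetical toy data; the real data would come from the provided file.
bank <- data.frame(
  age      = c(30, 45, 52, 38),
  job      = c("admin.", "unknown", "technician", "services"),
  poutcome = c("unknown", "failure", "unknown", "unknown"),
  y        = c("no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Step 1: drop the 'poutcome' column.
bank$poutcome <- NULL

# Step 2: keep only the rows that contain no 'unknown' entry.
keep       <- rowSums(bank == "unknown") == 0
bank_clean <- bank[keep, ]

dim(bank_clean)
```

Dropping 'poutcome' first matters for the row count: in this toy data every row but one has an 'unknown' in that column, so filtering rows before removing it would discard far more data.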

3.3 Preparing the Training and Test Datasets

Here, describe what you have done to prepare the training and test datasets from the cleaned dataset. Mention how many rows you have for training and how many rows you have for testing the model. Tip: you can use an 80-20 split to prepare the datasets.
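One possible way to make the 80-20 split mentioned in the tip is random sampling of row indices, sketched below. The seed value and the toy `bank_clean` data frame are assumptions for illustration only.

```r
set.seed(42)  # arbitrary seed, used only to make the split reproducible

# Hypothetical stand-in for the cleaned dataset (100 rows).
bank_clean <- data.frame(
  id = 1:100,
  y  = rep(c("no", "yes"), times = c(80, 20))
)

n         <- nrow(bank_clean)
train_idx <- sample(seq_len(n), size = round(0.8 * n))

train <- bank_clean[train_idx, ]   # 80% of the rows
test  <- bank_clean[-train_idx, ]  # the remaining 20%

nrow(train)
nrow(test)
```

The negative index guarantees that no row appears in both sets, which is what the report should be able to state when it gives the two row counts.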

4. Results

Here, provide an introduction on how you will present the results. State which metrics you will use to compare the classifiers.

4.1 CTree Classification

Insert the following:

· Confusion matrix created using R along with a description of the confusion matrix

· Use the confusion matrix to calculate the following:

✓ Accuracy

✓ Error Rate

✓ Precision

✓ Recall

✓ F1-Score

· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS-WORD equation tool to complete the calculation.
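The five metrics listed above can be computed from a 2x2 confusion matrix as sketched below; the same steps apply to each of the four classifiers. The counts are invented for illustration, and two conventions here are assumptions rather than requirements of the project: 'yes' is treated as the positive class, and predictions are in the rows with actual classes in the columns.

```r
# Hypothetical confusion matrix (rows = predicted, columns = actual).
cm <- matrix(c(50, 5, 10, 35), nrow = 2,
             dimnames = list(predicted = c("no", "yes"),
                             actual    = c("no", "yes")))

tp <- cm["yes", "yes"]  # predicted yes, actually yes
tn <- cm["no",  "no"]   # predicted no,  actually no
fp <- cm["yes", "no"]   # predicted yes, actually no
fn <- cm["no",  "yes"]  # predicted no,  actually yes

accuracy   <- (tp + tn) / sum(cm)
error_rate <- 1 - accuracy
precision  <- tp / (tp + fp)
recall     <- tp / (tp + fn)
f1_score   <- 2 * precision * recall / (precision + recall)

round(c(accuracy = accuracy, error_rate = error_rate,
        precision = precision, recall = recall, f1 = f1_score), 4)
```

With a confusion matrix produced by `table()` on real predictions, check which dimension holds the predictions before reading off the four cells, since swapping rows and columns exchanges precision and recall.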

4.2 J48 Classification

Insert the following:

· Confusion matrix created using R along with a description of the confusion matrix

· Use the confusion matrix to calculate the following:

✓ Accuracy

✓ Error Rate

✓ Precision

✓ Recall

✓ F1-Score

· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS-WORD equation tool to complete the calculation.

4.3 Linear Classification

To design the linear model, you can use 1.5 as the threshold.

Insert the following:

· Confusion matrix created using R along with a description of the confusion matrix

· Use the confusion matrix to calculate the following:

✓ Accuracy

✓ Error Rate

✓ Precision

✓ Recall

✓ F1-Score

· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS-WORD equation tool to complete the calculation.
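For the linear model in 4.3, one plausible reading of the 1.5 threshold is that the two classes are coded numerically as 1 ('no') and 2 ('yes'), a linear regression is fitted to that code, and predictions above 1.5 are labelled 'yes'. This interpretation is an assumption, not something the brief states. The sketch below uses synthetic data with a single made-up predictor `x` and evaluates on the same data it was fitted to, only to stay short; the actual experiment would fit on the training set and predict on the test set.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Synthetic two-class data (hypothetical, not the bank dataset).
n   <- 100
toy <- data.frame(
  x = c(rnorm(n, mean = 0), rnorm(n, mean = 3)),
  y = factor(rep(c("no", "yes"), each = n))
)

# Code the factor levels numerically: "no" -> 1, "yes" -> 2.
toy$y_num <- as.numeric(toy$y)

# Fit a linear model to the numeric class code.
fit <- lm(y_num ~ x, data = toy)

# Scores above the 1.5 threshold are classified as "yes".
score <- predict(fit, newdata = toy)
pred  <- factor(ifelse(score > 1.5, "yes", "no"), levels = levels(toy$y))

cm       <- table(predicted = pred, actual = toy$y)
accuracy <- sum(diag(cm)) / sum(cm)
print(cm)
```

The value 1.5 is simply the midpoint between the two class codes, so this rule assigns each observation to whichever code its predicted score is closer to.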

4.4 k-NN Classification

Insert the following:

· Confusion matrix created using R along with a description of the confusion matrix

· Use the confusion matrix to calculate the following:

✓ Accuracy

✓ Error Rate

✓ Precision

✓ Recall

✓ F1-Score

· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS-WORD equation tool to complete the calculation.
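A k-NN confusion matrix for 4.4 could be produced along the following lines, using `knn()` from the `class` package. The data are synthetic, the predictor names `a` and `b` are made up, and k = 5 is an arbitrary choice since the brief does not specify k. Because `knn()` works on numeric distances, the predictors are scaled with the training set's means and standard deviations.

```r
library(class)  # provides knn()

set.seed(1)  # arbitrary seed for reproducibility

# Helper that builds synthetic two-class data (hypothetical, not the bank data).
make_toy <- function(n) {
  data.frame(
    a = c(rnorm(n, mean = 0), rnorm(n, mean = 3)),
    b = c(rnorm(n, mean = 0), rnorm(n, mean = 3)),
    y = factor(rep(c("no", "yes"), each = n))
  )
}
train <- make_toy(50)
test  <- make_toy(20)

# Scale the training predictors, then apply the same centre and scale to the test set.
train_x <- scale(train[, c("a", "b")])
test_x  <- scale(test[, c("a", "b")],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Classify each test row by majority vote among its 5 nearest training rows.
pred <- knn(train = train_x, test = test_x, cl = train$y, k = 5)

cm       <- table(predicted = pred, actual = test$y)
accuracy <- sum(diag(cm)) / sum(cm)
print(cm)
```

For the bank data, categorical predictors would first need a numeric encoding (for example, dummy variables) before they can be passed to `knn()`; how to encode them is a design decision to describe in the methodology.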

5. Discussion on the Results

Here, you want to discuss the results. Please discuss your results based on the following:

5.1 Classifier Comparison Based on Accuracy

Here, add a bar chart that shows the accuracy of the four classifiers and discuss your results. In the discussion, you can rank the classifiers by accuracy and mention the percentage differences among them. You must use R to plot the bar chart.
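A bar chart of this kind can be drawn with base R's `barplot()`, as in the sketch below. The accuracy values are placeholders chosen only to make the example run; they are not results for this dataset and must be replaced with the values from section 4.

```r
# Placeholder accuracies (NOT real results); replace with your own values.
accuracy <- c(CTree = 0.80, J48 = 0.90, Linear = 0.75, kNN = 0.85)

# Rank the classifiers from highest to lowest accuracy.
ranked <- sort(accuracy, decreasing = TRUE)

# Bar chart of accuracy in percent; barplot() returns the bar midpoints.
mids <- barplot(ranked * 100,
                ylim = c(0, 100),
                ylab = "Accuracy (%)",
                main = "Classifier comparison based on accuracy")

# Label each bar with its value.
text(x = mids, y = ranked * 100, labels = round(ranked * 100, 1), pos = 3)

# Difference from the best classifier, in percentage points.
diff_from_best <- (unname(ranked[1]) - ranked) * 100
print(diff_from_best)
```

The same code can be reused for 5.2 to 5.5 by swapping in the error rate, precision, recall, or F1-score vector and changing the axis label and title.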

5.2 Classifier Comparison Based on Error Rate

Please follow the instructions given in 5.1.

5.3 Classifier Comparison Based on Precision

Please follow the instructions given in 5.1.

5.4 Classifier Comparison Based on Recall

Please follow the instructions given in 5.1.

5.5 Classifier Comparison Based on F1-Score

Please follow the instructions given in 5.1.

Add a concluding discussion here in a new paragraph. With reference to 5.1 to 5.5, state which classifier is the best for the given dataset and rank the classifiers.

6. Conclusion

Write your concluding statement here.

APPENDIX

PLEASE COPY AND PASTE ALL R-Code here:

Code to get the statistics of the given dataset