Project Title: Comparing classification techniques to identify the best classifier


In this project, you will work with the attached "bank.csv" file to compare different classification models. The data file is described in "DatasetDescription.txt"; please read that file carefully so that you understand the dataset. The target class is the 'y' attribute, and the task is to predict whether a client subscribes to a term deposit.


You need to run an experiment to train and test the learning model using the following four classifiers:


1. CTree


2. J48 Tree


3. Linear Classifier


4. K Nearest Neighbour (k-NN)


After the experiment is complete, you need to compare the results to identify the best classifier for the given dataset.






Please upload the following to D2L by Tuesday 27 July 2021:


1. Project Report as an MS Word file. (Please follow the template given on pages 2-6 of this document precisely.)


2. .R files


3. Presentation Slides as a PDF file. (You can have up to 10-12 slides. In the presentation, describe the methodology, the dataset, and the experiment results. The presentation will last 15 minutes.)





Turnitin will be enabled for this project.







Project Title: Comparing classification techniques to identify the best classifier



Abstract: Here, using no more than 160 words, summarize your work, including your findings.



1. Introduction


Here, you will describe the problem in detail and write short descriptions about the four classifiers:


CTree:


J48 Tree


Linear Classifier


K Nearest Neighbour (k-NN)




2. Methodology


Here, you describe the steps that you have followed to complete the experiment. Steps should include the basic steps of data analytics, such as data exploration, preprocessing the data, applying the learning model, and testing the model.



3. Experiment Design


As an introduction to this section, describe the experiment design, i.e., your plan for running the experiment.



3.1 Description of the given dataset


Here, you describe the given dataset. First, state the dimensions of the dataset; you can also include the mean, median, etc., for different variables. The dataset has many 'unknown' entries, so include statistics on them: how many rows contain an 'unknown' entry and what percentage of the rows that is.
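
The statistics asked for above can be sketched in base R as follows. This is only an illustration: the small data frame is a made-up stand-in for "bank.csv" (its column names and values are assumptions, not taken from the real file), and in the actual project the data would be loaded with read.csv() instead.

```r
# Toy stand-in for bank.csv; columns and values are made up for illustration.
bank <- data.frame(
  age       = c(30, 45, 52, 28, 39),
  job       = c("admin.", "unknown", "technician", "services", "unknown"),
  education = c("secondary", "tertiary", "unknown", "primary", "secondary"),
  y         = c("no", "yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

dim(bank)          # number of rows and columns
summary(bank$age)  # min, quartiles, median, mean, max of a numeric variable

# Number of 'unknown' entries in each column
unknown_per_column <- colSums(bank == "unknown")

# Number and percentage of rows that contain at least one 'unknown' entry
rows_with_unknown     <- sum(rowSums(bank == "unknown") > 0)
pct_rows_with_unknown <- 100 * rows_with_unknown / nrow(bank)

unknown_per_column
rows_with_unknown
pct_rows_with_unknown
```

Comparing the whole data frame against the string "unknown" yields a logical matrix, so colSums() and rowSums() give the per-column and per-row counts directly.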



3.2 Preprocessing the data


Since the dataset has many 'unknown' entries, you need to clean them. Start the preprocessing step by removing the column 'poutcome'. Then remove all the rows that have 'unknown' entries. After proper cleaning, you should have around 31000 rows left to run the experiment.


Here, describe how you cleaned the data, and then describe the cleaned dataset. Please follow the instructions given in 3.1 for what to include here.
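
The two cleaning steps described in this section could look roughly like the base R sketch below. The toy data frame is a made-up stand-in for "bank.csv" and the name bank_clean is an assumption; only the order of the steps (drop 'poutcome' first, then drop rows containing 'unknown') comes from the instructions.

```r
# Toy stand-in for bank.csv; values are made up for illustration.
bank <- data.frame(
  age      = c(30, 45, 52, 28),
  job      = c("admin.", "unknown", "technician", "services"),
  poutcome = c("unknown", "failure", "unknown", "success"),
  y        = c("no", "yes", "no", "yes"),
  stringsAsFactors = TRUE
)

# Step 1: drop the 'poutcome' column
bank_clean <- bank[, names(bank) != "poutcome"]

# Step 2: keep only rows with no 'unknown' entry in any remaining column
has_unknown <- rowSums(bank_clean == "unknown") > 0
bank_clean  <- bank_clean[!has_unknown, ]

# Remove the now-unused 'unknown' level from factor columns
bank_clean <- droplevels(bank_clean)

nrow(bank_clean)  # rows left after cleaning
```

Dropping 'poutcome' before filtering matters: in this toy example, filtering first would also discard the rows whose only 'unknown' value is in 'poutcome'.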









3.3 Preparing the Training and Test Datasets


Here, describe what you have done to prepare the training and test datasets from the cleaned dataset. Mention how many rows you have for training and how many rows you have for testing the model. Tip: you can use an 80-20 split to prepare the datasets.
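
One possible way to make the 80-20 split mentioned in the tip is a random row sample in base R, sketched below. The toy data frame, the seed value, and the names train_set and test_set are assumptions for illustration; the source does not prescribe a particular splitting method.

```r
set.seed(42)  # arbitrary seed so the split is reproducible

# Toy stand-in for the cleaned dataset (100 rows, made-up values)
bank_clean <- data.frame(
  age = sample(18:90, 100, replace = TRUE),
  y   = factor(sample(c("no", "yes"), 100, replace = TRUE))
)

n <- nrow(bank_clean)

# Randomly pick 80% of the row indices for training
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

train_set <- bank_clean[train_idx, ]
test_set  <- bank_clean[-train_idx, ]  # the remaining 20%

nrow(train_set)
nrow(test_set)
```

Fixing the seed keeps the row counts and the later confusion matrices identical between runs, which makes the reported numbers easier to reproduce.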








4. Results


Here, introduce how you will present the results, and state which metrics you will use to compare the classifiers.



4.1 CTree Classification


Insert the following:


· Confusion matrix created using R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
    ✓ Accuracy
    ✓ Error Rate
    ✓ Precision
    ✓ Recall
    ✓ F1-Score
· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS Word equation tool to complete the calculation.
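
As a cross-check for the written calculation, the five metrics listed above can be computed from a confusion matrix in base R. The sketch below uses ten made-up labels rather than real model output, and it assumes that "yes" is the positive class and that rows of the matrix hold the predictions; both are choices made here, not requirements stated in the template.

```r
# Hypothetical labels for ten test rows; "yes" is treated as the positive class.
actual    <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "yes", "no", "no", "yes", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))

# Confusion matrix: rows are predictions, columns are actual classes
cm <- table(Predicted = predicted, Actual = actual)

TP <- cm["yes", "yes"]  # predicted yes, actually yes
TN <- cm["no",  "no"]   # predicted no,  actually no
FP <- cm["yes", "no"]   # predicted yes, actually no
FN <- cm["no",  "yes"]  # predicted no,  actually yes

accuracy   <- (TP + TN) / (TP + TN + FP + FN)
error_rate <- 1 - accuracy
precision  <- TP / (TP + FP)
recall     <- TP / (TP + FN)
f1_score   <- 2 * precision * recall / (precision + recall)

cm
c(accuracy = accuracy, error_rate = error_rate,
  precision = precision, recall = recall, f1 = f1_score)
```

The same formulas apply to each of the four classifiers; only the predicted vector changes.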






4.2 J48 Classification


Insert the following:


· Confusion matrix created using R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
    ✓ Accuracy
    ✓ Error Rate
    ✓ Precision
    ✓ Recall
    ✓ F1-Score
· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS Word equation tool to complete the calculation.



4.3 Linear Classification



To design the linear model, you can use 1.5 as the threshold.


Insert the following:


· Confusion matrix created using R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
    ✓ Accuracy
    ✓ Error Rate
    ✓ Precision
    ✓ Recall
    ✓ F1-Score
· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS Word equation tool to complete the calculation.
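
The template does not say what the 1.5 threshold in this section is applied to. One plausible reading, assumed here and not confirmed by the source, is that the target 'y' is converted to numbers with as.numeric(), which codes "no" as 1 and "yes" as 2, so 1.5 is the midpoint between the two classes. Under that assumption, and further assuming the linear classifier is an ordinary least-squares fit with lm(), a minimal sketch on made-up data looks like this (the 'duration' predictor and its values are illustrative only):

```r
# Toy training and test data; values are made up for illustration.
train <- data.frame(
  duration = c(50, 80, 120, 200, 400, 600, 700, 900),
  y        = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"),
                    levels = c("no", "yes"))
)
test <- data.frame(
  duration = c(60, 150, 650, 850),
  y        = factor(c("no", "no", "yes", "yes"), levels = c("no", "yes"))
)

# Assumption: code the classes numerically, "no" -> 1 and "yes" -> 2
train$y_num <- as.numeric(train$y)

# Fit an ordinary least-squares linear model on the numeric target
fit <- lm(y_num ~ duration, data = train)

# Predicted scores above 1.5 are labelled "yes", the rest "no"
score     <- predict(fit, newdata = test)
predicted <- factor(ifelse(score > 1.5, "yes", "no"), levels = c("no", "yes"))

cm <- table(Predicted = predicted, Actual = test$y)
cm
```

If the target is instead coded as 0 and 1, the equivalent midpoint would be 0.5, so the coding should be checked before the threshold is applied.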



4.4 k-NN Classification


Insert the following:


· Confusion matrix created using R, along with a description of the confusion matrix
· Use the confusion matrix to calculate the following:
    ✓ Accuracy
    ✓ Error Rate
    ✓ Precision
    ✓ Recall
    ✓ F1-Score
· Important: Please include the calculation here. DO NOT ATTACH HAND-WRITTEN CALCULATIONS. Tip: You can use the MS Word equation tool to complete the calculation.






5. Discussion on the Results


Here, you want to discuss the results. Please discuss your results based on the following:



5.1 Classifier Comparison Based on Accuracy


Here, add a bar chart that shows the accuracy of the four classifiers, and discuss your results. In the discussion, you can rank the classifiers by accuracy and mention the percentage differences among them. You must use R to plot the bar chart.
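
A bar chart like the one requested here can be drawn with base R's barplot(), as in the sketch below. The accuracy values are placeholders invented for the example, not results; they should be replaced with the values calculated in Section 4. The variable names are also assumptions.

```r
# Placeholder accuracies, NOT real results; replace with the values from Section 4.
accuracy <- c(CTree = 0.90, J48 = 0.89, Linear = 0.88, kNN = 0.87)

# Bar chart of accuracy in percent; barplot() returns the bar midpoints
bar_pos <- barplot(
  accuracy * 100,
  ylim = c(0, 100),
  xlab = "Classifier",
  ylab = "Accuracy (%)",
  main = "Classifier comparison: accuracy",
  col  = "steelblue"
)

# Print each value above its bar
text(x = bar_pos, y = accuracy * 100,
     labels = sprintf("%.1f%%", accuracy * 100), pos = 3)

# Ranking and percentage-point gap to the best classifier, for the discussion
ranking      <- sort(accuracy, decreasing = TRUE)
diff_vs_best <- 100 * (ranking[1] - ranking)

ranking
diff_vs_best
```

The same pattern can be reused for the other metrics by swapping in the corresponding vector and axis label.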



5.2 Classifier Comparison Based on Error Rate


Please follow the instructions given in 5.1.



5.3 Classifier Comparison Based on Precision


Please follow the instructions given in 5.1.



5.4 Classifier Comparison Based on Recall


Please follow the instructions given in 5.1.



5.5 Classifier Comparison Based on F1-Score


Please follow the instructions given in 5.1.



Add a concluding discussion here in a new paragraph. Drawing on 5.1 to 5.5, state which classifier is the best for the given dataset, and rank the four classifiers.



6. Conclusion


Write your concluding statement here.




APPENDIX



PLEASE COPY AND PASTE ALL R-Code here:



Code to get the statistics of the given dataset



//Add the related R-Code here



Code for Data Cleaning



//Add the related R-Code here



Code to get the statistics of the cleaned dataset



//Add the related R-Code here






Code to Prepare the Training and the Test Dataset



//Add the related R-Code here






Code for CTree classification and for generating the Confusion Matrix for CTree



//Add the related R-Code here






Code for J48 classification and for generating the Confusion Matrix for J48



//Add the related R-Code here



Code for Linear classification and for generating the Confusion Matrix for the Linear Classifier



//Add the related R-Code here






Code for k-NN classification and for generating the Confusion Matrix for k-NN



//Add the related R-Code here






Code to Generate the Bar charts of 5.1 to 5.5



//Add the related R-Code here









Please follow this template to complete the project. While grading the project, I will be strictly following this template. So, please don't miss any steps.

Jul 26, 2021