Build a YouTube movie comment spam classifier using Python. Build a simple text classifier...

Question

Build a YouTube movie comment spam classifier using Python. Build a simple text classifier using the "bag of words" language model and the Naive Bayes classifier. All the instructions are included in the Word document, together with the data to be used.
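For readers unfamiliar with the "bag of words" model named in the question, here is a minimal sketch of the idea: each comment becomes a row of word counts, and word order is discarded. The three comments are invented for illustration and are not taken from the dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy comments invented for illustration; they are not from the assigned file.
comments = [
    "check out my channel",
    "love this song",
    "check out this song",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(comments)  # sparse matrix, one row per comment

# vocabulary_ maps each word to its column index; sort the words by that index.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
dense_counts = counts.toarray()

print("Shape (comments, distinct words):", counts.shape)
print("Vocabulary:", vocabulary)
print(dense_counts)
```

Each column corresponds to one vocabulary word, so a Naive Bayes classifier trained on these rows only sees how often each word occurs in a comment.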

In this project you will build a YouTube movie comment spam classifier.

OVERVIEW

Build a simple text classifier using the "bag of words" language model and the Naive Bayes classifier. Movies on YouTube are very common, and once a movie is made available, viewers post many comments in response to it. Some of these comments are auto-generated and are considered spam. The purpose of your model is to filter out the spam comments by training a Naive Bayes classifier. The data is available for five movies at the UCI Machine Learning Repository. Your professor will assign you one movie comments file to work on: Youtube01 Psy.

PRESENTATION

You will have ten minutes to demo the code and the results.

REQUIREMENTS

1. Load the data into a pandas data frame.
2. Carry out some basic data exploration and present your results. (Note: you only need two columns for this project. Make sure you identify them correctly; if in doubt, ask your professor.)
3. Using nltk toolkit classes and methods, prepare the data for model building (building a category text predictor). Use count_vectorizer.fit_transform().
4. Present highlights of the output (initial features), such as the new shape of the data and any other useful information, before proceeding.
5. Downscale the transformed data using tf-idf and again present highlights of the output (final features), such as the new shape of the data and any other useful information, before proceeding.
6. Use pandas.sample to shuffle the dataset; set frac=1.
7. Using pandas, split your dataset into 75% for training and 25% for testing, and make sure to separate the class from the feature(s). (Do not use train_test_split.)
8. Fit the training data to a Naive Bayes classifier.
9. Cross-validate the model on the training data using 5 folds and print the mean model accuracy.
10. Test the model on the test data, and print the confusion matrix and the accuracy of the model.
11. Come up with 6 new comments (4 non-spam and 2 spam), pass them to the classifier, and check the results. You can be very creative and even do more.
12. Present all the results and conclusions.
13. Submit the code, report, and PowerPoint presentation to the project assessment folder for grading.
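A rough sketch of how requirements 6 to 11 could fit together is shown below. It is not the accepted answer's code. Several things are assumptions made for illustration: the toy comments stand in for the assigned CSV, `random_state=42` is added only for repeatability, CLASS is assumed to use 1 for spam and 0 for non-spam, and the vectorizer and tf-idf steps are wrapped in a scikit-learn pipeline that is fitted on the training rows only, which differs from the order listed in the requirements (transform first, then split).

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the assigned CSV: 8 invented comments repeated 5 times.
spam = ["check out my channel", "subscribe to my channel please",
        "free gift cards click here", "visit my website for free money"]
ham = ["love this song", "this video is so funny",
       "great dance moves", "best music video ever"]
data = pd.DataFrame({
    "CONTENT": (spam + ham) * 5,
    "CLASS": ([1] * len(spam) + [0] * len(ham)) * 5,  # assumed: 1 = spam, 0 = not spam
})

# Requirement 6: shuffle all rows (random_state is an added assumption).
shuffled = data.sample(frac=1, random_state=42).reset_index(drop=True)

# Requirement 7: manual 75/25 split with pandas, without train_test_split.
split_at = int(len(shuffled) * 0.75)
train, test = shuffled.iloc[:split_at], shuffled.iloc[split_at:]
X_train, y_train = train["CONTENT"], train["CLASS"]
X_test, y_test = test["CONTENT"], test["CLASS"]

# Counts -> tf-idf -> Naive Bayes, chained so every fold refits all three steps.
model = make_pipeline(CountVectorizer(), TfidfTransformer(), MultinomialNB())

# Requirement 9: 5-fold cross-validation on the training data only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("Mean 5-fold accuracy:", cv_scores.mean())

# Requirements 8 and 10: fit on the training rows, evaluate on the held-out rows.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
test_accuracy = accuracy_score(y_test, y_pred)
print("Confusion matrix:\n", cm)
print("Test accuracy:", test_accuracy)

# Requirement 11: six made-up comments (4 intended as non-spam, 2 as spam).
new_comments = [
    "this song is great",
    "so funny, best video",
    "love the dance moves",
    "great music",
    "click here for free gift cards",
    "please subscribe to my channel",
]
new_predictions = model.predict(new_comments)
for comment, label in zip(new_comments, new_predictions):
    print(label, comment)
```

With the real file, the `data` frame would instead come from `pd.read_csv` restricted to the two columns the assignment asks for; the toy data here is far too small and repetitive for its accuracy figures to mean anything.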

project-assigment-5npt3dxx.docx sample-scripts-jnwrm4io.zip youtube01-psy-jvpkjsdj.csv

Vicky · Accepted Answer

# import libraries
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
# Read data
data = pd.read_csv('Youtube01_Psy.csv')
# Basic data exploration
data = data[['CONTENT','CLASS']]
print('Dimension of dataframe:',data.shape)
print('No. of comments:',data.shape[0])
# prepare the data for model building using CountVectorizer()
count_vectorizer = CountVectorizer()
X_counts = count_vectorizer.fit_transform(data['CONTENT'])
# The posted answer is cut off at this line; .toarray() is an assumed completion.
print('X_counts:', X_counts[:5].toarray())
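The posted code stops at the count matrix. As a hedged illustration of the next step in the assignment (requirement 5, tf-idf downscaling), the sketch below uses invented toy comments in place of the real `CONTENT` column. It shows that the shape is unchanged, that words appearing in more comments get a lower idf weight, and that each row is scaled to unit length under the default `norm='l2'` setting.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy comments invented for illustration; the real input would be data['CONTENT'].
comments = [
    "check out my channel",
    "love this song",
    "check out this song",
]

count_vectorizer = CountVectorizer()
X_counts = count_vectorizer.fit_transform(comments)

# Re-weight raw counts: words found in many comments are scaled down.
tfidf = TfidfTransformer()
X_tfidf = tfidf.fit_transform(X_counts)

print("Counts shape:", X_counts.shape)
print("Tf-idf shape:", X_tfidf.shape)  # same shape, different values

# idf_ holds one weight per vocabulary word, in column order.
vocab = count_vectorizer.vocabulary_
idf_channel = tfidf.idf_[vocab["channel"]]  # appears in 1 comment
idf_song = tfidf.idf_[vocab["song"]]        # appears in 2 comments
print("idf(channel):", idf_channel, "idf(song):", idf_song)

# With the default norm='l2', every non-empty row has Euclidean length 1.
row_norms = np.sqrt(np.asarray(X_tfidf.multiply(X_tfidf).sum(axis=1))).ravel()
print("Row norms:", row_norms)
```

The unchanged shape is worth reporting in the "final features" highlights: tf-idf changes the weights, not the number of features.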

