hw11: Homework 11: Spam/Ham Classification - Build Your Own Model

Hi, I need help with Q2, Q3, and Q4. Please evaluate them.


hw11

August 2, 2021

[1]: # Initialize Otter
     import otter
     grader = otter.Notebook("hw11.ipynb")

1 Homework 11: Spam/Ham Classification - Build Your Own Model

1.1 Feature Engineering, Logistic Regression, Cross Validation

1.2 Due Date: Thursday 8/5, 11:59 PM PDT

Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the project, we ask that you write your solutions individually. If you do discuss the assignments with others, please include their names at the top of your notebook.

Collaborators: list collaborators here

1.3 This Assignment

In this homework, you will be building and improving on the concepts and functions that you implemented in Homework 10 to create your own classifier to distinguish spam emails from ham (non-spam) emails. We will evaluate your work based on your model's accuracy and your written responses in this notebook.

After this assignment, you should feel comfortable with the following:

• Using sklearn libraries to process data and fit models
• Validating the performance of your model and minimizing overfitting
• Generating and analyzing precision-recall curves

1.4 Warning

This is a real-world dataset: the emails you are trying to classify are actual spam and legitimate emails. As a result, some of the spam emails may be in poor taste or be considered inappropriate. We think the benefit of working with realistic data outweighs these inappropriate emails, and wanted to give a warning at the beginning of the project so that you are made aware.

[2]: # Run this cell to suppress all FutureWarnings
     import warnings
     warnings.filterwarnings("ignore", category=FutureWarning)

1.5 Score Breakdown

Question    Points
1           6
2a          4
2b          2
3           3
4           15
Total       30

1.6 Setup and Recap

Here we will provide a summary of Homework 10 to remind you of how we cleaned the data, explored it, and implemented methods that are going to be useful for building your own model.
[3]: import numpy as np
     import pandas as pd

     import matplotlib.pyplot as plt
     %matplotlib inline

     import seaborn as sns
     sns.set(style = "whitegrid", color_codes = True, font_scale = 1.5)

1.6.1 Loading and Cleaning Data

Remember that in email classification, our goal is to classify emails as spam or not spam (referred to as "ham") using features generated from the text in the email.

The dataset consists of email messages and their labels (0 for ham, 1 for spam). Your labeled training dataset contains 8348 labeled examples, and the unlabeled test set contains 1000 unlabeled examples.

Run the following cell to load in the data into DataFrames.

The train DataFrame contains labeled data that you will use to train your model. It contains four columns:

1. id: An identifier for the training example
2. subject: The subject of the email
3. email: The text of the email
4. spam: 1 if the email is spam, 0 if the email is ham (not spam)

The test DataFrame contains 1000 unlabeled emails. You will predict labels for these emails and submit your predictions to the autograder for evaluation.

[4]: import zipfile
     with zipfile.ZipFile('spam_ham_data.zip') as item:
         item.extractall()

[5]: original_training_data = pd.read_csv('train.csv')
     test = pd.read_csv('test.csv')

     # Convert the emails to lower case as a first step to processing the text
     original_training_data['email'] = original_training_data['email'].str.lower()
     test['email'] = test['email'].str.lower()

     original_training_data.head()

[5]:    id  subject \
     0   0  Subject: A&L Daily to be auctioned in bankrupt…
     1   1  Subject: Wired: "Stronger ties between ISPs an…
     2   2  Subject: It's just too small …
     3   3  Subject: liberal defnitions\n
     4   4  Subject: RE: [ILUG] Newbie seeks advice - Suse…

        email                                            spam
     0  url: http://boingboing.net/#85534171\n date: n…  0
     1  url: http://scriptingnews.userland.com/backiss…  0
     2  \n \n \n \n
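The recap above says the classifier uses "features generated from the text in the email" and that lowercasing is the first processing step. As a rough illustration of what such features can look like, here is a minimal sketch in plain Python. It is not the assignment's solution: the toy emails, the `vocabulary` list, and the helper name `word_indicator_features` are all hypothetical, and the notebook itself works with pandas DataFrames and sklearn rather than plain lists.

```python
# Hypothetical toy data mirroring the `email` and `spam` columns described above.
emails = [
    "Click HERE to claim your FREE prize",
    "Meeting notes attached, see you Monday",
    "free money, click now",
]
labels = [1, 0, 1]  # 1 = spam, 0 = ham


def word_indicator_features(words, texts):
    """Return one row per text; entry j is 1 if words[j] occurs in that text, else 0.

    Matching is a simple substring test on the lowercased text, so "free"
    would also match inside a longer word such as "freedom".
    """
    lowered = [t.lower() for t in texts]  # same first step as the notebook
    return [[int(w in t) for w in words] for t in lowered]


# Illustrative word choice only; it is an assumption, not taken from the assignment.
vocabulary = ["free", "click", "meeting"]
X = word_indicator_features(vocabulary, emails)

for row, label in zip(X, labels):
    print(row, label)
```

Each row of `X` is a 0/1 vector that a model such as logistic regression could take as input, with `labels` as the target. Which words make good features is exactly the kind of choice the assignment asks you to explore and validate.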
Answered Same Day, Aug 03, 2021


Pritam Kumar answered on Aug 04 2021
hw11 (with answers)
SOLUTION.PDF
