Shikha answered on May 12 2021
Student Name
Student ID
SPAM Email Filter – Report
Submitted By
Course
Professor
Date
Table of Contents
1. Abstract
2. Introduction
2.1. Spam Emails
2.2. Problem Definition
2.3. Project Objective
2.4. Project Scope
2.5. Spam Filtering
3. Requirement Specification
3.1. Functional Requirements
3.2. Non-Functional Requirements
3.3. Hardware Specification
3.4. Software Specification
3.5. Testing Specification
4. Implementation and Testing
4.1. Data Preprocessing
4.2. Test-driven development Methodology
4.3. Acceptance Test Plan
5. Demonstration and Evaluation
5.1. Spam Filtering Rules Architecture
5.2. Machine Learning Algorithms
5.3. Decision Tree Algorithm
5.4. System Architecture
5.5. Algorithm Steps
5.6. Project Scheduling
5.7. GANTT Chart
6. Conclusion
7. References
1. Abstract
The growing volume of unsolicited bulk email, also known as spam, has created a need for anti-spam email filters. In this project, we use a machine learning filtering algorithm that is trained on a training dataset. The background discussion examines how anti-spam filters work, the evolving nature of spam, the ways spammers deceive email service providers (ESPs), and the role of machine learning in spam filtering. We use a decision tree classifier to develop our anti-spam algorithm. This report develops the algorithm by analyzing the functional and non-functional requirements and the project schedule. Developing and testing the anti-spam filter application is expected to take approximately one month.
2. Introduction
As the internet has become an important part of daily life, email has become an integral means of data communication. As email and internet use have grown, spam has also increased in recent years. Spam is not restricted to any geographical location; it can originate from anywhere with Internet access. It is therefore necessary to develop an anti-spam filter that can help mitigate spam emails (Christina, V. and Karpagavalli, 2010). According to some research, spam is increasingly used to distribute viruses and malware, to direct users to phishing sites, and so on. Approximately 54 billion spam emails are sent daily (Bhowmick and Hazarika, 2018).
Many methods can be used to classify messages automatically as spam or legitimate, but machine learning algorithms have been the most successful. These include approaches regarded as top performers in text categorization, such as support vector machines and Naïve Bayes classifiers (Almeida and Yamakami, 2012). In this report, we develop a machine learning based spam filter system that analyzes email data and protects mail users from spam. The main objective of the report is to analyze the requirements, then implement the system and evaluate it through testing.
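As a rough illustration of the Naïve Bayes approach mentioned above, the following Python sketch trains a word-count model with add-one smoothing and classifies a new message. The function names, the whitespace tokenizer, and the four training messages are assumptions made for illustration only; they are not taken from the cited work or from the project dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(messages):
    """Count words per class. `messages` is a list of (text, label) pairs,
    where label is 'spam' or 'ham' (legitimate)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: fraction of training messages with this label.
        score = math.log(label_counts[label] / total_messages)
        # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
model = train_naive_bayes(TRAINING)
```

With this toy data, `classify("free prize money", model)` returns `"spam"` and `classify("project meeting monday", model)` returns `"ham"`. A real filter would need a much larger labeled dataset and a more careful tokenizer.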
2.1. Spam Emails
Spam emails can be defined as junk or unwanted emails that are sent mainly for marketing purposes. They can be sent in very large numbers using botnets, which are networks of infected computers. The main risk associated with spam is its use in malicious attacks intended to gain access to an organization's systems or network (Cisco Inc., 2019). A significant relationship has been found between email users who receive spam and their privacy concerns: with respect to privacy, these users tend to exhibit both passive and proactive behaviors (Park and Sharma, 2007).
The large-scale generation of spam creates serious problems for users as well as internet service providers. The main issues include a degraded user search experience, the spread of viruses across the network, increased network traffic, and the waste of resources such as bandwidth, storage, and computing power (Charumathis and Thirumal, 2017). Spam remains a problem because many email users still fall for it. Spammers can obtain email addresses in several ways:
Spammers and cybercriminals use sophisticated tools. They can harvest email addresses from content that users post publicly. Some tools generate common usernames and pair them with domain names to guess email addresses and crack their passwords. Some spammers purchase email lists, legally or illegally, for marketing purposes (Blakemore, 2010).
2.2. Problem Definition
An email is generally considered spam when it is unsolicited and sent in bulk to many recipients. Spam is used mainly for advertising and marketing. These unwanted messages inconvenience the recipient and consume the user's network resources. The United States Federal Trade Commission reported that 66% of spam messages contain false information somewhere in the message and that 18% advertise "adult" material. According to another report, 12% of users spend half an hour or more every day dealing with spam messages.
Several issues arise as spam increases. Spam messages are high in volume and fill users' mailboxes. There is also no relationship between the recipients' areas of interest and the content of the spam. Spam messages consume additional bandwidth. Finally, spam causes many security problems because a large share of the messages carry Trojans, malware, and viruses (Lakshmi, 2018).
2.3. Project Objective
The main objective of the advanced spam email filter system using machine learning is to give users a safe and secure email experience by detecting and filtering annoying spam emails received from unwanted senders. The application will rely mainly on text classification and machine learning techniques to detect and filter spam emails accurately.
2.4. Project Scope
The scope of the project is to develop a machine learning based advanced spam email filtering system that analyzes email data and separates spam from legitimate emails. A block and unblock option will allow the user to block a particular spam email address so that further emails from it are stopped.
2.5. Spam Filtering
Our spam filtering algorithm can help reduce the feature space without sacrificing much classification accuracy, although its effectiveness depends on the quality of the training dataset. Some research has examined the feasibility of finding the best learning algorithm and the metadata to be used, which is a significant contribution to email classification using the Rainbow system. In the graph-based data mining approach to spam classification, structures or patterns are extracted from a pre-classified email folder and then used to classify incoming email messages. Spam filtering performance can also be improved by reducing the classification error: temporal relations in an email sequence are discovered in the form of temporal sequence patterns, and the discovered information is embedded into content-based learning methods (Sable and Gulhane, 2013).
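One common way to reduce the feature space is to rank words by information gain, which is also the splitting criterion used by ID3-style decision trees. The report does not state which criterion the project uses, so the Python sketch below is only an illustration under that assumption; the function names and the four sample messages are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * math.log2(p)
    return result


def information_gain(messages, word):
    """Entropy reduction obtained by splitting the messages on whether
    they contain `word`. `messages` is a list of (text, label) pairs."""
    labels = [label for _, label in messages]
    with_word = [label for text, label in messages
                 if word in text.lower().split()]
    without_word = [label for text, label in messages
                    if word not in text.lower().split()]
    n = len(labels)
    remainder = ((len(with_word) / n) * entropy(with_word)
                 + (len(without_word) / n) * entropy(without_word))
    return entropy(labels) - remainder


def select_features(messages, k):
    """Keep the k words with the highest information gain."""
    vocabulary = sorted({w for text, _ in messages for w in text.lower().split()})
    ranked = sorted(vocabulary,
                    key=lambda w: information_gain(messages, w),
                    reverse=True)
    return ranked[:k]


# Hypothetical sample messages, for illustration only.
SAMPLE = [
    ("free prize now", "spam"),
    ("free offer now", "spam"),
    ("project meeting now", "ham"),
    ("project report today", "ham"),
]
```

In this sample, "free" and "project" each separate the two classes perfectly, so `select_features(SAMPLE, 2)` keeps those two words and discards the rest, including "now", which appears in both classes and carries much less information.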
3. Requirement Specification
3.1. Functional Requirements
User Registration – A user registration form must be provided so that the user can register with the spam filtering system. It will ask for the email address that needs to be filtered for spam messages. After the details are saved, a user ID and password will be created.
Analyze Data – Once the spam filtering system is in place, it will analyze email data to determine which emails are useful and which are spam.
Mail Filtering – After analyzing the data, the system will move spam email messages into a dedicated folder.
Block Emails – There must be an option for blocking the email addresses of unwanted spam senders.
Delete Emails – The delete option will remove unwanted emails, either one at a time or all at once.
Mobile Application – The spam filtering system can be a desktop or mobile application.
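The filtering, blocking, and deleting requirements above could fit together roughly as in the following Python sketch. The class and method names are hypothetical, the `is_spam` callable stands in for the decision tree classifier, and routing mail from blocked senders to the Spam folder (rather than rejecting it) is an assumption.

```python
class MailboxFilter:
    """Minimal sketch of the Mail Filtering, Block Emails, and Delete Emails
    requirements. `is_spam` is any callable that takes the message text and
    returns True for spam; here it stands in for the trained classifier."""

    def __init__(self, is_spam):
        self.is_spam = is_spam
        self.blocked = set()
        self.folders = {"Inbox": [], "Spam": []}

    def block(self, address):
        """Block a sender address (stored in lowercase)."""
        self.blocked.add(address.lower())

    def unblock(self, address):
        """Remove a sender address from the blocklist, if present."""
        self.blocked.discard(address.lower())

    def deliver(self, sender, text):
        """File a message in Spam if the sender is blocked or the classifier
        flags the text; otherwise file it in Inbox. Returns the folder name."""
        if sender.lower() in self.blocked or self.is_spam(text):
            folder = "Spam"
        else:
            folder = "Inbox"
        self.folders[folder].append((sender, text))
        return folder

    def delete_all_spam(self):
        """Delete every message in the Spam folder and return how many were removed."""
        removed = len(self.folders["Spam"])
        self.folders["Spam"].clear()
        return removed


def keyword_check(text):
    """Hypothetical stand-in classifier: flags any message containing 'free'."""
    return "free" in text.lower()
```

For example, `MailboxFilter(keyword_check).deliver("a@example.com", "Free prize")` returns `"Spam"`, while a message without the keyword from an unblocked sender is filed in `"Inbox"`.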
3.2. Non-Functional Requirements
Usability – The advanced spam email filtering system must be easy to use.
Reliability – It must correctly separate spam from useful emails.
Scalability – It must continue to work efficiently as the number of emails increases.
Security – The system must require login credentials so that only authorized users have access to it.
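The report does not say how login credentials are stored. One possible approach, sketched below in Python as an assumption, is to store a salted PBKDF2 hash instead of the password itself and to compare hashes in constant time at login. The function names and the iteration count are illustrative choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for the deployment hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256.
    A fresh random salt is generated when none is supplied."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

At registration the system would store the salt and digest returned by `hash_password`; at login it would call `verify_password` with the stored values and grant access only when it returns `True`.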
3.3. Hardware Specification
No dedicated server is required to access this email filtering system. Only a desktop, laptop, or smartphone with an internet connection is required to access the spam filtering application.
3.4. Software Specification
The software application can be desktop based or mobile based, depending on user requirements. The protocols required to implement this application are SMTP and IP.
3.5. Testing Specification
The advanced spam email filter needs to be tested before it is deployed to end users. The test plan is as follows:
Purpose
The main purpose of the test plan is to check whether all functional requirements have been fulfilled. Any requirement that has not been met will be implemented before proceeding to the next steps.
Features to be Tested
The features that will be tested in this application are:
Does the system ask for registration before logging into the system?
Can the system also be run on mobile devices?
Is the system able to move all spam emails from the inbox to the Spam folder?
Does the system indicate whether a message is spam or not?
Does the system have the option of blocking the email addresses of spam senders?
Does the system have the option of deleting all unnecessary emails?
Approach
Unit testing is the best option for testing the system. It was chosen because it is simple and allows errors to be detected early. Unit testing is a form of white-box testing. It is preferable for the developers themselves to unit test the code, because full knowledge of the code is required. Every line of code needs to be tested to confirm that it works properly. The entire project will be divided into sub-modules for testing (Francino, 2019).
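A unit test for one such sub-module might look like the following Python sketch, which uses the standard `unittest` module. The function under test, `is_blocked`, is a hypothetical blocklist lookup invented for this example and is not part of the project code.

```python
import unittest


def is_blocked(sender, blocked_addresses):
    """Hypothetical unit under test: case-insensitive blocklist lookup."""
    normalized = {address.lower() for address in blocked_addresses}
    return sender.strip().lower() in normalized


class IsBlockedTest(unittest.TestCase):
    """Each test method checks one behavior of the unit in isolation."""

    def test_blocked_sender_is_detected(self):
        self.assertTrue(is_blocked("spam@example.com", ["spam@example.com"]))

    def test_unknown_sender_is_allowed(self):
        self.assertFalse(is_blocked("friend@example.com", ["spam@example.com"]))

    def test_match_ignores_case_and_whitespace(self):
        self.assertTrue(is_blocked("  SPAM@Example.com ", ["spam@example.com"]))


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsBlockedTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes the three tests and returns a result object whose `wasSuccessful()` method reports whether all of them passed. The same test case can also be run from the command line with `python -m unittest`.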
Acceptance Criteria
The system will be accepted only when all user requirements are met, including user registration, spam email filtering, blocking of unknown spam email addresses, deletion of spam, and so on.
4. Implementation and Testing
The implementation of the decision tree classifier based anti-spam filtering application involves several steps:
Exploratory Data Analysis...