Fake News Detection Using Machine Learning in Python

1. Introduction

In today's day and age we rely heavily on online sources for news, but not all of those sources are reliable. There is a constant inflow of fake news from various outlets, so the need to distinguish genuine news sources from fake ones is extremely high. One's perspective on life and society depends heavily on the news to which one is exposed. To stop the plague of fake news from consuming people's minds, we should use every resource available, and one of the greatest resources we have today is the advancement of data science, particularly in Python. It is imperative that we apply machine learning in Python to the best of its abilities to quell the endless flow of fake news.

2. Proposed Research Problem and Background Information

2.1 Fake News Detection

"Fake news" is a very common term today. It refers to news shared by questionable sources on web-based platforms or social media that is sensationalized or, in many cases, an outright lie. Such news generally serves one political ideology or another; sometimes it is also spread to create a false image of a product (De Beer & Matthee, 2020). These sources tend to be highly radicalized, and consumers of fake news are continually exposed to radicalized lies. This is a dangerous reality that needs to be fought. Whatever one's political ideology or choice of commercial products, it should be based on true events.

3. Fake News Detection Using the TFIDF Vectorizer

3.1 TF (Term Frequency)

Term frequency is the number of times a particular term occurs in a document. A higher value indicates that the document repeats the word or phrase more often than other documents do. If that word or phrase is part of the search parameters, there is a good chance that the document with the higher TF is the match.

3.2 IDF (Inverse Document Frequency)

Words or phrases that are common not just in one document but across all documents in a corpus may not be relevant. IDF measures how significant a word or phrase is over an entire collection of documents. The TFIDF Vectorizer scores the raw text of documents by term frequency weighted by significance over the corpus (Singh & Shashi, 2019).
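To make the weighting concrete: in the common smoothed formulation (the scikit-learn default, which the sketches in this report assume), a term t in document d is scored as tfidf(t, d) = tf(t, d) × idf(t), where idf(t) = ln((1 + n) / (1 + df(t))) + 1, n is the number of documents in the corpus, and df(t) is the number of documents containing t. The following minimal sketch vectorizes a toy corpus with scikit-learn's TfidfVectorizer; the three headlines and the stop_words/max_df settings are illustrative choices, not taken from any of the cited studies.

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus: two plain headlines and one sensationalized one (illustrative only).
corpus = [
    "Senate passes the annual budget bill after a long debate",
    "Local council approves funding for a new public library",
    "SHOCKING miracle cure that doctors do not want you to know about",
]

# stop_words="english" drops very common function words; max_df=0.7 drops
# any term appearing in more than 70% of documents, since a high document
# frequency means a low IDF and little discriminative value.
vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_matrix = vectorizer.fit_transform(corpus)

print(tfidf_matrix.shape)                        # (3 documents, vocabulary size)
print(vectorizer.get_feature_names_out()[:10])   # a few of the learned terms

Each row of the resulting sparse matrix represents one document and each column one vocabulary term, which is exactly the representation the classifiers described below consume.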
3.3 Passive Aggressive Classifier

This is an online learning algorithm that remains passive as long as its predictions are correct but turns aggressive on anomalies or errors. It is built so that it updates its weights just enough to correct each mistake.

3.4 Natural Language Processing

NLP combines linguistics, programming, and artificial intelligence (AI) to detect the language and speech patterns of people (Gilda, 2017). It is the method used to process and analyze raw language data.

3.5 Machine Learning Classification

Machine learning classification refers to the family of algorithms that learn from labelled examples to assign categories to new, raw data, with accuracy improving as more data is seen.

3.6 Findings

Various projects have tried to create algorithms for fake news detection, and most of them use the TFIDF Vectorizer. Although their effectiveness varies, it is consistently high. One project used a political data set and executed the TFIDF Vectorizer along with the Passive Aggressive Classifier; its accuracy in flagging documents containing fake news was 92.82% (Gilda, 2017). Another used the LIAR dataset and executed the TFIDF Vectorizer with the Passive Aggressive Classifier and machine learning classification; its accuracy in flagging fake news documents was 92% (Choudhary & Arora, 2021). One of the most important publications in this area is by Khanam, Alwasel, Sirafi and Rashid (2021), which takes a different approach to the Naive Bayes Classifier and increases its efficiency by almost twenty percent. Another important read in this context is De Beer and Matthee (2020), a comprehensive review of the research and projects on identifying fake news with Python and machine learning that preceded that paper.

3.7 Data Sets

Some of the most commonly used data sets are:

LIAR: One of the most commonly used political data sets, created using politifact.com as the raw data input.

FakeNewsNet: A frequently used data set consisting of news content, social media content, and spatio-temporal data, built from various reliable sources (Albahr & Albahar, 2020).

More data sets are available, tailored to various categories depending on the focus of different projects. A minimal end-to-end sketch combining the components above follows this list.
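As a sketch of how the pieces in Sections 3.1 through 3.5 fit together, the following program trains a Passive Aggressive Classifier on TFIDF features and reports its accuracy, mirroring the general setup of the studies cited in Section 3.6. It is a minimal sketch under stated assumptions, not a reproduction of any cited system: the file news.csv and its text and label columns are hypothetical stand-ins for whichever data set (for example, LIAR) is actually chosen.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical CSV with a "text" column (article body) and a "label"
# column ("FAKE" or "REAL"); substitute the data set actually chosen.
df = pd.read_csv("news.csv")

x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=7
)

# Fit the vectorizer on the training split only, then reuse it on the
# test split, so no test-set vocabulary leaks into the model.
vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = vectorizer.fit_transform(x_train)
tfidf_test = vectorizer.transform(x_test)

# The classifier stays passive on correct predictions and updates its
# weights aggressively on mistakes; max_iter bounds the training passes.
classifier = PassiveAggressiveClassifier(max_iter=50)
classifier.fit(tfidf_train, y_train)

y_pred = classifier.predict(tfidf_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"]))

Fitting the vectorizer only on the training split is deliberate: letting it see test documents would inflate the measured accuracy, which matters when comparing against figures such as the 92.82% reported by Gilda (2017).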
4. Conclusion

A considerable amount of research has been done on the detection of fake news, and its measured accuracy is considerably high. Nevertheless, deployment of these detectors on the web and on social media is practically nil. The research has not yet produced absolute detection; new programs and methods need to be worked out to increase the efficiency of fake news detection. Equally important are real-world deployments of these methods on the internet.

References

Albahr, A., & Albahar, M. (2020). An empirical comparison of fake news detection using different machine learning algorithms. International Journal of Advanced Computer Science and Applications (IJACSA).

Choudhary, A., & Arora, A. (2021). Linguistic feature based learning model for fake news detection and classification. Expert Systems with Applications, 169, 114171.

De Beer, D., & Matthee, M. (2020). Approaches to identify fake news: A systematic literature review. In International Conference on Integrated Science (pp. 13-22).

Gilda, S. (2017). Notice of violation of IEEE publication principles: Evaluating machine learning algorithms for fake news detection. In 2017 IEEE 15th Student Conference on Research and Development (SCOReD) (pp. 110-115).

Khanam, Z., Alwasel, B. N., Sirafi, H., & Rashid, M. (2021). Fake news detection using machine learning approaches. IOP Conference Series: Materials Science and Engineering, 1099(1), 012040.

Singh, A. K., & Shashi, M. (2019). Vectorization of text documents for identifying unifiable news articles. International Journal of Advanced Computer Science and Applications, 10.