Please see the instructions in the attached document. Feel free to modify all of the code.
Assignment: This assignment is about building a news scraper. The user defines certain criteria (ticker and/or company name, date range, etc.) and the scraper returns news information (title, URL, date, author, full article body text, etc.). There should be open-source packages for the news part; please do a bit of research online. You may fine-tune all of the parameters mentioned above using your own interpretation. The assignment therefore contains two parts:

1. News search: given a stock/company and a time range, return the results found from Google (or not specifically from Google, but a list of articles found across different news sources such as WSJ, CNN, NBC, BBC, etc.). There must be hundreds or thousands of results.

2. News scraping: for each article, take the URL and scrape the title and body of that article. This part has already been done and the code is uploaded in a .ipynb file. A JSON file with the website URLs is also provided.

Your job: Please do the first part and then merge it with the second so that the news scraper function is complete. The function should be able to return a list of related articles associated with the search input, i.e. the company's name and/or a date range. For example, if you give a company name like Tesla and a date range between 2020 and 2023, it should return a long list of news articles related to Tesla between 2020 and 2023. The scraping function from the second part then takes each article's data (URL, title, and body). Please combine both parts into a single Python file in a Jupyter Notebook. Please follow the format in the uploaded script for comment mark-ups and documentation. DO NOT simply write a bunch of code without any comments; reviewers need to be able to understand it.

Solution: Please submit an HTML file where both the code and the outputs/sample runs are displayed. Follow the format in the provided screenshot and do not put too much code in a single block or cell. The picture is on the second page; I have also uploaded that HTML file for you.
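To make the expected output concrete, below is a minimal sketch of one way part 1 (the news search) might be approached and wired into a part-2-style scraper. It is an assumption-laden illustration, not the required implementation: it assumes the Google News RSS search endpoint and the open-source `feedparser`, `requests`, and `beautifulsoup4` packages are acceptable, and the names `search_news`, `scrape_article`, and `news_scraper` are hypothetical. The actual scraping code from the uploaded .ipynb should replace the `scrape_article` placeholder.

```python
# Sketch only: part 1 (news search) feeding into a part-2-style scraper.
# Assumptions (not from the brief): Google News RSS is used for the search,
# and feedparser / requests / beautifulsoup4 are installed. The real part-2
# scraper from the uploaded notebook should replace scrape_article().
from urllib.parse import quote_plus

import feedparser                 # parses the Google News RSS feed
import requests                   # downloads article HTML
from bs4 import BeautifulSoup     # extracts title/body text from HTML


def search_news(query, start_date, end_date, max_results=100):
    """Return a list of {title, url, published} dicts from a Google News RSS search.

    start_date/end_date are 'YYYY-MM-DD' strings; the after:/before: operators
    narrow the search to the requested date range.
    """
    url = (
        "https://news.google.com/rss/search?"
        f"q={quote_plus(query)}+after:{start_date}+before:{end_date}"
        "&hl=en-US&gl=US&ceid=US:en"
    )
    feed = feedparser.parse(url)
    return [
        {"title": e.title, "url": e.link, "published": e.get("published", "")}
        for e in feed.entries[:max_results]
    ]


def scrape_article(url):
    """Placeholder for the part-2 scraper: fetch a page and pull its title and body text."""
    resp = requests.get(url, timeout=10, headers={"User-Agent": "Mozilla/5.0"})
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.title.string.strip() if soup.title and soup.title.string else ""
    body = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return {"url": url, "title": title, "body": body}


def news_scraper(query, start_date, end_date, max_results=20):
    """Part 1 + part 2 combined: search, then scrape each URL that was found."""
    articles = []
    for hit in search_news(query, start_date, end_date, max_results):
        try:
            articles.append({**hit, **scrape_article(hit["url"])})
        except requests.RequestException:
            continue  # skip articles that fail to download
    return articles


# Example run matching the brief: Tesla news between 2020 and 2023.
if __name__ == "__main__":
    for art in news_scraper("Tesla", "2020-01-01", "2023-12-31", max_results=5):
        print(art["published"], art["title"], art["url"])
```

One caveat with this sketch: the links returned by the Google News RSS feed are redirect URLs, so a real implementation may need to resolve them before scraping. Open-source packages such as GoogleNews, gnews, or pygooglenews (for the search) and newspaper3k (for article extraction) are alternative options worth researching for part 1.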