Please see the instructions in the attached file. Your task for this order is to update the code according to the feedback below so that it meets the general requirements. Please assign expert Nidhi to this order if possible.
Assignment: This assignment is about building a news scraper. The user defines search criteria (ticker and/or company name, date range, etc.), and the scraper returns news information (title, URL, date, author, full article body text, etc.). There should be open source packages for the news part, so please do a bit of research online. You can fine-tune all the parameters mentioned above using your own interpretation.

The assignment therefore has two parts:
1. The news search: given a stock/company and a time range, return the results found on Google (not necessarily from Google itself, but a list of articles from different news sources such as WSJ, CNN, NBC, BBC, etc.). There must be hundreds or thousands of results.
2. The news scraping: for each article, take the URL and scrape the title and body of that article. This part has already been done, and the code is uploaded in a .ipynb file. There is also a JSON file containing the website URLs.

Your job: Please do the first part and then merge it with the second so that the news scraper function is complete. The function should return a list of related articles matching the search input, that is, the company's name and/or a date range. For example, given a company name like Tesla and a date range between 2020 and 2023, it should return a long list of news articles related to Tesla published between 2020 and 2023. The scraping function from the second part will then extract each article's data (URL, title, and body). Please combine both parts into a single Python file in a Jupyter Notebook. Please follow the format of the uploaded script for comment mark-ups and documentation. DO NOT simply write a bunch of code without any comments for reviewers to understand.

Feedback for updating codes: Regarding news sources, I am not saying the news source is Google; I am saying let's use Google as the news aggregator. Say, for example, I want to see all the news on Tesla on Nov. 14th. CNN or BBC might have only 3 or 4 articles each, but when you search on Google, it gives you all kinds of news and blogs from different sources. That is what I mean by using Google as the news source. To achieve that, ideally we would use the paid Google API, but for now we can see whether we can leverage any open source packages, for example from https://github.com/topics/google-news

Solution: Please submit an HTML file in which both the code and the outputs/sample runs are displayed. Follow the format in the given screenshot, and do not put too much code in a single block or cell.
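
As a possible starting point for the first part (the news search), here is a minimal sketch that queries the public Google News RSS search feed using only the Python standard library. It is one option, not the required approach; a package from the GitHub topic above could be used instead. All function names here (`build_search_url`, `date_windows`, `parse_feed`, `fetch_url`, `search_news`) are hypothetical. Two assumptions: the `after:` and `before:` search operators are honored by the feed, and the feed caps the number of items per query, which is why the date range is split into short windows to collect more results.

```python
from datetime import date, timedelta
from email.utils import parsedate_to_datetime
from urllib.parse import urlencode
from urllib.request import Request, urlopen
import xml.etree.ElementTree as ET

GOOGLE_NEWS_RSS = "https://news.google.com/rss/search"


def build_search_url(query, start, end, lang="en-US", country="US"):
    """Build a Google News RSS search URL for `query` between two dates."""
    # Assumption: `after:` / `before:` operators restrict results by date.
    q = f"{query} after:{start.isoformat()} before:{end.isoformat()}"
    params = {"q": q, "hl": lang, "gl": country,
              "ceid": f"{country}:{lang.split('-')[0]}"}
    return f"{GOOGLE_NEWS_RSS}?{urlencode(params)}"


def date_windows(start, end, days=7):
    """Yield consecutive (window_start, window_end) pairs covering [start, end]."""
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + step, end)
        cursor += step


def parse_feed(xml_text):
    """Turn RSS XML into a list of dicts with title, url, date, and source."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        articles.append({
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "date": parsedate_to_datetime(pub).date().isoformat() if pub else None,
            "source": item.findtext("source"),
        })
    return articles


def fetch_url(url, timeout=10):
    """Download a URL and return its body as text."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def search_news(query, start, end, days=7, fetch=None):
    """Search window by window and return de-duplicated article records."""
    fetch = fetch or fetch_url  # `fetch` can be swapped out for offline testing
    seen, results = set(), []
    for lo, hi in date_windows(start, end, days):
        for article in parse_feed(fetch(build_search_url(query, lo, hi))):
            if article["url"] not in seen:
                seen.add(article["url"])
                results.append(article)
    return results

# Example call (performs network requests):
# articles = search_news("Tesla", date(2020, 1, 1), date(2023, 12, 31))
```

Each returned record carries a `url` that can be handed to the existing part-two scraping function to fetch the title and body. The links in this feed may be Google redirect URLs rather than the publisher's own address, so they may need to be resolved before scraping. Narrowing `days` increases the number of queries and therefore the number of articles collected, at the cost of more requests.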