Hi, as per the attachment, I need mapper, reducer and combiner Python code for a Hadoop task. I also need the pros and cons of using the combiner for this particular task. The code has already been started; I'll send the rest of the files once I have the quote.


Your task in completing this assignment is to analyse a range of reviews for the most common words that appear for both positive and negative sentiments. The data are contained in a file called sentiments.txt (a shorter version called shortsent.txt is also available for testing purposes). The file contains the type of item being reviewed (Restaurant, Movie, Product) followed by the review text and then a sentiment value (1 for positive, 0 for negative). Each review is on a single line of the file with the different fields separated by a tab character, as shown in the following example:

Restaurant	I swung in to give them a try but was disappointed.	0
Restaurant	I had a pretty satisfying experience.	1
Movie	Some applause should be given to the "prelude".	1
Product	A must study for anyone interested poor design.	0

Your task is to write a Python based Hadoop Map/Reduce solution that will, in a single pass, find the 5 most common words associated with a given item type and sentiment. The result will be 6 rows of data consisting of the top 5 words for each item type and sentiment score in a form similar to the following (the format and order can be different but the words for each item/sentiment must be correct):

Restaurant 0 brother again law night eating
Restaurant 1 you'd any bean fry stir
Product 0 anyone must industrial study interested
Product 1 phone use restored simple performance
Movie 0 enter script watch unethical rated
Movie 1 however both superb rickman complex

You are required to exclude common words in the reviews, and the words that you should exclude are provided in the file excluded.txt. You could start by hard coding some of these words into your Mapper code, but you are eventually expected to load them from a cache file when the program runs.

Along with your code, you should also submit a short written report, detailing your design and the results you found.

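As a rough illustration of the mapping step described above, here is a minimal Python sketch. It is not the provided mapper.py; every name in it (load_excluded, tokenize, map_line, map_lines, FALLBACK_EXCLUDED) is hypothetical. It assumes that excluded.txt holds one word per line and is readable from the task's working directory (for example when shipped with Hadoop Streaming's -files option), that apostrophes inside words are kept so that "you'd" survives as in the sample output, and that the mapper emits records of the form group, word, count with the group written as "Item:sentiment".

```python
import re

# Assumption: a tiny stand-in list, used only when excluded.txt cannot be read.
FALLBACK_EXCLUDED = frozenset({"a", "an", "and", "i", "the", "to", "was"})

# Letters and apostrophes only, so "you'd" survives while quotes, commas
# and full stops are dropped.
WORD_RE = re.compile(r"[a-z']+")


def load_excluded(path="excluded.txt"):
    """Read one excluded word per line; fall back to the hard-coded set."""
    try:
        with open(path, encoding="utf-8") as handle:
            return frozenset(line.strip().lower() for line in handle if line.strip())
    except OSError:
        return FALLBACK_EXCLUDED


def tokenize(text):
    """Lower-case the text and return its words with punctuation removed."""
    words = (word.strip("'") for word in WORD_RE.findall(text.lower()))
    return [word for word in words if word]


def map_line(line, excluded):
    """Turn one 'item<TAB>review<TAB>sentiment' line into output records.

    Each record is 'item:sentiment<TAB>word<TAB>1' (a hypothetical format).
    Malformed lines produce no records.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return []
    item, review, sentiment = (field.strip() for field in fields)
    if sentiment not in ("0", "1"):
        return []
    group = f"{item}:{sentiment}"
    return [f"{group}\t{word}\t1" for word in tokenize(review) if word not in excluded]


def map_lines(lines, excluded):
    """Yield the records for every input line, in order."""
    for line in lines:
        yield from map_line(line, excluded)
```

In an actual mapper script, the records from map_lines(sys.stdin, load_excluded()) would be printed one per line. Putting the item type and sentiment together in the first field is one possible design choice: it means all words for a given item/sentiment pair share a key and can be ranked together later.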
Task 1

Consider and compare developing two solutions: one without a Combiner and one with a Combiner. Discuss what keys and values the Mapper will emit compared with the Combiner or Reducer and how this affects the efficiency of your solution. Some points to consider are how much data will be moved across the network and how many different reducers will be used in your design.

Task 2: modify the code (starting code is provided in the attached files)

Using the Python code provided (combiner.py, mapper.py, reducer.py) as a starting point, modify this code to produce an efficient solution that uses a Combiner. A default Combiner (combiner.py) is provided for you that will just emit whatever it is sent, so that you can focus on the Mapper (mapper.py) and Reducer (reducer.py) first. Once you have this working, you can consider how to improve the solution by making a better Combiner.

You can use the simhadoop.sh script to test your code within a terminal on either your own PC or the Jupyter Lab setup. This version of simhadoop will also produce intermediary files for the mapper output (mapout.txt), the combiner output (comout.txt) and the reducer output (results.txt). Your final submitted code should be run with the full sentiments.txt file on the Hadoop server, and the output of this run should be submitted along with your Python code.

Your code should find the 5 most commonly used words for each of the 3 item types and sentiment values (6 lines of data in total, as shown above). It should exclude words in the exclusion list, ignore case and remove punctuation marks. Once you have completed your code and run it on Hadoop, please include the results that are produced by Hadoop at the end of your report.
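To make the Combiner versus Reducer comparison more concrete, here is a minimal Python sketch of both steps. It is not the provided combiner.py or reducer.py, and the names parse_record, combine and reduce_top are hypothetical. It assumes the Mapper emits lines of the form group, word, count separated by tabs, with the group written as "Item:sentiment" (for example "Movie:1"), and that every record for a given group reaches the same Reducer, which is simplest to guarantee with a single Reducer. Ties in the ranking are broken alphabetically, which is also an assumption.

```python
from collections import Counter, defaultdict


def parse_record(line):
    """Split 'group<TAB>word<TAB>count'; return None for malformed lines."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return parts[0], parts[1], int(parts[2])


def combine(lines):
    """Sum the counts for each (group, word) pair seen by one mapper.

    The output uses the same record format as the input, so the reducer
    works whether or not the combiner has run.
    """
    totals = Counter()
    for line in lines:
        record = parse_record(line)
        if record is not None:
            group, word, count = record
            totals[(group, word)] += count
    return [f"{group}\t{word}\t{count}" for (group, word), count in sorted(totals.items())]


def reduce_top(lines, n=5):
    """Return one 'item<TAB>sentiment<TAB>top words' line per group."""
    per_group = defaultdict(Counter)
    for line in lines:
        record = parse_record(line)
        if record is not None:
            group, word, count = record
            per_group[group][word] += count
    results = []
    for group in sorted(per_group):
        # Highest count first; alphabetical order breaks ties (an assumption).
        ranked = sorted(per_group[group].items(), key=lambda pair: (-pair[1], pair[0]))
        top_words = " ".join(word for word, _ in ranked[:n])
        item, _, sentiment = group.rpartition(":")
        results.append(f"{item}\t{sentiment}\t{top_words}")
    return results
```

Under these assumptions, the Combiner collapses repeated (group, word) pairs from one mapper into a single record with a summed count, which is where any saving in data moved across the network would come from. The trade-off is the extra pass over the mapper output, and the Reducer still has to add up counts itself because Hadoop does not guarantee how many times a Combiner runs.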
Dec 15, 2021