Part B - MRJob and Hive with CSV (8 marks)

In Part B your task is to answer a question about the data in a CSV file, first using MRJob, and then using Hive. By using both to answer the same question about the same file you can more readily see how the two techniques compare.

When you click the panel on the right you'll get a connection to a server that has, in your home directory, a CSV file called "orders.csv", containing data about book orders (feel free to open the file and explore its contents). Here are the fields in the file:

OrderDate (date)
ISBN (string)
Title (string)
Category (string)
PriceEach (decimal(5,2))
Quantity (integer)
FirstName (string)
LastName (string)
City (string)

Your task is to find the total dollar amount of orders for each city. Your results should appear as the following:

ATLANTA 211.85
AUSTIN 391.25
BOISE 39.9
CHEYENNE 19.95
CHICAGO 111.9
CODY 55.95
EASTPOINT 182.75
KALMAZOO 170.9
MACON 61.95
MIAMI 17.9
MORRISTOWN 55.95
SEATTLE 61.9
TALLAHASSEE 144.45
TRENTON 199.85

(There is no need to sort the results or remove the quotation marks.)

First (4 marks)
Write a MRJob job to do this. A file called "job.py" has been created for you - you just need to fill in the details. You should be able to modify MRJob jobs that you have already seen in this week's content. You can test your job by running the following command (it tells Python to execute job.py, using orders.csv as input):

$ python job.py orders.csv

Second (4 marks)
Write a Hive script to do this. A file called "script.hql" has been created for you - you just need to fill in the details. You should be able to modify Hive scripts that you have already seen in this week's content. You can test your script by running the following command (it tells Hive to execute the commands contained in the file script.hql):

$ hive -f script.hql
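The per-city total described in the question can be sketched as a map step and a reduce step. The two generator functions below have the same shape as the mapper and reducer methods of an MRJob job (minus self), so they could probably be adapted into job.py, but this is only a sketch and not the course's reference solution. It assumes orders.csv has no header row and that the columns appear in the order listed in the question. The helper run_locally and the sample rows are made up for illustration.

```python
import csv
from collections import defaultdict

# 0-based column positions, following the field list in the question.
PRICE_EACH, QUANTITY, CITY = 4, 5, 8


def mapper(_, line):
    """Emit (city, price * quantity) for one CSV line."""
    # csv.reader copes with quoted fields that contain commas.
    row = next(csv.reader([line]))
    yield row[CITY], float(row[PRICE_EACH]) * int(row[QUANTITY])


def reducer(city, amounts):
    """Sum every order amount that shares the same city."""
    yield city, round(sum(amounts), 2)


def run_locally(lines):
    """Hypothetical stand-in for the shuffle: group mapper output by key."""
    grouped = defaultdict(list)
    for line in lines:
        for city, amount in mapper(None, line):
            grouped[city].append(amount)
    results = {}
    for city in grouped:
        for key, total in reducer(city, grouped[city]):
            results[key] = total
    return results


# Made-up rows for illustration only; they are not taken from orders.csv.
SAMPLE_LINES = [
    '2021-01-05,1111111111,"Book A, Vol. 1",COOKING,19.95,2,ANN,LEE,BOISE',
    "2021-01-06,2222222222,Book B,FITNESS,30.00,1,BOB,RAY,MIAMI",
    "2021-01-07,3333333333,Book C,COOKING,10.50,3,CAT,DOE,BOISE",
]

totals = run_locally(SAMPLE_LINES)
for city in sorted(totals):
    print(city, totals[city])
```

The rounding in the reducer is a design choice to hide floating-point noise in the sums; whether the marking expects rounded values is not stated in the question.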
Answered 2 days after Apr 09, 2021, University of Sydney

Kamal answered on Apr 11 2021
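import csv

line_count = 0  # number of rows read so far

# Print the city (column 8) and unit price (column 4) of every order row.
with open('orders.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    for row in csvreader:
        print(row[8], row[4])
        line_count += 1
# The with block closes the file, so no explicit close() call is needed.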
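The snippet above only prints the city and unit price of each row, while the question asks for PriceEach multiplied by Quantity, summed per city. A minimal plain-Python sketch of that aggregation follows; it could serve as a cross-check for the MRJob and Hive results. It assumes the file has no header row and uses Decimal to avoid floating-point rounding, which is a choice made here and not part of the original answer. The function name city_totals and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def city_totals(csvfile):
    """Return {city: total dollar amount} for an open CSV file object."""
    totals = defaultdict(Decimal)  # a missing city starts at Decimal('0')
    for row in csv.reader(csvfile, delimiter=','):
        # PriceEach is column 4, Quantity is column 5, City is column 8.
        totals[row[8]] += Decimal(row[4]) * int(row[5])
    return dict(totals)


# Made-up rows for illustration only; they are not taken from orders.csv.
sample = io.StringIO(
    "2021-01-05,1111111111,Book A,COOKING,19.95,2,ANN,LEE,BOISE\n"
    "2021-01-06,2222222222,Book B,FITNESS,30.00,1,BOB,RAY,MIAMI\n"
    "2021-01-07,3333333333,Book C,COOKING,10.50,3,CAT,DOE,BOISE\n"
)

result = city_totals(sample)
for city, total in result.items():
    print(city, total)
```

To run it against the real data, pass a file opened with open('orders.csv', newline='') in place of the in-memory sample.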