
Run hadoop


CIND 123 Winter 2020 - Assignment #1
2 Questions, 20 Points

What to Provide in Your Solutions Document
· Copy your code and paste it into this document.
· Take snapshots showing your code and its output (in the same snapshot) and insert them in the appropriate places in this document.
· Explain your answers in your own words.
· Submit your answers as a Word or PDF document.
· Read the instructions given throughout the questions carefully. If any question seems unclear, make an assumption, state it clearly in your answer, and solve the question accordingly.

Question 1 (14 Pts)
You will run analysis queries with Hive on an Airline Passenger Satisfaction Survey dataset. The CSV data file was downloaded from Kaggle, and the data dictionary is provided as a Word document in the same archive file as the data (AirlinePassenger.zip).

Question Preparation Instructions (you don’t need to provide snapshots or commands for this section)
· Download the data from D2L.
· Open the archive file (AirlinePassenger.zip).
· Carefully read the data dictionary file for the field descriptions (AirlinePassenger.docx).
· Check the columns, data types, and information in the data file (AirlinePassenger.csv).
· Move your data file (AirlinePassenger.csv) to your HDFS file system, under a directory named after your first name. If you don’t use your first name as the directory name, you will lose 5 points on this question.

From this point on, provide all code and screen snapshots as described in the “What to Provide in Your Solutions Document” section of this questionnaire.

Q1.1 (2 pts)
Create a database named midtermFIRSTNAME (replace FIRSTNAME with your first name) on your Hive machine.
E.g., I would create a database named midtermroy.
Create a table (with the necessary fields) named airlinepassengerFIRSTNAME (replace FIRSTNAME with your first name) in your midterm database. E.g., I would create a table named airlinepassengerroy.
Show that the table has been created in your midterm database, along with its fields. Hint: use a command that shows the table and its fields.

Q1.2 (2 pts)
Load the AirlinePassenger.csv file into the airlinepassengerFIRSTNAME Hive table (replace FIRSTNAME with your first name). Show the first 10 rows of the loaded data.

Q1.3 (3 pts)
Write a query that displays the total number of Loyal Customers (under CustomerType) for Male and Female customers, on separate rows.

Q1.4 (3 pts)
Write a query that displays the total number of each Satisfaction level (neutral or dissatisfied, satisfied) per Class (Eco, Eco Plus, Business). Sort the results by Class and then by Satisfaction, both in ascending order.

Q1.5 (4 pts)
To answer this question, use the following age group definitions:
· Children (1 year through 12 years)
· Adolescents (13 years through 17 years)
· Adults (18 years through 64 years)
· Older adults (65 years and older)
Provide the total number of each Satisfaction level (neutral or dissatisfied, satisfied) per age group (children, adolescents, adults, older adults).
Note: you may use multiple steps and queries to solve this question. Show all steps of your solution as described in the “What to Provide in Your Solutions Document” section.

Question 2 (6 Pts)
The following text (from Roberto Bolaño, “2666”) is sent to a MapReduce job to count the words. You can assume that the mapper code removes the punctuation marks in the text (double quotes “”, commas, and periods):
"I realized my happiness was artificial. I felt happy because I saw the others were happy and because I knew I should feel happy, but I wasn’t really happy."
The text will be processed by two mappers, and each mapper runs a combiner that performs the same task as the reducer. The combined output then goes to a single reducer. Assume that each mapper takes a single line of the text above.

Q2.1 (2 pts)
Provide the output of each mapper after it processes the text. Pay attention to the order of the output.

Q2.2 (2 pts)
Provide the output of each combiner after it completes its task.

Q2.3 (2 pts)
Provide the output of the reducer after it completes its task.
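The flow through mappers, combiners, and reducer described in Question 2 can be traced with a short Python simulation. This is a sketch under stated assumptions, not Hadoop code.

The assumptions are:
· The assignment does not say where the quote's line break falls. The sketch hypothetically treats the first sentence as line 1 and the second sentence as line 2.
· Words keep their original case.
· Only double quotes, commas, and periods are stripped, so the apostrophe in "wasn’t" is kept.

The names `LINES`, `mapper`, and `sum_by_key` are illustrative only.

```python
import re
from itertools import groupby

# Hypothetical split: the assignment does not state where the line break falls,
# so this sketch assumes one sentence per line.
LINES = [
    "I realized my happiness was artificial.",
    "I felt happy because I saw the others were happy and because I knew "
    "I should feel happy, but I wasn’t really happy.",
]


def mapper(line):
    """Strip double quotes, commas and periods, then emit (word, 1) in reading order."""
    cleaned = re.sub(r'["“”,.]', "", line)
    return [(word, 1) for word in cleaned.split()]


def sum_by_key(pairs):
    """Logic shared by the combiner and the reducer: sort pairs by key, then sum values per key.

    Python orders str by code point, so "I" sorts before lowercase words; for this input
    that matches the byte-wise ordering Hadoop applies to UTF-8 Text keys.
    """
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [
        (key, sum(count for _, count in group))
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    ]


# One mapper per line, one combiner pass per mapper (as the question assumes).
map_outputs = [mapper(line) for line in LINES]
combiner_outputs = [sum_by_key(out) for out in map_outputs]
# The single reducer receives the merged output of both combiners.
reducer_output = sum_by_key([kv for out in combiner_outputs for kv in out])

for i, out in enumerate(map_outputs, start=1):
    print(f"Mapper {i}: {out}")
for i, out in enumerate(combiner_outputs, start=1):
    print(f"Combiner {i}: {out}")
print(f"Reducer: {reducer_output}")
```

In this sketch the mapper output stays in reading order. The combiner and reducer outputs come out sorted by key, which mirrors how Hadoop sorts map output by key before a combiner or reducer processes it. In real Hadoop a combiner may run zero, one, or several times. The one-pass-per-mapper setup here follows the question's assumption rather than a framework guarantee.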
Mar 02, 2023