School of Science COSC2636/2632 Big Data Management Assignment 1

I have uploaded files


School of Science
COSC2636/2632 Big Data Management
Assignment 1

Assessment Type: Individual assignment; no group work. Submit online via Canvas→Assignments→Assignment 1. Marks are awarded for meeting requirements as closely as possible. Clarifications/updates may be made via announcements.
This assessment supports CLO 1-5.
Due date: 23:59, Sunday of Week 4. Deadlines will not be advanced but they may be extended. Please check Canvas→Syllabus or Canvas→Assignments→Assignment 1 for the most up-to-date information. As this is a major assignment in which you demonstrate your understanding, a university standard late penalty of 10% per working day applies for up to 5 working days late, unless special consideration has been granted.
Weighting: 16 marks

1. Overview

Problem Description: Given a set of items that receive different scores under different criteria, you need to find the top-k items with the highest aggregated scores. The aggregated score of an item A is the sum of all scores of A across the criteria.

Requirement: In the code skeleton (i.e., TopK.java), we have provided the necessary APIs and implemented the data preprocessing step. What you need to do: (1) understand the code skeleton and the function of each API; (2) implement the core function “thresholdAlgo()”. More specifically, you need to complete the missing part of the for loop in the function “thresholdAlgo()”. Before you start programming, please refer to the slides (i.e., “Topk Query.pdf”) for the pseudocode of “thresholdAlgo()”.

Sample Test Case: We provide two datasets for you to test your code. Each dataset has one million items with ten criteria. The sample output “output.txt” is also contained in the assignment folder.

What you will learn: You will learn how to use limited memory to efficiently find the top-k items with the highest aggregated scores without scanning/loading the whole dataset.

If you have questions, you can email the tutors.

2. Assessment criteria

We will evaluate your program in terms of its correctness and efficiency on our test cases. We have prepared 10 test datasets, each of which has the same size as the sample test datasets.
1. Correctness is worth 10 marks (i.e., one for each test), and efficiency is worth 5 marks.
2. We will not evaluate the efficiency of your program if it does not pass all test datasets. In this case, your marks cannot be greater than 9.
Note: The marking is based on the correctness of your implemented algorithm, and on its efficiency once the results are correct.

For the efficiency evaluation, the marks are calculated based on the following conditions:
1. X-Y < 1, efficiency marks = 6.
2. […] < 2, efficiency marks = 3.
3. […] < 3, efficiency marks = 1.
4. […]

[…] t(x1,…,xm) <= t(x’1,…,x’m) whenever xi <= x’i for every i.

Naïve algorithm
1. Compute overall score for every object by looking into each sorted list.

R1        R2        R3
x1 1      x2 0.8    x4 0.8
x2 0.8    x3 0.7    x3 0.6
x3 0.5    x1 0.3    x1 0.2
x4 0.3    x4 0.2    x5 0.1
x5 0.1    x5 0.1    x2 0

Overall scores:
x1 1.5
x2 1.6
x3 1.8
x4 1.3
x5 0.3

2. Return k objects with the highest overall score.

x3 1.8
x2 1.6
x1 1.5
x4 1.3
x5 0.3

Return top-2 objects (x3 and x2).

Fagin’s algorithm
1. Sequentially access all the sorted lists in parallel until there are k objects that have been seen in all lists.
Entries above the dashed line have been read by sequential access:

R1        R2        R3
x1 1      x2 0.8    x4 0.8
x2 0.8    x3 0.7    x3 0.6
x3 0.5    x1 0.3    x1 0.2
--------------------------
x4 0.3    x4 0.2    x5 0.1
x5 0.1    x5 0.1    x2 0

Since k = 2, and x1 and x3 have been seen in all the 3 lists, sequential access stops here.

Fagin’s algorithm
2. Perform random accesses to obtain the scores of all seen objects.
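The two slide algorithms above can be compared on the example lists R1, R2, R3 with a small Java sketch. This is only an illustration, not part of the TopK.java skeleton: the class name FaginSketch and all method names are hypothetical, the lists are hard-coded, and the final step (ranking the seen objects by overall score and returning the best k) is assumed from the standard formulation of Fagin's algorithm because the extracted slides stop after step 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FaginSketch {
    // Example lists from the slides; each list is sorted by score, descending.
    static final String[][] IDS = {
        {"x1", "x2", "x3", "x4", "x5"},   // R1
        {"x2", "x3", "x1", "x4", "x5"},   // R2
        {"x4", "x3", "x1", "x5", "x2"}    // R3
    };
    static final double[][] SCORES = {
        {1.0, 0.8, 0.5, 0.3, 0.1},
        {0.8, 0.7, 0.3, 0.2, 0.1},
        {0.8, 0.6, 0.2, 0.1, 0.0}
    };

    // Random access: score of one object in list r (linear scan for brevity;
    // a real implementation would use an index keyed by object id).
    static double randomAccess(int r, String id) {
        for (int i = 0; i < IDS[r].length; i++) {
            if (IDS[r][i].equals(id)) return SCORES[r][i];
        }
        throw new IllegalArgumentException("unknown object " + id);
    }

    // Aggregated score: sum of the object's scores over all lists.
    static double overall(String id) {
        double sum = 0.0;
        for (int r = 0; r < IDS.length; r++) sum += randomAccess(r, id);
        return sum;
    }

    // Rank candidates by overall score (highest first) and keep the best k.
    static List<String> rank(Collection<String> candidates, int k) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((String id) -> overall(id)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Naive algorithm: score every object, then return the best k.
    static List<String> naiveTopK(int k) {
        return rank(Arrays.asList(IDS[0]), k);
    }

    // Fagin's algorithm: read one entry per list per round until k objects
    // have been seen in all lists, then score only the seen objects.
    static List<String> faginTopK(int k) {
        int m = IDS.length;
        Map<String, Integer> seenCount = new LinkedHashMap<>();
        int seenInAllLists = 0;
        for (int depth = 0; depth < IDS[0].length && seenInAllLists < k; depth++) {
            for (int r = 0; r < m; r++) {
                int count = seenCount.merge(IDS[r][depth], 1, Integer::sum);
                if (count == m) seenInAllLists++;
            }
        }
        return rank(seenCount.keySet(), k);
    }

    public static void main(String[] args) {
        System.out.println(naiveTopK(2)); // prints [x3, x2]
        System.out.println(faginTopK(2)); // prints [x3, x2]
    }
}
```

For k = 2 the sequential phase stops after three rounds, so x5 is never scored; on lists with a million items that early stop is where the saving over the naive scan comes from.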
Answered Same Day | Mar 26, 2021 | COSC2636


Kshitij answered on Mar 27 2021