Big Data Assignment
, School of Science COSC2636/2632 Big Data Management Assignment 3 Assessment Type: Individual assignment; no group work. Submit online via Canvas→Assignments→Assignment 3. Marks awarded for meeting requirements as closely as possible. Clarifications/updates may be made via announcements. Due date: 23:59pm, Sunday of Week 11, 24/5/2020; Deadlines will not be advanced but they may be extended. Please check Canvas→Syllabus or via Canvas→Assignments→Assignment 3 for the most up to date information. As this is a major assignment in which you demonstrate your understanding, a university standard late penalty of 10% per each working day applies for up to 5 working days late, unless special consideration has been granted. Weighting: 39 marks 1. Overview Problem Description: Given a set of billboards (each with an influence and a cost), and a budget, find a set of billboards within the budget, which have the maximum total influence. Step-by-Step Guidance to complete this assignment: a. Read the paper “Trajectory-driven Influential Billboard Placement” which has been introduced as part of the lecture. i. Sections 1-4 of the paper. ii. Understand the EnumSel Algorithm (Algorithm 2). iii. Understand the PartSel Algorithm (Algorithm 3). b. Have a look at the Billboard dataset (available at /dataset/dataset.txt). c. Understand how to find the local solution for each cluster i. Implement Algorithm 1 (i.e., GreedySel) first. ii. Enumerate the candidate sets where the cardinality is smaller than 4 (Algorithm 2, line 2.6). iii. Based on each candidate set, calling Algorithm 1 to fill up each candidate set until reach the budget constraint (Algorithm 2, lines 2.7- 2.9). iv. Output candidate sets as local solution. d. Understand how to find the global solution based on local solutions i. Calculate the influence ?? [??][????] of the local solution for a given cluster ???? and a budget ???? (Algorithm 3, line 3.7). (In order to simplify this assignment, we provide you the billboard object with influence and cost. Therefore, you do not need to calculate ????�???? , ????�.) ii. Using a dynamic programming approach to construct the global solution based on the influence of location solutions ?? [??][????] maintained by different clusters (Algorithm3, lines 3.8-3.9). e. Complete three algorithms (i.e., GreedySel.java, EnumSel.java, and PartSel.java) Besides the specification, we provide the Ass3.zip that includes the java program that you need to complete, and a readme.pdf file that explains the structure of the java program. Before you start to implement your codes, please read the Readme.pdf carefully. What you will learn: You need to understand the basic greedy algorithm, and the dynamic programming algorithm. You need to understand why we first find the local solution from each cluster, then generate the global solution from each local solution. You need to think about the how the different size of cluster affects the efficiency and effectiveness. 2. Marking Criteria We will evaluate your program in terms of its correctness and efficiency on our test cases. We have prepared seven test datasets: two for GreedySel; two for EnumSel; three for PartSel. 1. Correctness is worth 21 marks. 1. For testing GreedySel, we test two datasets, each of them is worth 1 mark. 2. For testing EnumSel, we test two datasets, each of them is worth 2 marks. 3. For testing PartSel, we test three datasets, each of them is worth 5 marks. , 2. The efficiency is worth 6 marks, 2 marks for each algorithm. We will not evaluate the efficiency of one algorithm that fails to pass all corresponding test datasets. For example, if GreedySel does not pass all two tests, the efficiency of GreedySel will not be tested. For the efficiency evaluation, the mark is calculated based on the following conditions: 1. X/Y<= 1.5,="" efficiency="" marks="2." 2.="">=>
< 3,="" efficiency="" marks="1." 3.="">