xxxxxxxxxxx
CS7071 Advanced Database Management (ADM): Homework #2 Due on Apr 20, 2022 at 11:59pm Mnumber: Name: 1 CS7071 Advanced Database Management (ADM) : Homework #2 Part 1: Index Structure (32 Points) Assume that you have the following table: Item name age salary Peter 34 30,000 Bob 24 90,000 Jane 18 10,000 Charlie 19 20,000 Chris 17 100,000 Alex 45 0 Sam 46 27,000 Question 1.1 (14 Points). Create a B+-tree for table Item on key age with n = 2 (up to two keys per node). You should start with an empty B+-tree and insert the keys in the order shown in the table above. Write down the resulting B+-tree after each step. When splitting or merging nodes follow these conventions: • Leaf Split: In case a leaf node needs to be split during insertion and n is even, the left node should get the extra key. E.g, if n = 2 and we insert a key 4 into a node [1,5], then the resulting nodes should be [1,4] and [5]. For odd values of n we can always evenly split the keys between the two nodes. In both cases the value inserted into the parent is the smallest value of the right node. • Non-Leaf Split: In case a non-leaf node needs to be split and n is odd, we cannot split the node evenly (one of the new nodes will have one more key). In this case the “middle” value inserted into the parent should be taken from the right node. E.g., if n = 3 and we have to split a non-leaf node [1,3,4,5], the resulting nodes would be [1,3] and [5]. The value inserted into the parent would be 4. • Node Underflow: In case of a node underflow you should first try to redistribute values from a sibling and only if this fails merge the node with one of its siblings. Both approaches should prefer the left sibling. E.g., if we can borrow values from both the left and right sibling, you should borrow from the left one. Problem 1 continued on next page. . . 2 CS7071 Advanced Database Management (ADM) : Homework #2 Problem 1 (continued) Question 1.2 (8 Points). Given is the B+-tree shown below (n = 3 ). Using the conventions for splitting and merging introduced in Question 1.1, execute the following operations and write down the resulting B+- tree after each operation: insert(2),insert(3),insert(4),delete(103) 20 25 100 1 15 19 20 21 23 25 70 103 500 Problem 1 continued on next page. . . 3 CS7071 Advanced Database Management (ADM) : Homework #2 Problem 1 (continued) Question 1.3 (10 Points). Consider the extensible Hash index shown below that is the result of inserting values 1 and 7. Each page holds two keys. Execute the following operations insert(3),insert(6),insert(8),insert(0),delete(1) and write down the resulting index after each operation. Assume the hash function is defined as: x h(x) 0 0000 1 0001 2 1010 3 1010 4 1101 5 0111 6 1110 7 0111 8 1100 0 1 0001 0111 4 CS7071 Advanced Database Management (ADM) : Homework #2 Part 2: Result Size Estimation (28 Points) Consider the tables stock with attributes itemId, suppId, quantity and item with itemId, category, and price. Attribute itemId of relation stock is a foreign key to attribute itemId of relation item. Below are the following statistics: T (stock) = 80, 000 T (item) = 10, 000 V (stock, itemId) = 1, 000 V (item, itemId) = 10, 000 V (stock, suppId) = 100 V (item, category) = 40 V (stock, quantity) = 6, 000 V (item, price) = 1, 000 Question 2.1 (3 Points). Estimate the number of result tuples for the query q = σcategory=‘Electronics‘(item) using the first assumption presented in class (values of the attribute are uniformly distributed). Question 2.2 (3 Points). Estimate the number of result tuples for the query q = σitemId=i1∨suppId=s1(stock) using option1 presented in class. Problem 2 continued on next page. . . 5 CS7071 Advanced Database Management (ADM) : Homework #2 Problem 2 (continued) Question 2.3 (5 Points). Estimate the number of result tuples for the query q = σitemId=i1∨suppId=s1(stock) using the complex estimation. Question 2.4 (4 Points). Estimate the number of result tuples for the query q = σitemId=i3∧price=100(item) using the first assumption presented in class (values used in queries are uniformly distributed over active domain of attribute, i.e., all possible values of the attribute). Problem 2 continued on next page. . . 6 CS7071 Advanced Database Management (ADM) : Homework #2 Problem 2 (continued) Question 2.5 (6 Points). Consider the following min and max values for the columns: min(item, price) = 1 max(item, price) = 500 Estimate the number of result tuples for the query q = σitemId=i3∧price>300(item). Question 2.6 (7 Points). Estimate the number of result tuples for the query q = σquantity=100(item ./ stock) using the first assumption presented in class (values used in queries are uniformly distributed over active domain of attribute, i.e., all possible values of the attribute). 7 CS7071 Advanced Database Management (ADM) : Homework #2 Part 3: I/O Cost Estimation (30 Points) Question 3.1 (3 Points). Consider M = 51 memory pages available and a relation R with B(R) = 150, 000 blocks should be sorted. Compute the number of I/Os necessary to sort R using the external merge sort. Question 3.2 (3 Points). Consider M = 17 memory pages available and a relation R with B(R) = 4, 000, 000 blocks should be sorted. Compute the number of I/Os necessary to sort R using the external merge sort. Problem 3 continued on next page. . . 8 CS7071 Advanced Database Management (ADM) : Homework #2 Problem 3 (continued) Question 3.3 (12 Points). Consider two relations R and S with B(R) = 500, 000 and B(S) = 100, 000 and M = 2, 001 memory pages available. Estimate the minimum number of I/O operations needed to join these two relations using following methods: (1) block-nested-loop join, (2) merge-join (for both sorted and not sorted inputs), and (3) hash-join (not considering hybrid hash-join and another approach using pointer). You can assume that the hash function evenly distributes keys across buckets. Calculate the I/O cost estimation for each join method. Question 3.4 (12 Points). Consider two relations R and S with B(R) = 80, 000 and B(S) = 48, 000 and M = 401 memory pages available. Estimate the minimum number of I/O operations needed to join these two relations using following methods: (1) block-nested-loop join, (2) merge-join (for both sorted and not sorted inputs), and (3) hash-join (not considering hybrid hash-join and another approach using pointer). You can assume that the hash function evenly distributes keys across buckets. Calculate the I/O cost estimation for each join method. 9 CS7071 Advanced Database Management (ADM) : Homework #2 Part 4: Page Replacement Clock (10 Points) Consider a buffer pool with 3 pages using the Clock page replacement strategy. Initially the buffer pool is in the state shown below. Current Buffer State 1[3]∗0 ↓ 1[10]0 0[6]0 1[4]1 We use the following notation flag[page]dirtyfix to denote the state of each buffer frame. page is the page number in the frame, fix is its fix count, dirty is indicating with an Asterix that the page is dirty, and flag is the reference bit used by the Clock algorithm. For example, 1[5]∗2 denotes that the frame stores page 5 with a fix count 2, that the page is dirty, and that the reference bit is set to 1. Recall that Clock uses a pointer S that points to the current page frame (the one to be checked for replacement next). The page frame S is pointing to is indicated by ↓. In your solution, use an arrow to indicate the page frame that S is pointing to. Execute the following requests and write down state of the buffer pool after each request. • p stands for pin • u for unpin • d for marking a page as dirty p(10),u(10),p(6),u(4),p(5) 10