quote this please
CAP4770 – Module 2 Assignment

Objective

The purpose of the assignment is to help students acquire skills in working with (1) methods used to understand important features of the application data and (2) methods used to preprocess data to improve its quality.

Assignment Questions

1. (This exercise is a variation of Exercise 2.2 in Chapter 2 of the textbook) Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order):

   13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70

   (a) Complete the table below (round results to two decimal places):

       Mean
       Mode
       Midrange
       Minimum
       First Quartile
       Median
       Third Quartile
       Maximum

   (b) Construct a boxplot of the data.

2. (Exercise 2.6 in Chapter 2 of the textbook) Consider two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8). Complete the table below; round results to two decimal places:

       Euclidean Distance
       Manhattan Distance
       Minkowski Distance
       Supremum Distance

3. (This exercise is a variation of Exercise 2.8 in Chapter 2 of the textbook) Consider the data as 2-D data points. Given a new data point x = (1.4, 1.6) as a query, rank the database points based on the cosine similarity measure.

             A1     A2
       x1    1.5    1.7
       x2    2      1.9
       x3    1.6    1.8
       x4    1.2    1.5
       x5    1.5    1

   In other words, find the cosine similarity for each data point and sort in decreasing order:

                         Similarity    Vector
       Closest
       Second closest
       Third closest
       Fourth closest
       Farthest

4. (This exercise is a variation of Exercise 3.3 in Chapter 3 of the textbook) Using the data set given in Exercise 1 of this assignment, use smoothing by bin means to smooth this data, using a bin depth of 3. Round results to two decimal places.

       Bins    Smoothed by Bin Means

5.
(This exercise is a variation of Exercise 3.7 in Chapter 3 of the textbook) Using the data set given in Exercise 1 of this assignment, use min-max, z-score, and decimal scaling normalizations to transform the value 35 (use [0, 1] as the new range for the min-max normalization and 12.94 as the standard deviation for the z-score normalization). Round results to three decimal places.

       Normalization      Normalized Value
       min-max
       z-score
       decimal scaling

Guidelines

The assignment is to be completed individually. Questions are based on Module 2 readings.

Deliverables:

1) This document with the answers entered in the table or space of each question.
2) The document (or documents) with the details of how the solutions were obtained (do not work the problems out in this document but in a separate one). Submit Excel spreadsheet(s) if Excel was used for the calculations.

Name your files
and. If there is more than one worksheet, simply add a number to the name. For example:

Smith_John_Module2_Assignment.docx
Smith_John_Module2_Worksheet1.docx
Smith_John_Module2_Worksheet2.xlsx

Create a folder, name it Module 2 Assignment, and place your files in it. Compress it and drop the resulting zipped folder into the Dropbox. Make sure you write your full name, Panther ID, date, and class section in the first lines of each document, in that order:

Student Name: __________
Panther ID: __________
Date: __________
Section: __________

Grading Rubric

Question 1 is worth 2 points and questions 2 – 5 are worth 1 point each. Each question will be graded based on correctness, completion, and organization.
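
For the separate worksheet document, the measures named in Questions 2 through 5 could be set up along the following lines. This is only a minimal sketch, assuming the standard textbook definitions of each measure; the function names are hypothetical, the Minkowski order h is left as a parameter because the question does not state it, and the sample inputs in the usage note are illustrative rather than the assignment data.

```python
import math


def minkowski(a, b, h):
    """Minkowski distance of order h (h=1 is Manhattan, h=2 is Euclidean)."""
    return sum(abs(x - y) ** h for x, y in zip(a, b)) ** (1.0 / h)


def supremum(a, b):
    """Supremum (Chebyshev) distance: the largest per-attribute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def smooth_by_bin_means(sorted_values, depth):
    """Equal-depth binning: replace each value with the mean of its bin."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([round(bin_mean, 2)] * len(bin_values))
    return smoothed


def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Min-max normalization of value into the range [new_min, new_max]."""
    scale = (value - old_min) / (old_max - old_min)
    return scale * (new_max - new_min) + new_min


def z_score(value, mean, std_dev):
    """Z-score normalization using a supplied mean and standard deviation."""
    return (value - mean) / std_dev


def decimal_scaling(value, max_abs):
    """Divide by 10**j, with j the smallest integer so that max_abs / 10**j < 1."""
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return value / 10 ** j
```

With illustrative inputs, minkowski((0, 0), (3, 4), 2) gives the Euclidean distance between the two points, and smooth_by_bin_means([1, 2, 3, 4, 6, 8], 3) replaces each group of three sorted values with its bin mean. Ranking points by cosine similarity then amounts to sorting them by cosine_similarity(point, query) in decreasing order.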