
Textbook:
Data Mining: Concepts and Techniques (Required)
Jiawei Han, Micheline Kamber, Jian Pei
Morgan Kaufmann; 3rd Edition; 2011
ISBN-10: 0123814790
ISBN-13: 978-0-12-381479-1 (Print) | 978-0-12-381480-7 (eBook)

Assignment:
Please consider the following questions for the discussions in the 4 areas. Note that there are several questions in each area, and while I recommend that you read all of the posts made in each area, there is no need for each student to provide answers to all of the questions. The requirement is one quality post in each of the 4 areas.

Area 1: Statistical Measures
Note: consult sections 2.1, 2.2 and 2.3 of our textbook for this area.
- Read carefully each of these statements and discuss whether they are true or false. Why? (again, you don’t need to explain them all; you can pick up just one and base your post on it)
· The mean is in general affected by outliers.
· Not all numerical data sets have a median.
· The mode is the only measure of central tendency that can be used for nominal attributes.
- What are the differences between the measures of central tendency and the measures of dispersion?
- How would you catalog a boxplot, as a measure of dispersion or as a data visualization aid? Why?

Area 2: Similarity and Dissimilarity Measures
Note: consult section 2.4 of our textbook for this area.
- What do we understand by similarity measure and what is its importance?
- What do we understand by dissimilarity measure and what is its importance?
- Discuss one of the distance measures that are commonly used for computing the dissimilarity of objects described by numeric attributes.
- In many real-life databases, objects are described by a mixture of attribute types. How can we compute the dissimilarity between objects of mixed attribute types?

Area 3: Data Quality
Note: consult sections 3.1, 3.2, and 3.3 of our textbook for this area.
- What do we understand by data quality and what is its importance?
- Discuss one of the factors comprising data quality and provide examples.
- How can the data be preprocessed in order to help improve its quality?
- Please discuss the meaning of noise in data sets and the methods that can be used to remove the noise (smooth out the data).
- Why is data integration necessary? What are some of the challenges to consider and the techniques employed in data integration?

Area 4: Data Transformation
Note: consult sections 3.4 and 3.5 of our textbook for this area.
- What do we understand by data reduction and what is its importance?
- Discuss one of the data reduction strategies.
- Discuss one of the data transformation strategies.
- What do we understand by data normalization? What are some of its methods?
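
For the last Area 4 question, a small worked example may help anchor a post. The sketch below is not taken from the textbook or from the posted answer; it assumes three commonly taught normalization methods (min-max, z-score, and decimal scaling), and the function names and the `incomes` sample values are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread in the data, so every value sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer making every |v| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


if __name__ == "__main__":
    incomes = [12000, 54000, 73600, 98000]  # hypothetical sample data
    print("min-max:", [round(v, 3) for v in min_max(incomes)])
    print("z-score:", [round(v, 3) for v in z_score(incomes)])
    print("decimal:", decimal_scaling(incomes))
```

Min-max scaling depends entirely on the observed minimum and maximum, so a single outlier can compress the remaining values into a narrow band. Z-score normalization is often preferred when the true bounds are unknown, which ties back to the Area 1 statement about the mean and outliers.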
Answered Same Day: Sep 18, 2021

Vignesh answered on Sep 18 2021