I attached the homework assignment and textbook for reference if needed.


IS 603 Homework 3, Due Sunday 3/28/2021 (11:59pm)

1. For this question, consider the Weather dataset in the table below, where the goal is to predict whether we will play an outdoor sports game on a particular day (yes or no). The class attribute is the last column, Play. The Day ID attribute is not used for the predictions. As we proceed through this question, we will work toward predicting the class label for Day F using the nearest neighbor classifier.

Day ID   Temperature   Humidity   Wind Speed   Play (class)
A        85            85         5            no
B        80            90         25           no
C        83            86         10           yes
D        70            96         7            yes
E        68            80         3            yes
F        65            70         20           ?

(a) Briefly explain (in around 2-3 sentences, your own words) the main conceptual idea of the nearest neighbor classifier, and how it could be used to predict the class label for the Weather problem. (1)

(b) Your task for this sub-question is to calculate the Euclidean distance between each of the data points (i.e. days) and Day F. For this question, ignore the Day ID when calculating distances. First, let's calculate the distance for Day A together. Recall that the general formula for the Euclidean distance between points $A = [d_{1,A}, d_{2,A}, \ldots, d_{n,A}]$ and $B = [d_{1,B}, d_{2,B}, \ldots, d_{n,B}]$ is:

$$d(A,B) = \sqrt{(d_{1,A} - d_{1,B})^2 + (d_{2,A} - d_{2,B})^2 + \cdots + (d_{n,A} - d_{n,B})^2}$$

Guided example for Day A: We want to calculate the Euclidean distance between A = [85, 85, 5] and F = [65, 70, 20]. Note that we did not include either the Day ID or the class attribute here. Plugging these numbers into the formula, we must calculate:

$$d(A,F) = \sqrt{(85 - 65)^2 + (85 - 70)^2 + (5 - 20)^2}$$

Open wolframalpha.com, which we will use as a calculator. At the prompt, enter: sqrt((85 - 65)^2 + (85 - 70)^2 + (5 - 20)^2). Read off the first few decimal places of the answer under decimal approximation, confirming that you get 29.15. Using similar calculations, please provide the distances for Days B, C, D, and E. (4)

(c) Based on the distances you calculated, which day is the nearest neighbor for Day F? Hence, what is the predicted class label for Day F according to the nearest neighbor classifier? (2)

(d) Based on the distances you calculated, which are the three nearest neighbors? Hence, what is the predicted class label for Day F according to the K-nearest neighbor classifier, where K = 3? (2)

2. For this question, suppose you have a larger version of the Weather dataset from Question 1, which contains approximately one year's worth of data. The goal of this question is to evaluate the performance of the nearest neighbor and K-nearest neighbor classifiers discussed in Question 1 in the context of this version of the Weather problem.

(a) Briefly explain (in 2-3 sentences, your own words) the concept of overfitting. (1)

(b) Briefly explain (in 2-3 sentences, your own words) why it would not be appropriate to evaluate the performance of the nearest neighbor classifiers by simply measuring the accuracy of the models on the training dataset. Be sure to use the concept of overfitting in your answer. (1)

(c) Briefly explain (in 2-3 sentences, your own words) the concept of a hold-out set, and how it can help to address the problem you described in your answer to Question 2b. (1)

(d) Suppose we train a K-nearest neighbor classifier on the training data for the Weather problem with several different values of the number of nearest neighbors, K = {1, 3, 5, 7, 9, 11}. We select the value of K that performs best on the hold-out set. Going forward, we plan to use the model with the best value of K to predict the class label (Play) on each new day from now on. Briefly explain (in 2-3 sentences, your own words) why the accuracy of the model, estimated based on the hold-out set, may be an over-estimate of the model's true accuracy on future days. (1)

(e) Briefly explain (in 2-3 sentences, your own words) the concept of a validation set, and how it can help to address the problem you described in your answer to Question 2d. (1)

(f) Briefly explain (in 2-3 sentences, your own words) a possible use-case/application of clustering methods such as K-means to the Weather problem. In your answer, be sure to explain what the output of the clustering method would be used for. Feel free to make any assumptions you want about the broader context of the scenario and who is using the model (e.g. which particular sport is being played and whether it is professional or amateur), though it would be best to explain any assumptions you make. (1)

Total: 15

Attached textbook: Data Science for Business by Foster Provost and Tom Fawcett (O'Reilly Media, first edition, July 2013).
Answered 2 days after Mar 29, 2021

Sandeep Kumar answered on Mar 31 2021
1 (a) The nearest neighbor classifier is a popular machine learning classifier founded on the idea that similar objects flock together, so the label of a new object can be predicted from its proximity to labelled examples. For instance, if A is closest to B and B's label is blue, then A is also labelled blue. In the Weather dataset, each day is labelled with the class Play (yes or no). To predict Day F, we measure its distance to each labelled day using Temperature, Humidity and Wind Speed (the Day ID is not used) and give Day F the label of the closest day; the K-nearest neighbor variant instead takes a majority vote over the K closest days.
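The idea above fits in a few lines of Python. This is a minimal sketch rather than a reference implementation: it assumes each labelled day is stored as a (features, label) pair, uses `math.dist` (Python 3.8 or later) for the Euclidean distance, and the helper name `nearest_neighbor_label` is hypothetical.

```python
import math

def nearest_neighbor_label(query, labelled):
    """Return the label of the labelled example closest to `query`.

    `labelled` is a list of (features, label) pairs, where `features` is a
    tuple of numbers with the same length as `query`.
    """
    # math.dist returns the Euclidean distance between two points
    _, label = min(labelled, key=lambda pair: math.dist(query, pair[0]))
    return label

# Weather data from Question 1: (Temperature, Humidity, Wind Speed) -> Play
weather = [
    ((85, 85, 5), "no"),    # Day A
    ((80, 90, 25), "no"),   # Day B
    ((83, 86, 10), "yes"),  # Day C
    ((70, 96, 7), "yes"),   # Day D
    ((68, 80, 3), "yes"),   # Day E
]

day_f = (65, 70, 20)
print(nearest_neighbor_label(day_f, weather))  # prints: yes
```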
(b) Euclidean distances
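d(B,F) = √((80 − 65)² + (90 − 70)² + (25 − 20)²) = √650 ≈ 25.50
d(C,F) = √((83 − 65)² + (86 − 70)² + (10 − 20)²) = √680 ≈ 26.08
d(D,F) = √((70 − 65)² + (96 − 70)² + (7 − 20)²) = √870 ≈ 29.50
d(E,F) = √((68 − 65)² + (80 − 70)² + (3 − 20)²) = √398 ≈ 19.95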
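As a cross-check on the hand calculations (and on the Wolfram Alpha step in the question), the same distances can be computed directly. This small sketch assumes Python 3.8 or later for `math.dist`; the dictionary layout is just an illustrative choice.

```python
import math

# Weather data from Question 1; Day ID and Play are not part of the distance
days = {
    "A": (85, 85, 5),
    "B": (80, 90, 25),
    "C": (83, 86, 10),
    "D": (70, 96, 7),
    "E": (68, 80, 3),
}
day_f = (65, 70, 20)

# Euclidean distance from each labelled day to Day F
distances = {day: math.dist(point, day_f) for day, point in days.items()}

for day, d in distances.items():
    print(f"d({day},F) = {d:.2f}")

print("nearest:", min(distances, key=distances.get))
```

Running it should print 29.15, 25.50, 26.08, 29.50 and 19.95 for Days A to E, followed by `nearest: E`.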
(c) From the calculated Euclidean distances, Day E is the nearest neighbor of Day F, at a distance of about 19.95 units. Since Day E is labelled ‘yes’, the predicted class label for Day F is ‘yes’.
(d) With K = 3, the three nearest neighbors of F are E, B and C. Days C and E are labelled ‘yes’ and Day B is labelled ‘no’, so by majority vote the predicted label for Day F is ‘yes’.
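The majority-vote step behind (c) and (d) can be sketched as follows. It reuses the rounded distances from part (b) instead of recomputing them, and `knn_vote` is a hypothetical helper name. With two classes and an odd K the vote cannot tie, so no tie-breaking rule is added.

```python
from collections import Counter

# (distance from Day F, Play label), using the rounded values from part (b)
neighbors = [
    (29.15, "no"),   # Day A
    (25.50, "no"),   # Day B
    (26.08, "yes"),  # Day C
    (29.50, "yes"),  # Day D
    (19.95, "yes"),  # Day E
]

def knn_vote(neighbors, k):
    """Return the majority label among the k smallest distances."""
    closest = sorted(neighbors)[:k]            # sort by distance, keep k nearest
    votes = Counter(label for _, label in closest)
    return votes.most_common(1)[0][0]          # most frequent label

print(knn_vote(neighbors, 1))  # prints: yes  (nearest neighbor, Day E)
print(knn_vote(neighbors, 3))  # prints: yes  (E, B, C -> 2 yes vs 1 no)
```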
2) (a)...
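Questions 2(c) to 2(e) hinge on keeping the data used to choose K separate from the data used to report accuracy. The sketch below shows one way to set that up, under clearly stated assumptions: the year of Weather data is replaced by randomly generated stand-in days with an invented labelling rule (not real data), and the 60/20/20 split and helper names are illustrative choices. K is chosen using the validation set only, and the hold-out set is scored once at the end, so its accuracy is not inflated by the selection step.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority Play label among the k training days closest to `query`."""
    closest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(label for _, label in closest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of days in `data` that k-NN (using `train`) labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

# HYPOTHETICAL stand-in for a year of Weather data (randomly generated)
rng = random.Random(0)
days = []
for _ in range(365):
    x = (rng.uniform(40, 95), rng.uniform(30, 100), rng.uniform(0, 30))
    # Invented labelling rule plus 10% label noise, for illustration only
    y = "yes" if x[1] < 85 and x[2] < 15 else "no"
    if rng.random() < 0.1:
        y = "no" if y == "yes" else "yes"
    days.append((x, y))

rng.shuffle(days)
# 60% training, 20% validation (used to pick K), 20% hold-out (used once)
train, validation, holdout = days[:219], days[219:292], days[292:]

candidate_ks = [1, 3, 5, 7, 9, 11]
best_k = max(candidate_ks, key=lambda k: accuracy(train, validation, k))

print("best K on validation:", best_k)
print("validation accuracy:", round(accuracy(train, validation, best_k), 3))
print("hold-out accuracy:", round(accuracy(train, holdout, best_k), 3))
```

A common variation is to refit on the training and validation days together before the final hold-out check; the sketch keeps the training set fixed for simplicity.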