Please look at the attachments.
CIS 3306 - Data Visualization
Summer 2020
Exam # 1 – Part 2
Due: July 3rd, midnight (11:59 PM). No late submissions allowed.
Maximum points: 20 or 20% of the course grade

Note: Each question is worth 1 point. You must include R code along with the output graph(s) for your answer. The simplest way is to copy the R code along with the plot into a Word document and submit it. You must use the ggplot function from the ggplot2 package to write the R code for visualizing data. Your work must use the correct variables of the dataset of your choice to plot meaningful visualizations. Producing plots that are incorrect will not receive any credit. Partial credit will be assigned for work demonstrating significant effort in the right direction. You are allowed to use the code from the textbook.

1. How do you install a package in the R environment? Write the R command as an example. You may use any package of your choice to demonstrate the installation of a package.
2. How do you call a package within your current R session? You may use any installed package of your choice to demonstrate this.
3. How do you load data from an Excel file (with xlsx or csv extension) into the R environment? Demonstrate by using any Excel file on your local computer and loading it into the R environment.
4. Plot a scatter plot with a numerical or quantitative variable on the y-axis and a categorical or qualitative variable on the x-axis. You may use any dataset within the R environment. You must use the ggplot function from the ggplot2 package. Why is such a plot, with a categorical variable on the x-axis and a quantitative variable on the y-axis, not very useful? Which plot is used to plot a categorical variable on the x-axis and a numerical variable on the y-axis?
5. Now plot a scatter plot with a numerical or quantitative variable on both the x-axis and the y-axis. You may use any dataset within the R environment. You must use the ggplot function from the ggplot2 package.
6. Plot a simple line graph using any dataset available within the R environment. You must use the ggplot function from the ggplot2 package.
7. Which plot is used to plot a bell curve or normal curve? Plot a bell curve or normal curve using any dataset present within the R environment. You must use the ggplot function from the ggplot2 package.
8. Write the R code to create the following boxplot for the mpg dataset. This dataset is available within the R environment: Now interpret the above box plot as to what insights can be drawn from it.
9. Write the R code to create the following bar plot for the mtcars dataset. This dataset is available within the R environment:
10. Write the R code to create the following grouped bar plot for the mpg dataset. This dataset is available within the R environment: Why are there no multiple bars for cylinder = 5 in the above bar plot?
11. How do you change the default colors for the bars in question 10 above? Write the R command to change the default colors to red, green and blue respectively for the three drv levels in the above plot corresponding to question 10. Your plot must look as shown below:
12. Which plot will you use to compare the diamond prices across categories of diamonds based on their cut quality for the diamonds dataset? This dataset is available within the R environment. (Hint: use cut on the x-axis and price on the y-axis)
13. Write the R code to create the following plot for the cabbage_exp dataset from the gcookbook package.
14. Write the R code to create the following stacked bar plot for the mpg dataset. This dataset is available within the R environment.
15. Write the R code to create the following proportional stacked bar graph for the mpg dataset. This dataset is available within the R environment.
16. Demonstrate the use of the xlim() and ylim() functions for the line plot for any dataset available within the R environment. Write the corresponding R command using the ggplot function of the ggplot2 package. Also show your output graph.
17. Write the R code to create the following multi-line line plot for the Orange dataset. This dataset is available within the R environment.
18. Demonstrate the use of the size, shape, color and fill attributes of geom_point() within a multiline line graph for any dataset available within the R environment.
19. Demonstrate the use of the linetype, size and color attributes of geom_line() within a multiline line graph for any dataset available within the R environment.
20. Demonstrate the use of geom_area() to plot a stacked area or proportional stacked area graph for any meaningful dataset which can be plotted using the geom_area() geometric method within the R environment.

storytelling with data: a data visualization guide for business professionals
cole nussbaumer knaflic

Cover image: Cole Nussbaumer Knaflic
Cover design: Wiley

Copyright © 2015 by Cole Nussbaumer Knaflic. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
ISBN 9781119002253 (Paperback)
ISBN 9781119002260 (ePDF)
ISBN 9781119002062 (ePub)

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

To Randolph

contents
foreword
acknowledgments
about the author
introduction
chapter 1 the importance of context
chapter 2 choosing an effective visual
chapter 3 clutter is your enemy!
chapter 4 focus your audience’s attention
chapter 5 think like a designer
chapter 6 dissecting model visuals
chapter 7 lessons in storytelling
chapter 8 pulling it all together
chapter 9 case studies
chapter 10 final thoughts
bibliography
index

foreword

“Power Corrupts. PowerPoint Corrupts Absolutely.”
Edward Tufte, Yale Professor Emeritus [1]

[1] Tufte, Edward R. ‘PowerPoint Is Evil.’ Wired Magazine, www.wired.com/wired/archive/11.09/ppt2.html, September 2003.

We’ve all been victims of bad slideware. Hit-and-run presentations that leave us staggering from a maelstrom of fonts, colors, bullets, and highlights. Infographics that fail to be informative and are only graphic in the same sense that violence can be graphic. Charts and tables in the press that mislead and confuse.

It’s too easy today to generate tables, charts, graphs. I can imagine some old-timer (maybe it’s me?) harrumphing over my shoulder that in his day they’d do illustrations by hand, which meant you had to think before committing pen to paper.

Having all the information in the world at our fingertips doesn’t make it easier to communicate: it makes it harder. The more information you’re dealing with, the more difficult it is to filter down to the most important bits.

Enter Cole Nussbaumer Knaflic.

I met Cole in late 2007. I’d been recruited by Google the year before to create the “People Operations” team, responsible for finding, keeping, and delighting the folks at Google. Shortly after joining I decided we needed a People Analytics team, with a mandate to make sure we innovated as much on the people side as we did on the product side. Cole became an early and critical member of that team, acting as a conduit between the Analytics team and other parts of Google.

Cole always had a knack for clarity.
She was given some of our messiest messages—such as what exactly makes one manager great and another crummy—and distilled them into crisp, pleasing imagery that told an irrefutable story. Her messages of “don’t be a data fashion victim” (i.e., lose the fancy clipart, graphics and fonts—focus on the message) and “simple beats sexy” (i.e., the point is to clearly tell a story, not to make a pretty chart)