Newspaper,"Daily Circulation, 2004","Daily Circulation, 2013","Change in Daily Circulation, 2004-2013","Pulitzer Prize Winners and Finalists, 1990-2003","Pulitzer Prize Winners and Finalists, 2004-2014","Pulitzer Prize Winners and Finalists, 1990-2014"
USA Today,2192098,1674306,-0.24,1,1,2
Wall Street Journal,2101017,2378827,0.13,30,20,50
New York Times,1119027,1865318,0.67,55,62,117
Los Angeles Times,983727,653868,-0.34,44,41,85
Washington Post,760034,474767,-0.38,52,48,100
New York Daily News,712671,516165,-0.28,4,2,6
New York Post,642844,500521,-0.22,0,0,0
Chicago Tribune,603315,414930,-0.31,23,15,38
San Jose Mercury News,558874,583998,0.04,4,2,6
Newsday,553117,377744,-0.32,12,6,18
Houston Chronicle,549300,360251,-0.34,2,3,5
Dallas Morning News,528379,409265,-0.23,11,6,17
San Francisco Chronicle,499008,218987,-0.56,7,2,9
Arizona Republic,466926,293640,-0.37,5,2,7
Chicago Sun-Times,453757,470548,0.04,1,1,2
Boston Globe,446241,245572,-0.45,25,16,41
Atlanta Journal Constitution,409873,231094,-0.44,1,5,6
Newark Star Ledger,395000,340778,-0.14,2,6,8
Detroit Free Press,379304,209652,-0.45,7,5,12
Minneapolis Star Tribune,377058,301345,-0.2,4,4,8
Philadelphia Inquirer,376454,306831,-0.18,24,8,32
Cleveland Plain Dealer,367528,311605,-0.15,4,7,11
San Diego Union-Tribune,355771,250678,-0.3,0,2,2
Tampa Bay Times,348502,340260,-0.02,10,11,21
Denver Post,340168,416676,0.22,1,8,9
Rocky Mountain News,340007,0,-1,4,1,5
Oregonian,339169,228909,-0.33,9,8,17
Miami Herald,325032,147130,-0.55,17,7,24
Orange County Register,310001,356165,0.15,3,2,5
Sacramento Bee,303841,200802,-0.34,4,4,8
St. Louis Post-Dispatch,281198,167199,-0.41,4,3,7
Baltimore Sun,277947,177054,-0.36,11,2,13
Kansas City Star,275747,189283,-0.31,2,0,2
Detroit News,271465,115643,-0.57,4,0,4
Orlando Sentinel,269269,161070,-0.4,5,2,7
South Florida Sun-Sentinel,268297,163728,-0.39,0,1,1
New Orleans Times-Picayune,262008,0,-1,5,3,8
Columbus Dispatch,259127,137148,-0.47,1,0,1
Indianapolis Star,253778,156850,-0.38,1,0,1
San Antonio Express-News,246057,139005,-0.44,0,0,0
Pittsburgh Post-Gazette,242514,180433,-0.26,3,0,3
Milwaukee Journal Sentinel,241605,198469,-0.18,2,8,10
Tampa Tribune,238877,191477,-0.2,0,0,0
Fort Worth Star-Telegram,237318,188593,-0.21,1,0,1
Boston Herald,236899,95929,-0.6,0,0,0
Seattle Times,233497,229764,-0.02,11,5,16
Charlotte Observer,231369,137829,-0.4,1,3,4
Daily Oklahoman,223403,124667,-0.44,0,0,0
Louisville Courier-Journal,216934,131208,-0.4,0,3,3
Investor's Business Daily,215735,157161,-0.27,0,1,1

---
title: "R Project 2"
author: "YOUR NAME"
date: "DATA 2401"
output:
  html_document: default
---

Note that these exercises should be performed using `dplyr` (do not directly access or manipulate the data frames). Turn in the HTML file to Blackboard with the filename `RProject2_yourlastname.html`. **Points will be taken off for not doing this.**

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Part 1

Install and load the "fueleconomy" package and the tidyverse.

```{r}
# run these only once
# install.packages("devtools")
# devtools::install_github("hadley/fueleconomy")
```

```{r}
library(tidyverse)
library(fueleconomy)
```

1. Select the different manufacturers (makes) of the cars in this data set. Save this vector in a variable.
2. Use the `distinct()` function to determine how many different car manufacturers are represented in the data set.
3. Filter the data set for vehicles manufactured in 1997.
4. Arrange the 1997 cars by highway (`hwy`) gas mileage.
5. Mutate the 1997 cars data frame to add a column `average` that has the average gas mileage (between city and highway mpg) for each car.
6. Filter the whole vehicles data set for 2-Wheel Drive vehicles that get more than 20 miles/gallon in the city. Save this new data frame in a variable.
7. Of the above vehicles, what is the vehicle ID of the vehicle with the worst hwy mpg? Hint: filter for the worst vehicle, then select its ID.
8. Write a function that takes a `year_choice` and a `make_choice` as parameters and returns the vehicle model that gets the most hwy miles/gallon among vehicles of that make in that year. You'll need to filter more (and do some selecting)!
9. What was the most efficient Honda model of 1995?
10. Which 2015 Acura model has the best hwy MPG? (Use dplyr, but without method chaining or pipes; use temporary variables!)
11. Which 2015 Acura model has the best hwy MPG? (Use dplyr, nesting functions.)
12. Which 2015 Acura model has the best hwy MPG? (Use dplyr and the pipe operator.)

## Bonus

Write 3 functions, one for each approach in 10, 11, and 12. Then, test how long it takes to perform each one 1000 times.

# Part 2

Read in the data (from `pulitzer-circulation-data.csv`). Remember not to treat strings as factors!

1. View the data set. Start to understand what the data set contains.
2. Print out the names of the columns for reference.
3. Use the `str()` function to also see what types of values are contained in each column (the type abbreviation, such as `chr` or `int`, appears right after the `:`). Did any value type surprise you? Why do you think they are that type?
4. Add a column to the data frame called `Pulitzer.Prize.Change` that contains the difference between the number of times each paper was a winner or finalist (hereafter we'll call this group "winner") during 2004-2014 and during 1990-2003.
5. What was the name of the publication that had the most winners between 2004 and 2014?
6. Which publication with at least 5 winners between 2004 and 2014 had the biggest decrease (most negative change) in daily circulation?
7. An important part of being a data scientist is asking questions. Write a question you are interested in about this data set, and then use dplyr to figure out the answer!
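The dplyr verbs that Part 1 relies on can be sketched on a small stand-in data frame. This is a minimal sketch under stated assumptions: `toy_vehicles` is hypothetical, its column names (`id`, `make`, `model`, `year`, `drive`, `cty`, `hwy`) are assumed to mirror those of `vehicles`, every value in it is invented, and `best_hwy_model()` is a placeholder name for the function item 8 asks for.

```{r}
library(dplyr)

# Hypothetical stand-in for fueleconomy::vehicles: the column names are
# assumed to match the real data set, but every value here is invented.
toy_vehicles <- data.frame(
  id    = c(1, 2, 3, 4, 5, 6, 7),
  make  = c("Honda", "Honda", "Acura", "Acura", "Ford", "Ford", "Ford"),
  model = c("H-1", "H-2", "A-1", "A-2", "F-1", "F-2", "F-3"),
  year  = c(1997, 1997, 2015, 2015, 1997, 2015, 1997),
  drive = c("Front-Wheel Drive", "Front-Wheel Drive", "Front-Wheel Drive",
            "All-Wheel Drive", "2-Wheel Drive", "2-Wheel Drive",
            "2-Wheel Drive"),
  cty   = c(30, 24, 25, 20, 26, 21, 18),
  hwy   = c(38, 33, 35, 28, 34, 24, 22),
  stringsAsFactors = FALSE
)

# pull() returns a single column as a plain vector
makes <- pull(toy_vehicles, make)

# distinct() keeps one row per make; nrow() counts those rows
n_makes <- nrow(distinct(toy_vehicles, make))

# filter() keeps matching rows; arrange() sorts them (ascending by default)
cars_1997 <- filter(toy_vehicles, year == 1997)
cars_1997_by_hwy <- arrange(cars_1997, hwy)

# mutate() adds a column computed row by row
cars_1997_avg <- mutate(cars_1997_by_hwy, average = (cty + hwy) / 2)

# Several conditions in one filter() call are combined with AND
efficient_2wd <- filter(toy_vehicles, drive == "2-Wheel Drive", cty > 20)

# Keep the row with the lowest hwy value, then select its id column
worst_id <- select(filter(efficient_2wd, hwy == min(hwy)), id)

# Hypothetical helper: the model with the highest hwy mpg for one make and
# year; `data` defaults to the toy frame but could be given `vehicles`
best_hwy_model <- function(year_choice, make_choice, data = toy_vehicles) {
  data %>%
    filter(year == year_choice, make == make_choice) %>%
    filter(hwy == max(hwy)) %>%
    select(model)
}

best_hwy_model(1997, "Honda")
```

With the real data, the same calls would take `vehicles` in place of `toy_vehicles`.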
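Items 10 through 12 ask for the same query written three ways, and the Bonus asks for timings. A minimal sketch, assuming a hypothetical four-row data frame with invented values and placeholder function names (`best_acura_temp()`, `best_acura_nested()`, `best_acura_pipe()`), puts the three styles side by side and times 1000 runs of each with base R's `system.time()`.

```{r}
library(dplyr)

# Hypothetical toy data standing in for fueleconomy::vehicles (values invented)
toy_vehicles <- data.frame(
  make  = c("Acura", "Acura", "Acura", "Honda"),
  model = c("A-1", "A-2", "A-3", "H-1"),
  year  = c(2015, 2015, 2014, 2015),
  hwy   = c(35, 28, 40, 38),
  stringsAsFactors = FALSE
)

# Approach 1: temporary variables, no pipes
best_acura_temp <- function(data) {
  acura_2015 <- filter(data, make == "Acura", year == 2015)
  best <- filter(acura_2015, hwy == max(hwy))
  select(best, model)
}

# Approach 2: nested calls, read from the inside out
best_acura_nested <- function(data) {
  select(
    filter(filter(data, make == "Acura", year == 2015), hwy == max(hwy)),
    model
  )
}

# Approach 3: the pipe operator
best_acura_pipe <- function(data) {
  data %>%
    filter(make == "Acura", year == 2015) %>%
    filter(hwy == max(hwy)) %>%
    select(model)
}

# Bonus: elapsed seconds for 1000 runs of each approach
timings <- sapply(
  list(temp = best_acura_temp, nested = best_acura_nested, pipe = best_acura_pipe),
  function(f) system.time(for (i in 1:1000) f(toy_vehicles))[["elapsed"]]
)
timings
```

Expect the timings to vary from run to run and from machine to machine, so it is worth repeating the measurement before drawing conclusions.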
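The first steps of Part 2 can be sketched on five rows copied from the CSV listing at the top of this file, inlined with `read.csv(text = ...)` so the chunk runs without the file. This sketch assumes `read.csv()`'s default `check.names = TRUE`, which turns headers such as `Daily Circulation, 2004` into syntactic names such as `Daily.Circulation..2004`; with the full data, `read.csv("pulitzer-circulation-data.csv", stringsAsFactors = FALSE)` would replace the inline text. The object names `pulitzer`, `most_recent_winners`, and `biggest_drop` are illustrative, and the results describe only this subset.

```{r}
library(dplyr)

# Five rows copied from the CSV listing, inlined so the sketch is self-contained
csv_text <- 'Newspaper,"Daily Circulation, 2004","Daily Circulation, 2013","Change in Daily Circulation, 2004-2013","Pulitzer Prize Winners and Finalists, 1990-2003","Pulitzer Prize Winners and Finalists, 2004-2014","Pulitzer Prize Winners and Finalists, 1990-2014"
Chicago Tribune,603315,414930,-0.31,23,15,38
Houston Chronicle,549300,360251,-0.34,2,3,5
Detroit Free Press,379304,209652,-0.45,7,5,12
Denver Post,340168,416676,0.22,1,8,9
Seattle Times,233497,229764,-0.02,11,5,16'

# stringsAsFactors = FALSE keeps Newspaper as character; check.names (on by
# default) rewrites the headers into names like Daily.Circulation..2004
pulitzer <- read.csv(text = csv_text, stringsAsFactors = FALSE)

names(pulitzer)
str(pulitzer)

# Winners/finalists in 2004-2014 minus winners/finalists in 1990-2003
pulitzer <- mutate(
  pulitzer,
  Pulitzer.Prize.Change = Pulitzer.Prize.Winners.and.Finalists..2004.2014 -
    Pulitzer.Prize.Winners.and.Finalists..1990.2003
)

# Paper with the most winners in 2004-2014 (within this subset only)
most_recent_winners <- pulitzer %>%
  filter(Pulitzer.Prize.Winners.and.Finalists..2004.2014 ==
           max(Pulitzer.Prize.Winners.and.Finalists..2004.2014)) %>%
  select(Newspaper)

# Among papers with at least 5 winners in 2004-2014, the most negative change
biggest_drop <- pulitzer %>%
  filter(Pulitzer.Prize.Winners.and.Finalists..2004.2014 >= 5) %>%
  filter(Change.in.Daily.Circulation..2004.2013 ==
           min(Change.in.Daily.Circulation..2004.2013)) %>%
  select(Newspaper)
```

On this subset, `most_recent_winners` holds Chicago Tribune and `biggest_drop` holds Detroit Free Press; the full data set may give different answers.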