Answer To: Using RStudio. Use the flight data and answer the following questions (remove any canceled or diverted...
Saravana answered on May 29 2021
Assignment
5/29/2021
Assignment on R/RStudio
1. Which airline had the most and least departures between 01May19 and 15May19?
   Data Preparation
      Read the csv data files
   Data analysis using dplyr and tidyverse
   Plot using ggplot2
2. Which airline had the most and least departures between 01May20 and 15May20?
   Data analysis using dplyr and tidyverse
   Plot using ggplot2
3. Write a function to find the area of the intersection of three circles
Assignment on R/RStudio
1. Which airline had the most and least departures between 01May19 and 15May19?
Data Preparation
Read the csv data files
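# set the working directory to the folder holding the two CSV files (the author's local path)
setwd("/media/priyan/Files/GreyNodes/Assignment19/r-homework-cev0ww53")
# the pattern is a regular expression: keep only file names that end in ".csv"
files <- list.files(pattern = "\\.csv$")
# list.files() sorts names alphabetically: files[1] is taken as the May 2019 data, files[2] as May 2020
data_May19 <- read.table(files[1], fill = TRUE, header = TRUE, sep = ",", stringsAsFactors = TRUE)
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
## EOF within quoted string
data_May20 <- read.table(files[2], fill = TRUE, header = TRUE, sep = ",", stringsAsFactors = TRUE)
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
## EOF within quoted string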
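The "EOF within quoted string" warnings usually mean that `read.table()` met a quote character that is never closed. By default `read.table()` treats both double and single quotes as quote characters (`quote = "\"'"`), while `read.csv()` only uses double quotes, so an unquoted apostrophe in a text field can swallow the rows after it. That this is the cause in the flight files is an assumption; the tiny inline CSV below (the `csv_text` string and its airport names are hypothetical) only illustrates the difference.

```r
# Hypothetical three-row CSV whose second field contains an unquoted apostrophe
csv_text <- "id,airport\n1,O'Hare\n2,JFK\n3,LAX"

# read.table() defaults to quote = "\"'", so the apostrophe opens a quoted string
# that never closes; rows are lost (with a warning), so guard the call with tryCatch
bad_rows <- tryCatch(
  nrow(suppressWarnings(
    read.table(text = csv_text, header = TRUE, sep = ",", fill = TRUE)
  )),
  error = function(e) NA_integer_
)

# Restricting quoting to double quotes keeps every row
fixed <- read.table(text = csv_text, header = TRUE, sep = ",", fill = TRUE,
                    quote = "\"", stringsAsFactors = FALSE)

# read.csv() uses quote = "\"" by default, so it also reads all three rows
good <- read.csv(text = csv_text, stringsAsFactors = FALSE)

cat("read.table, default quote:     ", bad_rows, "rows\n")
cat("read.table, double quotes only:", nrow(fixed), "rows\n")
cat("read.csv:                      ", nrow(good), "rows\n")
```

If the same warning appears on the real files, reading them with `read.csv()` (or passing `quote = "\""` to `read.table()`) is a reasonable first thing to try, after which the departure counts should be recomputed.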
Data analysis using dplyr and tidyverse
# load dplyr for filter(), group_by(), summarize(), arrange() and the %>% pipe
library(dplyr)
# CANCELLED and DIVERTED are 0/1 flags: a value of 1 marks a cancelled or diverted flight
# only the flight dates from 01May19 to 15May19 (inclusive) are kept in the analysis
dates <- as.character(seq(as.Date("2019-05-01"), as.Date("2019-05-15"), by = "day"))
data_May19_count <- data_May19 %>%
  dplyr::filter(CANCELLED != 1, DIVERTED != 1) %>%  # remove cancelled and diverted flights
  dplyr::filter(FL_DATE %in% dates) %>%             # keep rows whose flight date is in `dates`
  group_by(airCarrier) %>%                          # one group per air carrier
  summarize(NFlights = n()) %>%                     # n() counts the departures in each group
  arrange(NFlights)                                 # sort in increasing order of departures
print(data_May19_count)
## # A tibble: 17 x 2
##    airCarrier              NFlights
##    <fct>                      <int>
##  1 Hawaiian Airlines Inc.      1356
##  2 Allegiant Air               1793
##  3 ExpressJet Airlines LLC     2247
##  4 Frontier Airlines Inc.      3279
##  5 Spirit Air Lines            4371
##  6 PSA Airlines Inc.           4594
##  7 Mesa Airlines Inc.          4722
##  8 Endeavor Air Inc.           5039
##  9 Alaska Airlines Inc.        5400
## 10 JetBlue Airways             6139
## 11 Envoy Air                   6988
## 12 Republic Airline            7341
## 13 United Air Lines Inc.      12891
## 14 SkyWest Airlines Inc.      16596
## 15 American Airlines Inc.     18836
## 16 Delta Air Lines Inc.       20994
## 17 Southwest Airlines Co.     23951
Between 01May19 and 15May19, Southwest Airlines Co. had the most departures (23,951) and Hawaiian Airlines Inc. had the fewest (1,356).
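As a cross-check of the dplyr pipeline, the same filter, count and sort steps can be sketched in base R. The `flights` data frame below is hypothetical: it reuses the column names `FL_DATE`, `airCarrier`, `CANCELLED` and `DIVERTED` from the code above, but its rows and the carrier names are made up. `which.max()` and `which.min()` then pick the carriers with the most and fewest departures.

```r
# Hypothetical toy rows that reuse the column names of the flight data above
flights <- data.frame(
  FL_DATE    = c("2019-05-01", "2019-05-02", "2019-05-14", "2019-05-02", "2019-05-15",
                 "2019-05-16", "2019-05-03", "2019-05-04", "2019-05-05"),
  airCarrier = c("Carrier A", "Carrier A", "Carrier A", "Carrier B", "Carrier B",
                 "Carrier B", "Carrier C", "Carrier C", "Carrier A"),
  CANCELLED  = c(0, 0, 0, 0, 0, 0, 0, 1, 0),
  DIVERTED   = c(0, 0, 0, 0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Every date from 01May19 to 15May19 (inclusive) as "YYYY-MM-DD" strings
dates <- as.character(seq(as.Date("2019-05-01"), as.Date("2019-05-15"), by = "day"))

# Drop cancelled and diverted flights, then keep only the requested dates
kept <- subset(flights, CANCELLED != 1 & DIVERTED != 1 & FL_DATE %in% dates)

# Departures per carrier, sorted in increasing order
counts <- sort(table(kept$airCarrier))
print(counts)

# which.max()/which.min() return the first position of the max/min, so ties go to the first carrier
most  <- names(counts)[which.max(counts)]
least <- names(counts)[which.min(counts)]
cat("Most departures:", most, "\nFewest departures:", least, "\n")
```

Because `which.max()` and `which.min()` silently keep only the first of several tied carriers, it is worth printing the whole sorted table, as the dplyr version does, before naming a single winner.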
Plot using ggplot2
# load the plotting libraries: ggplot2 for the chart, forcats for reordering factor levels
library(ggplot2)
library(forcats)
p <- data_May19_count %>%
...
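A horizontal bar chart is one way to show these counts. The sketch below assumes ggplot2 with `forcats::fct_reorder()`; it is not the original plotting code. The `counts_subset` data frame is a hypothetical helper holding six rows copied from the table above (the three smallest and three largest counts), and `p_sketch` is likewise an illustrative name.

```r
library(ggplot2)
library(forcats)

# Six of the 17 rows printed above (the three smallest and three largest counts)
counts_subset <- data.frame(
  airCarrier = c("Hawaiian Airlines Inc.", "Allegiant Air", "ExpressJet Airlines LLC",
                 "American Airlines Inc.", "Delta Air Lines Inc.", "Southwest Airlines Co."),
  NFlights   = c(1356, 1793, 2247, 18836, 20994, 23951),
  stringsAsFactors = FALSE
)

# fct_reorder() orders the carrier levels by their departure count,
# so after coord_flip() the bars run from fewest (bottom) to most (top)
p_sketch <- ggplot(counts_subset,
                   aes(x = fct_reorder(airCarrier, NFlights), y = NFlights)) +
  geom_col() +
  coord_flip() +
  labs(x = "Air carrier", y = "Departures, 01May19 to 15May19")
```

Typing `p_sketch` at the console (or calling `print(p_sketch)`) draws the chart. Ordering the factor levels by count, rather than leaving them alphabetical, makes the most and least active carriers easy to spot.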