For this assignment, you will use the following three data sets. US_airlines.csv, US_airports.csv, US_airrecord.csv Using R you will prepare and explore the datasets using data cleaning and analysis...


For this assignment, you will use the following three data sets: US_airlines.csv, US_airports.csv and US_airrecord.csv.


Using R, you will prepare and explore the datasets using data cleaning and analysis techniques, and discuss the trends and points of interest you discover.


Steps you will complete should include:


Inspect and summarise your data.


Clean and combine datasets where appropriate:


· Check for and handle missing values


· Remove any unnecessary variables


· Transform any variables that you would like to use in a different form


Plot data and identify trends and/or points of interest


Perform data analysis to investigate:


· The airlines which experience the most delays


· The busiest routes


· The relationship between distance between airports and flying time


Predict flying time based on distance


Discuss your findings, comment your code and prepare explanatory visualisations.
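
A minimal base R sketch of the analysis tasks listed above, run on a made-up toy data frame rather than the real files. The column names AIRLINE, ORIGIN_AIRPORT, DESTINATION_AIRPORT, DISTANCE and ARRIVAL_DELAY are assumptions and may not match the actual CSV headers. Treating a flight as delayed when ARRIVAL_DELAY is above 0 minutes is only one possible definition, and a simple linear model is just one way to predict flying time from distance.

```r
# Toy stand-in for the flight records. All values are invented, and the column
# names other than AIR_TIME are assumptions about the real data.
flights <- data.frame(
  AIRLINE             = c("AA", "AA", "DL", "DL", "UA", "UA"),
  ORIGIN_AIRPORT      = c("JFK", "JFK", "ATL", "ATL", "SFO", "JFK"),
  DESTINATION_AIRPORT = c("LAX", "LAX", "ORD", "ORD", "LAX", "LAX"),
  DISTANCE            = c(2475, 2475, 606, 606, 337, 2475),
  AIR_TIME            = c(330, 340, 95, 90, 60, 335),
  ARRIVAL_DELAY       = c(20, -5, 0, 45, 10, NA),
  stringsAsFactors    = FALSE
)

# Airlines with the most delays: share of flights arriving late per airline.
# Rows with a missing ARRIVAL_DELAY are dropped by aggregate()'s default na.action.
flights$LATE <- flights$ARRIVAL_DELAY > 0
delay_share  <- aggregate(LATE ~ AIRLINE, data = flights, FUN = mean)
delay_share  <- delay_share[order(-delay_share$LATE), ]

# Busiest routes: count flights per origin-destination pair.
flights$ROUTE <- paste(flights$ORIGIN_AIRPORT, flights$DESTINATION_AIRPORT, sep = "-")
route_counts  <- sort(table(flights$ROUTE), decreasing = TRUE)

# Distance versus flying time: fit a straight line and predict from it.
fit  <- lm(AIR_TIME ~ DISTANCE, data = flights)
pred <- predict(fit, newdata = data.frame(DISTANCE = 1000))

print(delay_share)
print(route_counts)
print(coef(fit))
print(pred)
```

On the real data, a scatter plot of DISTANCE against AIR_TIME with the fitted line added by abline(fit) would be a natural check of whether a straight line is a reasonable fit.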
Some observations I have made:




· The times use the 24-hour clock, so 10 is 00:10 and 1542 is 15:42.


· Any time differences are in minutes.


· There are 19 flights that have a wheels-off time but also have a cancellation reason. What do we do with them? All of the reasons relate to the weather and the airline.


· TAIL_NUMBER: one value, 7819A, needs an N put in front of it.


· Elapsed time NA values need to be calculated as AIR_TIME + TAXI_IN + TAXI_OUT. First, however, the NA values in AIR_TIME need to be replaced with the duration between WHEELS_OFF and WHEELS_ON.


· Need to add relevant data from US_airlines.csv and US_airports.csv using the IATA_COD


· For the variables below, NA values mean there was no delay, and 0 values mean there was a delay but not for that reason. Any other value is how long the delay was for that reason. Some delays have more than one reason, e.g. air system and airline delay. These need some transformation.


o AIR_SYSTEM_DELAY


o SECURITY_DELAY


o AIRLINE_DELAY


o LATE_AIRCRAFT_DELAY


o WEATHER_DELAY
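
The cleaning steps in the observations above could be sketched in base R roughly as follows, using a made-up three-row data frame in place of US_airrecord.csv. Several things here are assumptions rather than facts about the data: the column names AIRLINE and ELAPSED_TIME, the helper hhmm_to_min, the toy lookup table and its AIRLINE_NAME column, and the chosen delay transformation (a DELAYED flag, NA replaced by 0, and a count of reasons). The key column is written as IATA_COD exactly as in the notes, so the real header may differ. The wheels-off to wheels-on duration also assumes both times are on the same clock. If they are local times at airports in different time zones, that would need handling first.

```r
# Toy stand-in for US_airrecord.csv. All values are invented.
# AIRLINE and ELAPSED_TIME are assumed column names.
rec <- data.frame(
  AIRLINE             = c("AA", "DL", "DL"),
  TAIL_NUMBER         = c("N123AA", "7819A", "N456DL"),
  WHEELS_OFF          = c(10, 2350, 1542),
  WHEELS_ON           = c(130, 45, 1700),
  TAXI_OUT            = c(12, 15, 10),
  TAXI_IN             = c(5, 6, 4),
  AIR_TIME            = c(NA, NA, 78),
  ELAPSED_TIME        = c(NA, NA, 92),
  AIR_SYSTEM_DELAY    = c(NA, 0, 20),
  SECURITY_DELAY      = c(NA, 0, 0),
  AIRLINE_DELAY       = c(NA, 30, 15),
  LATE_AIRCRAFT_DELAY = c(NA, 0, 0),
  WEATHER_DELAY       = c(NA, 0, 0),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: HHMM clock value (e.g. 1542) to minutes after midnight.
hhmm_to_min <- function(x) (x %/% 100) * 60 + x %% 100

# Fill missing AIR_TIME with the wheels-off to wheels-on duration in minutes.
# The modulo 1440 handles flights that land after midnight.
gap  <- (hhmm_to_min(rec$WHEELS_ON) - hhmm_to_min(rec$WHEELS_OFF)) %% 1440
fill <- is.na(rec$AIR_TIME)
rec$AIR_TIME[fill] <- gap[fill]

# Then fill missing elapsed time as AIR_TIME + TAXI_IN + TAXI_OUT.
fill <- is.na(rec$ELAPSED_TIME)
rec$ELAPSED_TIME[fill] <- with(rec, AIR_TIME + TAXI_IN + TAXI_OUT)[fill]

# Fix the one tail number that is missing its leading N.
rec$TAIL_NUMBER[rec$TAIL_NUMBER == "7819A"] <- "N7819A"

# One possible delay transformation: flag delayed flights (any non-NA reason),
# turn NA into 0 minutes, and count how many reasons contributed.
delay_cols <- c("AIR_SYSTEM_DELAY", "SECURITY_DELAY", "AIRLINE_DELAY",
                "LATE_AIRCRAFT_DELAY", "WEATHER_DELAY")
rec$DELAYED <- rowSums(!is.na(rec[delay_cols])) > 0
rec[delay_cols][is.na(rec[delay_cols])] <- 0
rec$N_DELAY_REASONS <- rowSums(rec[delay_cols] > 0)

# Add airline details via the code column (toy lookup table, placeholder names).
airlines <- data.frame(IATA_COD     = c("AA", "DL"),
                       AIRLINE_NAME = c("Carrier A", "Carrier D"),
                       stringsAsFactors = FALSE)
joined <- merge(rec, airlines, by.x = "AIRLINE", by.y = "IATA_COD", all.x = TRUE)

print(joined)
```

Using all.x = TRUE keeps every flight record even if its code has no match in the lookup table, which makes unmatched codes show up as NA instead of silently disappearing.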

Answered Same Day, Feb 15, 2021

Rohith answered on Feb 18 2021