Check the file
--- title: "Stat 405/705 HW 5" author: "James Johndrow" output: html_document: default pdf_document: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) if (!require('pacman')) {install.packages('pacman')} rm(list=ls(all=T)) ``` ## Directions If in a question I refer to a function that we have not seen in class, then use the help facility in R or Google to find out about it. Insert your answers and the R code you used to generate them, beneath each question. If there is no R code chunk present, create one. **Submit your solutions to canvas -- both the R markdown file and the html file. I strongly suggest checking as you go that your file knits. If you cannot knit, submit just the Rmd. ** This assignment will focus on computing various aspects of the operations of a charter air cargo business with stochastic order origination. Your business owns 30 planes, all of which are identical and travel at identical speeds (800 km/h; you may neglect variation in speed during takeoff and landing). The way your business works is that one or several customers charter a cargo flight to bring their goods from one city to another. As such, a major aspect of your operation is to optimize the movement of your cargo planes from their current location to airports where an order is waiting to be picked up. We will use simulations to compute things like the average time it takes to complete a delivery, including the time it takes to fly from the current location to the airport of origin for the shipment, the time to load the shipment, the time it takes to fly to the destination city, and the time it takes to unload the shipment. # Question 1: In this question we'll assemble a dataset you need. ## Part a (5 pts) There are two data files on canvas: airport.csv and airport-codes.csv. Read them both in using read.csv as dataframes `cargo` and `locs`, respectively. ```{r} ``` ## Part b (5 pts) Cargo contains data on the 50 busiest cargo airports in the US. For the purposes of this exercise we'll assume that you only fly between these airports. locs contains information on the geolocation (lat,lon) of most airports worldwide. We only need this latter dataset to get the geolocations of the 50 airports where we fly into our airport data frame. Use `merge` to add a column to cargo that contains the geolocation of each airport. You can use the IATA code as key variable. ```{r} ``` ## Part c (10 pts) Something has gone slightly wrong with your merge. One of the observations has been duplicated. Identify the airport that has been duplicated. Now print the iata code and the coordinates of the two entries for this airport, and remedy the problem by removing the observation with the larger value of the latitude coordinate (you can do this "by inspection"). ```{r} ``` Now explain in two sentences why this duplication occurred and how you know. **Answer**: ## Part d (5 pts) Coerce the variable iata_code in cargo to character. Sort the resulting data frame by rank (the rank, in ascending order, of the airport in terms of the total number of tons of cargo that pass through it each year), and print the top 5 (i.e. these are the 5 busiest) ```{r} ``` # Question 2 Next we're going to create a function to compute distances between airports. ## Part a (10 pts) We are going to need to work with the lat, lon in the coordinates variable; however, your current coordinates variable is a factor. Create separate variables lat and lon by first converting the factor to a character, splitting on comma, and then converting the now separated string variables to numeric. ```{r} ``` ## Part b (5 pts) The distance between two points on the surface of the earth is called the "geodesic distance" or "great circle distance." In a moment you'll be given a formula for this distance, but the formula will require that you express the geographic longitude and latitude in **radians**. Recall that \(\pi\) radians is equal to 180 degrees. You current latitude and longitude are in degrees. Convert them to radians ```{r} ``` ## Part c (10 pts) Now, write a function that takes two inputs, both n by 2 matrices whose rows are (lat,lon) pairs, and computes the geodesic distance \(d\) between each pair of locations using the formula \(d = r \Delta \sigma \), where for \(\lambda_1,\phi_1\) the lat,lon of the first location, \(\lambda_2,\phi_2\) the lat,lon of the second location, and \(r\) the radius of the earth, approximately 6378 km, we have \[ \Delta \sigma = \arccos(\cos \lambda_1 \cos \lambda_2 \cos(\phi_1-\phi_2) + \sin \lambda_1 \sin \lambda_2 ). \] Name the function gcd, and print the gcd between MEM and ANC. Note: it is ok here to hard code the radius of the earth, since we won't be computing geodesic distances on Mars. ```{r} ``` ## Part d (15 pts) Write a function that takes as an argument a matrix with two columns and \(n\) rows, each row of which gives a pair of iata codes, one in each column, and returns the geodesic distance between each pair in a \(n\) by 1 matrix. Your function may also take the cargo data frame as an input. Call your function gcd.iata. For good performance, you should use indexing within this function, rather than a loop. You will lose credit if your code cannot finish running in time to submit your homework. Note: the formula used in your gcd function is subject to considerable roundoff error when the distance between the two points is very small. As a result, when the two locations match exactly, it is possible to obtain a distance that is not identically zero. In your function, check whether the two codes are the same, and if they are, always return zero for that entry. ```{r} ``` ## Part e (10 pts) Now compute the gcd between every pair of airports in cargo. Store the results in a matrix whose rows correspond to the origin airport and columns to the destination airport. Call the matrix \(D\) and give it row names and column names with the iata codes of the origin and destination airports. ```{r} ``` # Question 3 While in reality the operation of such a company would be much more complicated, we are going to assume for the purposes of this exercise that each plane makes exactly one trip a day, and that there are exactly as many shipments per day that need to be picked up and delivered as there are planes (30). ## Part a (10 pts) There \(m=50\) possible origins for each shipment, and, for each origin, 49 possible destinations, since the origin cannot equal the destination. Write a function to sample \(n\) origin-destination (od) pairs. Your od pairs should be sampled by taking a sample of size 2 without replacement from the iata codes. Your code should allow for unequal sampling probabilities. Call the probability vector \(\lambda\). Your code should take \(n,\lambda\) and iata_codes as arguments and return a matrix of size n by 2 whose entries are random origin-destination pairs specified as the origin iata code in column 1 and the destination iata code in column 2. ```{r} ``` ## Part b (10 pts) Now we will generate a random sample of origin-destination pairs and of the current locations of all of your planes. Set the random number seed to 14779. Then generate 30 locations for your airplanes (as iata codes) uniformly at random with replacement and generate 30 origin-destination pairs using equal probabilities for all airports. Print the first 10 of each. ```{r} ``` ## Part c (10 pts) Now, create a matrix that contains the distance between the current location of each plane and each origin airport for a shipment. The rows should correspond to current locations of planes, and the columns to origins for shipments ready to be picked up. Give rownames and column names to the matrix so that you can identify which entry corresponds to which city pair. ```{r} ``` # Question 4 In this question we'll determine the optimal way in which to dispatch the planes to minimize the total amount of time spent picking up and delivering each shipment. ## Part a (10 pts) Focusing now on just the first plane, identify which shipment ready for pickup is closest to that plane, and compute how long the plane will take to get there. Print the iata code of the current location of the plane, the origin city for the shipment, and how long it will take to get there. ```{r} ``` ## Part b (10 pts) More generally you need to assign planes to all shipments in an optimal fashion. In Operations Research this is called the "Assignment Problem". There is a package for R called `lpSolve` which solves the assignment problem. Install the lpSolve package. Once installed use `library(lpSolve)` to make the package available to R. If your distance matrix in Question 3 is called distmat, you assign planes to shipments in an optimal fashion with the command `fm <- lp.assign(distmat)`. now `fm$solution` gives you a matrix that shows which plane was assigned to each shipment. in addition, `fm$objval` gives you the total distance traveled between current airports and origin airports for this optimal allocation. solve the problem for your current locations and locations with shipments ready for pickup, then print the city that each of the 30 planes must fly to in order to pick up their cargo. also print the total distance traveled in order to pick up the shipment. ```{r} ``` # question 5 in this question you will put together what you've already done to compute some features of your operation like the average time it takes between the plane leaving its current airport and the shipment being delivered. ## part a (5 pts) compute the distance between your origin and destination pairs and store it as od.dists. ```{r} ``` ## part b (10 pts) for the optimal assigment you computed in the previous question, take one sample of the total amount of time each plane takes to deliver the shipment to its destination. this includes the time required to get from the current airport to the origin airport where the shipment is waiting, the time to load the shipment, the time to get to the destination airport, and the time to unload the shipment. assume that loading times (in hours) have independent \(\text{exponential}(1)\) distributions and that unloading times (in hours) have independent \(\text{exponential}(2)\) distributions. before running your code, set the random number seed to 10733. then show the average of the times you sampled in hours. ```{r} ``` ## part c (20 pts) write a function that takes a number of monte carlo samples (nmc), the number of planes/shipments per day (n), the cargo dataframe, and a probability vector for airports (lambda), and returns lp.assign(distmat)`.="" now="" `fm$solution`="" gives="" you="" a="" matrix="" that="" shows="" which="" plane="" was="" assigned="" to="" each="" shipment.="" in="" addition,="" `fm$objval`="" gives="" you="" the="" total="" distance="" traveled="" between="" current="" airports="" and="" origin="" airports="" for="" this="" optimal="" allocation.="" solve="" the="" problem="" for="" your="" current="" locations="" and="" locations="" with="" shipments="" ready="" for="" pickup,="" then="" print="" the="" city="" that="" each="" of="" the="" 30="" planes="" must="" fly="" to="" in="" order="" to="" pick="" up="" their="" cargo.="" also="" print="" the="" total="" distance="" traveled="" in="" order="" to="" pick="" up="" the="" shipment.="" ```{r}="" ```="" #="" question="" 5="" in="" this="" question="" you="" will="" put="" together="" what="" you've="" already="" done="" to="" compute="" some="" features="" of="" your="" operation="" like="" the="" average="" time="" it="" takes="" between="" the="" plane="" leaving="" its="" current="" airport="" and="" the="" shipment="" being="" delivered.="" ##="" part="" a="" (5="" pts)="" compute="" the="" distance="" between="" your="" origin="" and="" destination="" pairs="" and="" store="" it="" as="" od.dists.="" ```{r}="" ```="" ##="" part="" b="" (10="" pts)="" for="" the="" optimal="" assigment="" you="" computed="" in="" the="" previous="" question,="" take="" one="" sample="" of="" the="" total="" amount="" of="" time="" each="" plane="" takes="" to="" deliver="" the="" shipment="" to="" its="" destination.="" this="" includes="" the="" time="" required="" to="" get="" from="" the="" current="" airport="" to="" the="" origin="" airport="" where="" the="" shipment="" is="" waiting,="" the="" time="" to="" load="" the="" shipment,="" the="" time="" to="" get="" to="" the="" destination="" airport,="" and="" the="" time="" to="" unload="" the="" shipment.="" assume="" that="" loading="" times="" (in="" hours)="" have="" independent="" \(\text{exponential}(1)\)="" distributions="" and="" that="" unloading="" times="" (in="" hours)="" have="" independent="" \(\text{exponential}(2)\)="" distributions.="" before="" running="" your="" code,="" set="" the="" random="" number="" seed="" to="" 10733.="" then="" show="" the="" average="" of="" the="" times="" you="" sampled="" in="" hours.="" ```{r}="" ```="" ##="" part="" c="" (20="" pts)="" write="" a="" function="" that="" takes="" a="" number="" of="" monte="" carlo="" samples="" (nmc),="" the="" number="" of="" planes/shipments="" per="" day="" (n),="" the="" cargo="" dataframe,="" and="" a="" probability="" vector="" for="" airports="" (lambda),="" and="">- lp.assign(distmat)`. now `fm$solution` gives you a matrix that shows which plane was assigned to each shipment. in addition, `fm$objval` gives you the total distance traveled between current airports and origin airports for this optimal allocation. solve the problem for your current locations and locations with shipments ready for pickup, then print the city that each of the 30 planes must fly to in order to pick up their cargo. also print the total distance traveled in order to pick up the shipment. ```{r} ``` # question 5 in this question you will put together what you've already done to compute some features of your operation like the average time it takes between the plane leaving its current airport and the shipment being delivered. ## part a (5 pts) compute the distance between your origin and destination pairs and store it as od.dists. ```{r} ``` ## part b (10 pts) for the optimal assigment you computed in the previous question, take one sample of the total amount of time each plane takes to deliver the shipment to its destination. this includes the time required to get from the current airport to the origin airport where the shipment is waiting, the time to load the shipment, the time to get to the destination airport, and the time to unload the shipment. assume that loading times (in hours) have independent \(\text{exponential}(1)\) distributions and that unloading times (in hours) have independent \(\text{exponential}(2)\) distributions. before running your code, set the random number seed to 10733. then show the average of the times you sampled in hours. ```{r} ``` ## part c (20 pts) write a function that takes a number of monte carlo samples (nmc), the number of planes/shipments per day (n), the cargo dataframe, and a probability vector for airports (lambda), and returns>