Please take a look at the attachment
Stat 405/705 Final Project

James Johndrow

Directions

If in a question I refer to a function that we have not seen in class, then use the help facility in R or Google to find out about it. Insert your answers and the R code you used to generate them beneath each question. If there is no R code chunk present, create one.

Submit your solutions to Canvas: both the R Markdown file and the HTML file. I strongly suggest checking as you go that your file knits. If you cannot knit, submit just the Rmd, and include a comment that you could not knit. This final project is due Tuesday, March 17 at 11:59pm Eastern time. Late submissions will receive a zero. You may not collaborate with or discuss this assignment with classmates. You may post questions on Piazza publicly (i.e. so that other classmates can see the question); however, you are to refrain from answering questions posed by other classmates. Only the teaching staff may respond to Piazza questions about this assignment. We will be more hesitant to help you with these questions than on the homeworks; however, we will of course answer clarifying questions.

This assignment picks up where Homework 5 left off and extends the analysis.

Note: you may want to read in some of your functions from Homework 5 in the R code chunk above, and also load any packages you used in that assignment.

Chartered freight service is really booming. You have decided to expand your operation to serve the 75 largest cargo airports instead of just 50. The goal of these questions is to estimate the number of planes that you will need to meet a pre-specified level of service in terms of how long it takes to deliver a shipment. Obviously this is an important question, since you will need to bring a lot more debt onto the books to finance the purchase of more planes. We will assume that there are at least as many planes as orders waiting to be picked up. A larger list of cargo airports is available in the data folder on Canvas.
It is called cy17-cargo-airports.csv and contains 137 airports.

Question 1

Part a (10 pts)

You'll need to re-assemble your merged dataset to get the coordinates together with the airports. Re-use your code from Assignment 5 to merge the airport data with the location data (the latter still in the same file as before, the former in the new file with 137 airports). Note that the new cargo volume data file, cy17-cargo-airports.csv, only contains the 2017 cargo tonnage; there is no data for other years. Don't forget to remove the duplicate of AUS after the merge!

Part b (10 pts)

Now keep only the 75 airports with the largest cargo tonnage, convert iata_code to character, and create numeric lat and lon variables in radians as before.

Question 2

Write a function called "mc.sim" that runs a Monte Carlo simulation, where the function takes the following five arguments (no default values required, but you can add them if you want):

- n: the number of planes available
- m: the number of shipments waiting for pick-up
- nmc: the number of Monte Carlo iterations
- cargo: the cargo data frame
- lambda: the weights for sampling current locations for planes and origin-destination pairs for shipments

The function should return a one-column matrix with the number of rows equal to nmc. The elements in this matrix should be the average time that it takes to complete a shipment (where the average is taken over all shipments within an iteration). As in Assignment 5, this is the time to complete a delivery, including time to get to the origin, time for loading/unloading, and time to fly to the destination. The only difference here is that your function will return the average time over shipments for each iteration (a one-column matrix), rather than a time for each shipment for each iteration (an \(m\)-column matrix). Make sure you are randomly drawing shipment origins and destinations from the new list of 75.
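Two parts of the mc.sim description differ from Assignment 5: draws are weighted by lambda, and the function returns one column. The toy R code below is a minimal sketch of both under stated assumptions. `n_airports`, `od`, and `times` are hypothetical names. The per-shipment times are random placeholders, not real flight times. The sketch also assumes that each shipment's origin and destination must be distinct airports, which the assignment does not state explicitly.

```r
set.seed(405)

n_airports <- 5                 # hypothetical: stands in for the 75 airports
lambda <- rep(1, n_airports)    # equal weights; sample() rescales prob itself
m <- 3                          # shipments per iteration
nmc <- 4                        # Monte Carlo iterations

# Assumption: each shipment draws two distinct airports, weighted by lambda.
# Row 1 holds origins, row 2 destinations (indices into the airport table).
od <- replicate(m, sample(n_airports, 2, prob = lambda))

# Placeholder nmc x m matrix of per-shipment delivery times (hours), the shape
# the Assignment 5 function returned. Real values would come from the
# distance and assignment calculations, not from runif().
times <- matrix(runif(nmc * m, min = 3, max = 6), nrow = nmc, ncol = m)

# Averaging over shipments within each iteration gives the nmc x 1 output.
avg_times <- matrix(rowMeans(times), ncol = 1)
dim(avg_times)
```

Setting lambda to the raw tonnage column works in the same `sample()` call. The `prob` argument only needs nonnegative weights, not probabilities that sum to one.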
The problem you will face in this question is that in Assignment 5 the number of planes equalled the number of shipments, whereas now that is not necessarily true. If the number of planes equals the number of shipments, then the distance matrix is square (the same number of rows as columns). The optimization function "lp.assign" in fact requires a square matrix as input and will fail to converge otherwise.

The trick here, when the number of planes does not equal the number of shipments, is to pad the distance matrix with zeroes (not NAs) so that it becomes square. If we have planes as rows and shipments as columns in the distance matrix, and there were 20 planes and 10 shipments, then you would need to add 10 columns of 20 rows each to make the distance matrix square. Because these extra columns (phantom shipments) are constructed with all zero entries, they don't change the solution to the optimization problem. Equivalently, you could create a distance matrix that is square and full of zeroes, and then just compute the elements that correspond to actual shipments. Bottom line: compared with Assignment 5, you have to add an extra step that makes the distance matrix square, if necessary, before calling "lp.assign".

Part a (25 pts)

Write code for the Monte Carlo function described above. You will probably want to start with your code from Assignment 5, or with my code from the solutions to Assignment 5. In addition to making the changes outlined above, also check that the number of planes is greater than or equal to the number of shipments, and that all of the arguments other than the cargo data frame are nonnegative and numeric. Throw an error if any of these conditions fail. You may find the function stopifnot useful.

Part b (15 pts)

Run the function with these arguments and report the estimated mean waiting time for each, in each case with \(\lambda\) being equal weights for each airport and doing 1000 simulation replicates (nmc=1000):

- Case 1: n=30, m=30
- Case 2: n=40, m=30
- Case 3: n=60, m=30
- Case 4: n=15, m=20
- Case 5: n=20, m=20

Question 3

We now assume that there are 30 shipments to pick up. In this question you will explore the average time it takes to complete a delivery, starting from when the plane leaves its current airport, as the number of planes varies between 30 and 80 in increments of 5. There will therefore be 11 different values of \(n\) to consider.

Part a (5 pts)

Create a matrix of dimension 11 by 1 containing the different values of \(n\) that you will consider, and call it N.

Part b (5 pts)

Load the future library, call plan(multisession), and then load the future.apply library. If you have not used these before, you should install them using install.packages. You want to call install.packages in the console manually, not in your R Markdown.

Part c (10 pts)

Using the future_apply function, perform 11 Monte Carlo simulations for the 11 different values of \(n\) in parallel. In each simulation, use nmc=1000 and set lambda to be equal for every airport. Store the result as mc.equal.weight. Print your Monte Carlo estimate of the average time to complete a delivery, starting from when the plane leaves its current location, for each value of \(n\) (11 numbers).

Part d (10 pts)

Repeat part c, but now make lambda proportional to the number of tons of cargo that passed through each airport in 2017. Call the result mc.diff.weight, and print the Monte Carlo estimate of the average time to complete a delivery, starting from when the plane leaves its current location, for each value of \(n\) (11 numbers).

Part e (10 pts)

Using only the results from part c (equal weights on all airports), make a plot of your Monte Carlo estimate of the average delivery time vs. number of planes. Here you're plotting an estimate of the average delivery time for a "typical" day. How large would your fleet of planes have to be to make the average delivery time on a typical day less than 4.6 hours?
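The sketch below shows the bookkeeping for parts a, c and d. It uses base R's `apply()` so that it runs without extra packages. `future.apply::future_apply()` takes the same `X`, `MARGIN` and `FUN` arguments, so swapping it in after `plan(multisession)` should run the loop in parallel. The function `fake_sim` and the vector `tons` are hypothetical stand-ins for mc.sim and the 2017 tonnage column. The numbers they produce are placeholders, not results.

```r
set.seed(705)

# Part a: the 11 fleet sizes 30, 35, ..., 80 stored as an 11 x 1 matrix.
N <- matrix(seq(30, 80, by = 5), ncol = 1)

# Hypothetical stand-in for mc.sim(n, m, nmc, cargo, lambda): returns an
# nmc x 1 matrix of made-up average delivery times that shrink as n grows.
# lambda is accepted only to mirror the real signature; it is unused here.
fake_sim <- function(n, m, nmc, lambda) {
  matrix(rnorm(nmc, mean = 4 + 30 / n, sd = 0.3), ncol = 1)
}

# Part d style weights: proportional to (hypothetical) 2017 tonnage.
tons <- c(120, 80, 40, 10)
lambda <- tons / sum(tons)

# One simulation per row of N. Each call returns nmc values, so apply()
# stacks them into an nmc x 11 matrix (one column per fleet size).
# Parallel version: future.apply::future_apply(N, 1, FUN, future.seed = TRUE)
res <- apply(N, 1, function(n) fake_sim(n, m = 30, nmc = 1000, lambda = lambda))

# Monte Carlo estimate of the mean delivery time for each value of n.
est <- colMeans(res)
```

Storing N as a one-column matrix is what lets `apply()` and `future_apply()` iterate over its rows with `MARGIN = 1`. For part e, `plot(N[, 1], est, type = "b")` would draw the estimates against fleet size.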
Part f (10 pts)

Make a second plot of the estimated 90th percentile of the average delivery time vs. number of planes (hint: use the quantile function). Part e compares the average time (across shipments) on a "typical" day with the number of planes in your fleet. Here you're comparing (roughly) the average delivery time on the worst of 10 days with the size of your fleet. How large would your fleet have to be to make the average delivery time on the worst out of 10 days less than 5.25 hours?

---
title: "Stat 405/705 Final Project"
author: "James Johndrow"
output:
  html_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
if (!require('pacman')) {install.packages('pacman')}
rm(list = ls(all.names = TRUE))
```

## Directions

If in a question I refer to a function that we have not seen in class, then use the help facility in R or Google to find out about it. Insert your answers and the R code you used to generate them beneath each question. If there is no R code chunk present, create one.

**Submit your solutions to Canvas: both the R Markdown file and the HTML file. I strongly suggest checking as you go that your file knits. If you cannot knit, submit just the Rmd, and include a comment that you could not knit. This final project is due Tuesday, March 17 at 11:59pm Eastern time. Late submissions will receive a zero. You may not collaborate with or discuss this assignment with classmates. You may post questions on Piazza publicly (i.e. so that other classmates can see the question); however, you are to refrain from answering questions posed by other classmates. Only the teaching staff may respond to Piazza questions about this assignment. We will be more hesitant to help you with these questions than on the homeworks; however, we will of course answer clarifying questions.**

This assignment picks up where Homework 5 left off and extends the analysis.

Note: you may want to read in some of your functions from Homework 5 in the R code chunk above, and also load any packages you used in that assignment.

Chartered freight service is really booming. You have decided to expand your operation to serve the 75 largest cargo airports instead of just 50. The goal of these questions is to estimate the number of planes that you will need to meet a pre-specified level of service in terms of how long it takes to deliver a shipment. Obviously this is an important question, since you will need to bring a lot more debt onto the books to finance the purchase of more planes. We will assume that there are at least as many planes as orders waiting