
In this assignment, you will apply your learning to further analyse the 2013-2014 emergency department (ED) demands at Perth and its connection with weather events. This activity builds on Assignment 1; you may want to review your assignment 1 solution and identify any reusable code. Please start early so that you can identify any skill/knowledge gap and seek support from the teaching staff and other students.


Application scenario


You work in a data science team that tries to model the ED demands in the Perth area to improve the demand prediction.
For your convenience, you are provided with the following data links, but you are encouraged to include other relevant data for your analyses.



  1. The emergency department admissions and attendances data set provided by the Department of Health of Western Australia:




http://data.gov.au/dataset/emergency-department-admissisons-and-attendances



  2. The daily temperature and precipitation data for the region, accessible through the NOAA data APIs.




https://www.ncdc.noaa.gov/cdo-web/webservices/v2


Of particular relevance is the “Global Historical Climatology Network - Daily” data:




https://www.ncdc.noaa.gov/ghcn-daily-description
https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt
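The GHCN-Daily readme describes station files (`.dly`) as fixed-width text: one line per station, year, month and element, followed by 31 daily groups of a 5-character value and three 1-character flags, with -9999 marking missing values. The sketch below is one way to read such lines in base R, assuming that layout and that temperatures and precipitation are stored in tenths of a degree and tenths of a millimetre. The function name `parse_dly_lines` is hypothetical, and the example line is synthetic; the station ID `ASN00009021` is only assumed to be Perth Airport.

```r
# Parse lines of a GHCN-Daily ".dly" file into a long data frame.
# Assumed layout (GHCN-Daily readme): ID cols 1-11, YEAR 12-15, MONTH 16-17,
# ELEMENT 18-21, then 31 day groups of 8 chars: VALUE (5), MFLAG, QFLAG, SFLAG.
parse_dly_lines <- function(lines) {
  rows <- lapply(lines, function(line) {
    day   <- 1:31
    start <- 22 + (day - 1) * 8                    # first column of each VALUE field
    data.frame(
      id      = substr(line, 1, 11),
      year    = as.integer(substr(line, 12, 15)),
      month   = as.integer(substr(line, 16, 17)),
      day     = day,
      element = substr(line, 18, 21),
      value   = as.integer(substring(line, start, start + 4)),
      qflag   = substring(line, start + 6, start + 6),  # quality flag, blank if passed
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out <- out[!is.na(out$value) & out$value != -9999, ]  # -9999 marks a missing value
  out$date <- as.Date(sprintf("%04d-%02d-%02d", out$year, out$month, out$day),
                      format = "%Y-%m-%d")
  out$value <- out$value / 10   # assumed units: tenths of deg C (TMAX/TMIN), tenths of mm (PRCP)
  out[, c("id", "date", "element", "value", "qflag")]
}

# Synthetic line: day 1 = 32.5 deg C, days 2-31 missing
example_line <- paste0("ASN00009021", "2014", "01", "TMAX",
                       sprintf("%5d", 325L), "   ",
                       strrep(paste0(sprintf("%5d", -9999L), "   "), 30))
parsed <- parse_dly_lines(example_line)
print(parsed)
```

For a real station file, `parse_dly_lines(readLines(path))` would give one row per non-missing daily value, which can then be reshaped so that TMAX, TMIN and PRCP become columns.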


Task 1: Source weather data (5 points)


From Assignment 1, you have processed data for the ED demands. We still need to find local weather data from the same period. You are encouraged to find weather data online. Besides the NOAA data, you may also use data from the Bureau of Meteorology historical weather observations and statistics. (The NOAA Climate Data might be easier to process.)


Answer the following questions:




  1. Which data source do you plan to use? Justify your decision.




  2. From the data source identified, download daily temperature and precipitation data for the region during the relevant time period. (Hint: If you download data from NOAA (https://www.ncdc.noaa.gov/cdo-web/), you need to request an NOAA web service token for accessing the data.)



  3. Answer the following questions:



  • How many rows are in the data?

  • What time period does the data cover?
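For question 2, the CDO Web Services v2 API serves GHCN-Daily records through a `data` endpoint queried with parameters such as `datasetid`, `stationid`, `datatypeid`, `startdate`, `enddate`, `units` and `limit`, while the token is sent as an HTTP header rather than in the URL. The sketch below only builds the request URL in base R, so it can be checked offline. The helper `build_cdo_url` is hypothetical, and the station ID (assumed to be Perth Airport) and the July 2013 to June 2014 window are assumptions to verify against the CDO site and the ED data.

```r
# Build a request URL for the NOAA CDO Web Services v2 "data" endpoint.
# The access token is NOT part of the URL; it goes in an HTTP header named "token".
build_cdo_url <- function(station, start, end,
                          datatypes = c("TMAX", "TMIN", "PRCP"),
                          limit = 1000) {
  base <- "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"
  params <- c(
    datasetid = "GHCND",
    stationid = station,
    startdate = format(as.Date(start)),
    enddate   = format(as.Date(end)),
    setNames(datatypes, rep("datatypeid", length(datatypes))),  # repeated key per type
    units     = "metric",
    limit     = as.character(limit)
  )
  # percent-encode each value (e.g. the ":" in the station ID)
  values <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(params), "=", values, collapse = "&"))
}

# Hypothetical station ID, assumed to be Perth Airport (check on the CDO site)
url <- build_cdo_url("GHCND:ASN00009021", "2013-07-01", "2014-06-30")
print(url)
```

The request itself could then be sent with, for example, `httr::GET(url, httr::add_headers(token = your_token))`. The API documentation limits daily data requests to a one-year range and caps the number of results per request, so a year of three data types may need to be fetched in pages using the documented offset parameter.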


Task 2: Model planning (5 points)


Careful planning is essential for a successful modelling effort. Please answer the following planning questions.




  1. How will the final model be used? How will it be relevant to the overcrowding problems at our EDs? (You may find some inspiration here: http://bit.ly/2p5qLH6.) Who are the potential users of your model?




  2. What relationship do you plan to model or what do you want to predict? What is the response variable? What are the predictor variables? Will the variables in your model be routinely collected and made available soon enough for prediction?




  3. As you are likely to build your model on historical data, will the data in the future have similar characteristics?




  4. What statistical method(s) will be applied to generate the model? Why?




Task 3: Model the ED demands (10 points)


We will start with simple models and gradually improve them. We will focus on the ED demand variable(s) that you defined in Assignment 1. Let’s denote it Y.


Randomly pick a hospital from the ED dataset.




  1. Which hospital do you pick?




  2. Fit a linear model for Y using date as the predictor variable. Plot the fitted values and the residuals. Assess the model fit. Is a linear function sufficient for modelling the trend of Y? Support your conclusion with plots.




  3. As we are not interested in the trend itself, relax the linearity assumption by fitting a generalised additive model (GAM). Assess the model fit. Do you see patterns in the residuals indicating insufficient model fit?




  4. Augment the model to incorporate the weekly seasonality. Compare the models using the Akaike information criterion (AIC). Report the best-fitted model through coefficient estimates and/or plots.




  5. Analyse the residuals. Do you see any remaining correlation patterns among the residuals?




  6. Is your day-of-the-week variable numeric, ordinal, or categorical? Does the decision affect the model fit?
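The modelling steps in this task can be sketched on simulated data: a linear trend, a GAM smooth of date, the GAM plus a categorical day-of-week term, and the GAM plus day of week as a single numeric slope, compared by AIC, followed by the autocorrelation of the residuals. This is a sketch under stated assumptions, not a solution: the series is made up (an annual cycle, an invented weekday effect and Poisson noise), and it uses the `mgcv` package's `gam()` rather than the `gam` package loaded in the answer below.

```r
library(mgcv)  # gam() with s() smooth terms; mgcv is a recommended package shipped with R

set.seed(1)
# SIMULATED stand-in for one hospital's daily attendances (not the real ED data)
date    <- seq(as.Date("2013-07-01"), by = "day", length.out = 365)
trend   <- as.numeric(date)                    # date as days since 1970-01-01
dow_num <- as.integer(format(date, "%u"))     # 1 = Monday ... 7 = Sunday (locale independent)
weekday_effect <- c(25, 10, 5, 0, 0, -5, 10)   # made-up effect for each weekday
y <- rpois(365, lambda = 200 + 15 * sin(2 * pi * (trend - trend[1]) / 365) +
                         weekday_effect[dow_num])
ed <- data.frame(y, trend, dow_num, dow = factor(dow_num))

m_lin     <- lm(y ~ trend, data = ed)               # linear trend in date
m_gam     <- gam(y ~ s(trend), data = ed)           # smooth (non-linear) trend
m_gam_dow <- gam(y ~ s(trend) + dow, data = ed)     # + day of week as a categorical factor
m_gam_num <- gam(y ~ s(trend) + dow_num, data = ed) # + day of week as one numeric slope

# Lower AIC indicates a better trade-off between fit and complexity
aics <- AIC(m_lin, m_gam, m_gam_dow, m_gam_num)
print(aics)

# Remaining serial correlation in the residuals of the weekly model
res     <- residuals(m_gam_dow)
res_acf <- acf(res, lag.max = 14, plot = FALSE)
print(res_acf)
```

With the real series, `plot(res_acf)` draws the correlogram, and spikes at lags 7 and 14 would suggest the weekly pattern is not fully captured. Treating day of week as a factor lets each weekday have its own level, whereas the numeric version forces a steady increase or decrease from Monday to Sunday. If the `gam` package is also loaded, calling `mgcv::gam` explicitly avoids the clash between the two packages' `gam()` and `s()`.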




Task 4: Heatwaves and ED demands (15 points)


The connection between heatwaves and the ED demands is widely reported, as in this news article.




http://bit.ly/2kTE4cu


In this task, you will try to measure the heatwave and assess its impact on the ED demands.


Task 4.1: Measuring heatwave (6 points)



  1. John Nairn and Robert Fawcett from the Australian Bureau of Meteorology have proposed a measure for the heatwave, called the excess heat factor (EHF). Read the following article to understand the definition of the EHF.




https://dx.doi.org/10.3390%2Fijerph120100227



  2. Use the NOAA data to calculate the daily EHF values for the Perth area during the relevant time period. Plot the daily EHF values.
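As we read the Nairn and Fawcett article, the EHF for day i combines two indices built from daily mean temperatures (DMT): EHI_sig, the mean DMT over days i, i+1 and i+2 minus T95 (the 95th percentile of DMT over a climate reference period), and EHI_accl, the same three-day mean minus the mean DMT over days i-30 to i-1. Then EHF = EHI_sig × max(1, EHI_accl). The base R sketch below implements these definitions. Two assumptions are labelled in the code: DMT is taken as same-day (TMAX + TMIN) / 2, whereas the article pairs maximum and minimum temperatures by the Bureau's observing day, and T95 is a placeholder rather than a value from the article's reference period. `compute_ehf` is a hypothetical name and the temperatures are made up.

```r
# Minimal sketch of the excess heat factor (EHF), following our reading of
# Nairn & Fawcett (2015). dmt = daily mean temperatures (deg C) in date order
# with no missing days; t95 = 95th percentile of DMT over a reference period.
compute_ehf <- function(dmt, t95) {
  n <- length(dmt)
  ehf <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    # need the 30 previous days and the 2 following days
    if (i <= 30 || i + 2 > n) next
    three_day <- mean(dmt[i:(i + 2)])          # days i, i+1, i+2
    prior_30  <- mean(dmt[(i - 30):(i - 1)])   # days i-30 ... i-1
    ehi_sig   <- three_day - t95               # heat relative to the local climate
    ehi_accl  <- three_day - prior_30          # heat relative to the recent past
    ehf[i]    <- ehi_sig * max(1, ehi_accl)
  }
  ehf
}

# Toy example with made-up temperatures: 30 mild days, then a 3-day hot spell
tmax <- c(rep(25, 30), 40, 41, 42, 30, 28)
tmin <- c(rep(15, 30), 24, 25, 26, 18, 16)
dmt  <- (tmax + tmin) / 2   # ASSUMPTION: same-day (TMAX + TMIN) / 2 as the DMT
t95  <- 25                  # placeholder; in practice quantile(reference_dmt, 0.95)
ehf  <- compute_ehf(dmt, t95)
print(round(ehf[31:33], 2))
```

Because each value needs two following days, the last two days of any series are `NA`, and with real data the first 30 days of the study period need the preceding month of weather observations. A plot such as `plot(dates, ehf, type = "h")` shows heatwave episodes as runs of positive values.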


Task 4.2: Models with EHF (5 points)


Use the EHF as an additional predictor to augment the model(s) that you fitted before. Report the estimated effect of the EHF on the ED demand. Does the extra predictor improve the model fit? What conclusions can you draw?
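One way to augment the earlier model is to add the EHF as a linear term and compare AIC values, then read the EHF coefficient from the parametric part of the summary. The sketch below does this with `mgcv` on simulated data only: the EHF series and its effect of +0.8 attendances per EHF unit are invented so that the code runs on its own, and the estimate it prints says nothing about the real ED data.

```r
library(mgcv)

set.seed(2)
n    <- 365
date <- seq(as.Date("2013-07-01"), by = "day", length.out = n)
# SIMULATED data (not the real ED or NOAA data): a made-up EHF that is zero on
# most days, and attendances with an assumed effect of +0.8 per EHF unit
ed <- data.frame(
  trend = as.numeric(date),
  dow   = factor(as.integer(format(date, "%u"))),
  ehf   = pmax(0, rnorm(n, mean = -5, sd = 10))
)
ed$y <- rpois(n, lambda = 200 + 10 * (ed$dow == "1") + 0.8 * ed$ehf)

m_base <- gam(y ~ s(trend) + dow, data = ed)        # trend + weekly seasonality
m_ehf  <- gam(y ~ s(trend) + dow + ehf, data = ed)  # + EHF as a linear term

aic_ehf <- AIC(m_base, m_ehf)
print(aic_ehf)                          # lower AIC: EHF helps after the complexity penalty
print(summary(m_ehf)$p.table["ehf", ])  # effect per EHF unit, SE, t value, p-value
```

If the effect looks non-linear (for example, only large EHF values matter), `s(ehf)` in place of `ehf` is a natural next step. Standard errors and p-values assume independent residuals, so remaining autocorrelation would make them optimistic.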


Task 4.3: Extra weather features (4 points)


Can you think of extra weather features that may be more predictive of ED demands? Try incorporating your feature into the model and see if it improves the model fit.
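Candidate features can be derived from columns already at hand. The hypothetical helper below adds lagged EHF values (in case heat affects attendances with a delay), an indicator of positive EHF, a rain indicator and the diurnal temperature range. These are illustrations of features to try, not features known to improve the model; the column names `ehf`, `prcp`, `tmax` and `tmin` are assumptions about how the weather table is organised, and the example values are made up.

```r
# Hypothetical feature builder: lags and indicators from daily weather columns.
# Rows must be in date order with one row per day.
add_weather_features <- function(w) {
  lag1 <- function(x) c(NA, head(x, -1))             # value from the previous day
  w$ehf_lag1   <- lag1(w$ehf)
  w$ehf_lag2   <- lag1(w$ehf_lag1)
  w$heatwave   <- as.integer(!is.na(w$ehf) & w$ehf > 0)  # 1 when EHF is positive
  w$rain       <- as.integer(w$prcp > 0)                 # 1 on days with any rain
  w$temp_range <- w$tmax - w$tmin                        # diurnal temperature range
  w
}

# Made-up example values
w <- data.frame(ehf  = c(0, 5, 12, 0),
                prcp = c(0, 2.4, 0, 0),
                tmax = c(30, 38, 41, 29),
                tmin = c(18, 24, 26, 17))
features <- add_weather_features(w)
print(features)
```

Each feature can then be added to the model, for example `y ~ s(trend) + dow + ehf + ehf_lag1 + rain`, and kept only if it lowers the AIC.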


Task 5: Reflection (5 points)


Answer the following questions:



  1. We used some historical data to fit regression models. What are the limitations of such data, if any?

  2. Regression models can be used for 1) understanding a process, or 2) making predictions. In this assignment, do we have reasons to choose one objective over the other? How would the decision affect our models?

  3. Overall, have your analyses answered the questions that you set out to answer?


What to submit


By the due date, you are required to submit the following files to the assignment Dropbox in CloudDeakin.



  1. An MS Word or PDF file containing your answers to all the assignment questions.

  2. An R Notebook file Assignment2_submission.Rmd containing all your code. The file should be able to run. Include sufficient comments so that the script can be understood by your marker. Indicate all the packages that need to be installed separately.


Marking criteria


Your submission will be marked using the following criteria.



  • Showing good effort through completed tasks.

  • Applying statistical thinking to understand the problems and to identify solutions.

  • Applying statistical programming skills to obtain data and to process them for data analysis.

  • Applying regression modelling techniques to discover and quantify relationships among variables.

  • Demonstrating creativity and resourcefulness in solutions.

  • Showing attention to details through a good quality assignment report.

  • Bonus marks may be awarded for completing optional tasks.

SIT741, Deakin University, Sep 06, 2021


Pritam answered on Sep 13 2021
12 September 2019
Required libraries:
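library(ggplot2)   # plotting
library(devtools)  # only needed to install flipTime from GitHub
# flipTime is not on CRAN; install once with devtools::install_github("Displayr/flipTime")
library(flipTime)  # provides AsDate()
library(gam)       # generalised additive models (gam package)
library(dplyr)     # data manipulation
library(timetk)    # time-series helpers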
Question 1:
The data source used for this analysis is NOAA's Climate Data Online (https://www.ncdc.noaa.gov/cdo-web/). The site provides free access to global historical weather and climate data, along with station history information. It was chosen because of the reliability of the data and the breadth of the available data sets.
The weather data set contains 11048 rows and 17 variables, covering the period from January 1989 to April 2019.
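The row count and coverage above describe the full download; only the dates overlapping the ED data are needed for modelling. The sketch below reports the size and date range of a weather table and restricts it to a study period. The helper names are hypothetical, the date column is assumed to be called `DATE` (as in CDO CSV exports), the July 2013 to June 2014 window is an assumption to check against the ED data, and the four rows are made-up values.

```r
# Size and coverage of a weather table, and restriction to the ED study period.
summarise_period <- function(weather, date_col = "DATE") {
  d <- as.Date(weather[[date_col]])
  list(rows = nrow(weather), from = min(d, na.rm = TRUE), to = max(d, na.rm = TRUE))
}

filter_period <- function(weather, start, end, date_col = "DATE") {
  d <- as.Date(weather[[date_col]])
  weather[!is.na(d) & d >= as.Date(start) & d <= as.Date(end), , drop = FALSE]
}

# Toy example with made-up values (not NOAA data)
weather <- data.frame(
  DATE = c("2013-06-30", "2013-07-01", "2014-06-30", "2014-07-01"),
  TMAX = c(18.1, 19.0, 20.2, 17.5)
)
print(summarise_period(weather))
ed_period <- filter_period(weather, "2013-07-01", "2014-06-30")
print(nrow(ed_period))
```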
Question 2:
Using the historical data, a model can be built with ED attendance as the response variable and date and EHF as the predictor variables. Because the final model is built on historical data, patients can use it to assess the expected ED load on a particular day. If the expected load is low, they can choose to attend then, conditions permitting, which could help reduce the load on the hospital. The intended users of the model are therefore patients and their families, who can use it to judge the best time and place to seek care.
The objective of the model is to predict the number of patients attending the emergency department on a particular day. The relationship planned for modelling is linear: the predictors, date and EHF (calculated separately from the historical weather data), enter as a linear combination to predict the response variable. Both variables are recorded routinely. The EHF can also be calculated promptly from the weather data, although the EHF for a given day uses the mean temperatures of that day and the following two days, so forecast temperatures would be needed to use it for prediction.
Since the linear regression model is based on historical data, future data can be expected to have similar characteristics only to the extent that the historical patterns persist.
Since date is used as a predictor and its effect on attendance need not be linear, a generalised additive model (GAM) is advisable.
Question 3:
ED demands models:
The hospital chosen for the analysis is Royal Perth Hospital.
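# Read the ED data (the first line of gov.csv is skipped)
gov = read.csv("gov.csv", header = TRUE, skip = 1)
royal_data = gov[, 1:8]   # keep the first 8 columns
# AsDate() (flipTime) guesses the date format; check the parsed dates
royal_data$Date = AsDate(as.character(royal_data$Date))
## Warning: Supplied date formats are ambiguous, two-digit year assumed to
## come after month.
attach(royal_data)
# Daily attendance over time
ggplot(royal_data, aes(x = Date, y = Attendance)) + geom_path()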
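The warning above shows that the date format had to be guessed. A base R alternative is to state the format explicitly and stop if any entry fails to parse, so a wrong guess cannot pass silently. The sketch assumes day/month/two-digit-year strings such as "01/07/13"; that format and the helper name `parse_ed_dates` are hypothetical and should be matched to the actual column in gov.csv.

```r
# Parse dates with an explicit format so nothing has to be guessed.
# ASSUMPTION: strings like "01/07/13" (day/month/two-digit year);
# change `format` to whatever the ED file actually uses.
parse_ed_dates <- function(x, format = "%d/%m/%y") {
  x <- as.character(x)
  d <- as.Date(x, format = format)
  bad <- is.na(d) & !is.na(x)
  if (any(bad)) {
    stop("dates not matching ", format, ": ", paste(head(x[bad], 3), collapse = ", "))
  }
  d
}

print(parse_ed_dates(c("01/07/13", "30/06/14")))
```

Replacing the `AsDate()` call with `royal_data$Date = parse_ed_dates(royal_data$Date)` would then remove the dependency on flipTime for this step.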
Linear model:
f1 = lm(Attendance~Date, data = royal_data)
summary(f1)
##
## Call:
## lm(formula = Attendance ~ Date, data = royal_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.840 -12.877 -0.125 11.357 48.967
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -141.46031 136.74309 -1.034 0.30159
## Date 0.02293 0.00851 2.695 0.00737 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.13 on 363 degrees of freedom
## Multiple R-squared: 0.01961, Adjusted R-squared: 0.01691
## F-statistic: 7.262 on 1 and 363 DF, p-value: 0.007372
par(mfrow = c(2,2))  # 2 x 2 panel layout
plot(f1)             # standard lm diagnostic plots
One can...