What to submit 1. An MS Word or PDF file containing your answers to all the assignment questions. 2. An R Notebook file Assignment2_submission.Rmd containing all your code. The file should be able to...

I have attached the file, the dataset and the solution from the previous assignment, if required.


What to submit
1. An MS Word or PDF file containing your answers to all the assignment questions.
2. An R Notebook file Assignment2_submission.Rmd containing all your code. The file should be able to run and knit. Include sufficient comments so that the script can be understood by your marker. Indicate all the packages that need to be installed separately.
Application scenario
You work in a data science team that tries to model the road accidents in an area to improve the prediction for rescue services demand. For your convenience, you are provided with the following data links, but you are encouraged to include other relevant data for your analyses.
1. The road accidents data set, attached to the assignment: Attached file
2. The daily temperature and precipitation data for the region accessible through the NOAA data APIs. https://www.ncdc.noaa.gov/cdo-web/webservices/v2
Of particular relevance is the “Global Historical Climatology Network - Daily” data:
https://www.ncdc.noaa.gov/ghcn-daily-description
https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt
Task 1: Source weather data (10 points)
From Assignment 1, you have processed data for the road accidents of different types in a given region of Victoria. We still need to find local weather data from the same period. You are encouraged to find weather data online. Besides the NOAA data, you may also use data from the Bureau of Meteorology historical weather observations and statistics. (The NOAA Climate Data might be easier to process, also a full list of weather stations is provided here: https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt )
Answer the following questions:
1. Which data source do you plan to use? Justify your decision. (4 points)
2. From the data source identified, download daily temperature and precipitation data for the region during the relevant time period. (Hint: If you download data from NOAA https://www.ncdc.noaa.gov/cdo-web/, you need to request an NOAA web service token for accessing the data.) (2 points)
3. Answer the following questions:
· How many rows are in the data? (2 points)
· What time period does the data cover? (2 points)
Task 2: Model planning (10 points)
Careful planning is essential for a successful modelling effort. Please answer the following planning questions.
1. Model planning:
· How will the final model be used? (1 point)
· How will it be relevant to the emergency services demand? (1 point)
· Who are the potential users of your model? (1 point)
2. Relationship and data:
· What relationship do you plan to model or what do you want to predict? (1 point)
· What is the response variable? (1 point)
· What are the predictor variables? (1 point)
· Will the variables in your model be routinely collected and made available soon enough for prediction? (1 point)
· As you are likely to build your model on historical data, will the data in the future have similar characteristics? (1 point)
3. What statistical method(s) will be applied to generate the model? Why? (2 points)
Task 3: Model the number of road traffic accidents (30 points)
We will start with simple models and gradually make them more complex and improve them. We will focus on the road traffic accident variable(s) that you defined in Assignment 1. Let’s denote it Y. Randomly pick a region from the road traffic accidents data.
1. Which region do you pick? (1 point)
2. Fit a linear model for Y using date as the predictor variable. Plot the fitted values and the residuals. Assess the model fit. Is a linear function sufficient for modelling the trend of Y? Support your conclusion with plots. (4 points)
3. As we are not interested in the trend itself, relax the linearity assumption by fitting a generalised additive model (GAM). Assess the model fit. Do you see patterns in the residuals indicating insufficient model fit? (5 points)
4. Augment the model to incorporate the weekly variations. (5 points)
5. Compare the models using the Akaike information criterion (AIC). Report the best-fitted model through coefficient estimates and/or plots. (5 points)
6. Analyse the residuals. Do you see any remaining correlation patterns among the residuals? (4 points)
7. What data type is your day-of-the-week variable? (3 points) Does the data type of this variable affect the model fit? (3 points)
Task 4: Heatwaves, precipitation and road traffic accidents (30 points)
The connection between weather and the road traffic accidents is widely reported. In this task, you will try to measure the heatwave and assess its impact on the road accident statistics.
Task 4.1: Measuring heatwave (8 points)
1. John Nairn and Robert Fawcett from the Australian Bureau of Meteorology have proposed a measure for the heatwave, called the excess heat factor (EHF). Read the following article to understand the definition of the EHF. (3 points) https://dx.doi.org/10.3390%2Fijerph120100227
2. Use the NOAA data to calculate the daily EHF values for the area you chose during the relevant time period. Plot the daily EHF values. (5 points)
Task 4.2: Models with EHF (7 points)
Use the EHF as an additional predictor to augment the model(s) that you fitted before. Report the estimated effect of the EHF on the road accident numbers. (3 points) Does the extra predictor improve the model fit? (1 point) What conclusions can you draw? (3 points)
Task 4.3: Research question - extra weather features (15 points)
Is EHF a good predictor for road traffic accidents? Can you think of extra weather features that may be more predictive of road traffic accident numbers? (5 points) Try incorporating your feature into the model and see if it improves the model fit. Use AIC to prove your point. (10 points)
Task 5: Reflection (20 points)
In the form of a short report (500-1000 words, 1-2 pages), answer the following questions:
1. We used some historical data to fit regression models. What are the limitations of such data, if any? (5 points)
2. Regression models can be used for 1) understanding a process, or 2) making predictions. In this assignment, do we have reasons to choose one objective over the other? (5 points) How would the decision affect our models? (5 points)
3. Overall, have your analyses answered the questions that you set out to answer? (5 points)
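As an illustration of the calculation that Task 4.1 asks for, here is a minimal sketch of the excess heat factor in its three-day-average form. It assumes that tmean is the daily mean temperature in degrees C, for example (TMAX + TMIN) / 2 after converting from tenths of degrees, and that the 95th percentile can be taken from the supplied series when no reference-period value is available. The function name compute_ehf and the toy series are hypothetical, and the definition should be checked against the article before use.

```r
# Excess heat factor (EHF), three-day-average form; a sketch, not a reference
# implementation.
# tmean: daily mean temperature in degrees C, ordered by date, no gaps.
# t95:   95th percentile of daily mean temperature over a reference period.
#        Defaulting to the supplied series is an assumption made here.
compute_ehf <- function(tmean, t95 = quantile(tmean, 0.95, na.rm = TRUE)) {
  n <- length(tmean)
  ehf <- rep(NA_real_, n)
  # Day i needs the 30 preceding days and the 2 following days
  if (n < 33) return(ehf)
  for (i in 31:(n - 2)) {
    three_day <- mean(tmean[i:(i + 2)])         # mean of days i, i+1, i+2
    prior_30  <- mean(tmean[(i - 30):(i - 1)])  # mean of the 30 days before
    ehi_sig   <- three_day - t95                # significance index
    ehi_accl  <- three_day - prior_30           # acclimatisation index
    ehf[i]    <- ehi_sig * max(1, ehi_accl)
  }
  ehf
}

# Toy series: 40 mild days, a three-day hot spell, then 10 mild days
tmean <- c(rep(20, 40), 35, 36, 37, rep(20, 10))
ehf <- compute_ehf(tmean, t95 = 30)

# Positive values indicate heatwave conditions
plot(ehf, type = "h", xlab = "Day", ylab = "EHF")
```

The first 30 days and the last 2 days are left as NA because the averaging windows are incomplete there. Real station data would also need a decision on how to handle missing days.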
Answered 8 days after Sep 13, 2022


Radhika answered on Sep 21 2022
Task 1: Source weather data (10 points)
From Assignment 1, you have processed data for the road accidents of different types in a given region of Victoria. We still need to find local weather data from the same period. You are encouraged to find weather data online. Besides the NOAA data, you may also use data from the Bureau of Meteorology historical weather observations and statistics. (The NOAA Climate Data might be easier to process, also a full list of weather stations is provided here: https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt )
Answer the following questions:
1. Which data source do you plan to use? Justify your decision. (4 points)
Solution: I downloaded the dataset using the link: https://www.ncei.noaa.gov/access/search/data-search/daily-summaries.
The link above leads to the website of the National Centers for Environmental Information (NCEI). Under the Global Historical Climatology Network Daily (GHCN) category, version 3, we chose the Caribou Weather Forecast Office station. We chose this dataset because it has the largest number of data types (70). It covers basic information on:
· Evaporation of water from evaporation pan
· Maximum soil temperature with unknown cover at 10 cm depth
· Base of frozen ground layer
· Thunder
· Ice pellets, sleet, snow pellets, or small hail
· Multiday wind movement
· Hail (may include small hail)
· Thunder
· Glaze or rime
· Number of days included in the multiday wind movement (MDWM)
· Dust, volcanic ash, blowing dust, blowing sand, or blowing obstruction
· Smoke or haze
· Blowing or drifting snow
· Direction of fastest 1-minute and 2-minute wind (degrees)
· Minimum soil temperature with sod cover at 5 cm depth
· Tornado, waterspout, or funnel cloud
· Direction of fastest 5-second wind (degrees)
· High or damaging winds
· Peak gust time (hours and minutes, i.e., HHMM)
· Blowing spray
· AWBT
· Mist
· Top of frozen ground layer
· 24-hour wind movement
· Minimum soil temperature with sod cover at 10 cm depth
· Time of fastest mile or fastest 1-minute/ 2-minute/5-second wind speed
· Snowfall (mm)
· ASLP
· Thickness of ice on water
· Temperature at the time of observation (tenths of degrees C)
· Multiday evaporation total (use with DAEV)
· ASTP
· Peak gust wind speed
· Fog, ice fog, or freezing fog (may include heavy fog)
· Water equivalent of snow on the ground
· Heavy fog or heavy freezing fog (not always distinguished from fog)
· Minimum soil temperature with unknown cover at 10 cm depth
· Difference between river and gauge height
· RHMX
· Precipitation
· Daily maximum temperature of water in an evaporation pan (tenths of degrees C)
· RHAV
· Snow depth (mm)
· Maximum temperature (tenths of degrees C)
· Thickness of frozen ground layer
· Maximum soil temperature with sod cover at 10 cm depth
· Direction of peak wind gust (degrees)
· Drizzle / freezing drizzle
· Average cloudiness sunrise to sunset from manual observations
· Rain (may include freezing rain, drizzle, and freezing drizzle)
· Freezing rain
· Snow, snow pellets, snow grains, or ice crystals
· Unknown source of precipitation
· Number of days included in the multiday evaporation total (MDEV)
· RHMN
· Average daily wind speed
· Ground fog
· Rain or snow shower
· Ice fog or freezing fog
· Minimum temperature (tenths of degrees C)
· Average temperature
· ADPT
· Daily percent of possible sunshine
· Daily minimum temperature of water in an evaporation pan (tenths of degrees C)
· Daily total sunshine (minutes)
Thus, we chose this dataset because the data types listed above cover the full range of climate conditions. It records the maximum, minimum or range of all the essential climate variables for the area.
2. From the data source identified, download daily temperature and precipitation data for the region during the relevant time period. (Hint: If you download data from NOAA https://www.ncdc.noaa.gov/cdo-web/, you need to request an NOAA web service token for accessing the data.) (2 points)
Solution: The daily temperature and precipitation data were downloaded from the NOAA website. I requested the data from this website using the token:
    Token:
    UvTltFQmQCNlSGgbXGCphsQUerIbvBZE
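Code in R: I first converted the CSV file to an Excel file and then imported it into R with the following commands:
# Requires the readxl package: install.packages("readxl")
library(readxl)
# Read the station file (named after station ID USW00014607) from the local path
USW00014607 <- read_excel("C:/Users/user/Desktop/Grey Nodes/USW00014607.xlsx")
View(USW00014607)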
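The Excel conversion step is not strictly necessary, since base R can read the downloaded CSV directly. The sketch below shows this with a hypothetical two-row extract. The column names (STATION, DATE, PRCP, TMAX, TMIN) and the values are assumptions for illustration and may differ from the actual download.

```r
# Hypothetical two-row extract in the layout of a daily-summaries CSV.
# Column names and values are assumed, not taken from the real file.
csv_text <- "STATION,DATE,PRCP,TMAX,TMIN
USW00014607,1939-01-01,0.0,-50,-161
USW00014607,1939-01-02,2.5,-28,-106"

# In practice, replace text = csv_text with the path to the downloaded file,
# e.g. read.csv("USW00014607.csv", stringsAsFactors = FALSE)
weather <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse dates once at import so later filtering and joins work on Date objects
weather$DATE <- as.Date(weather$DATE, format = "%Y-%m-%d")

str(weather)
```

Reading the CSV directly avoids an extra package dependency and the manual conversion step, which makes the notebook easier to knit on another machine.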
3.    Answer the following questions:
    How many rows are in the data? (2 points)
    What time period does the data cover? (2 points)
Solution: a. The number of rows in our dataset is 30505.
Using the nrow function in R, we get the following output:
> temp <- USW00014607
> nrow(temp)
[1] 30505
b. The data covers the years 1939 to 1966.
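Both answers can be checked from the data itself. The following is a minimal sketch that assumes the imported data frame has a DATE column of class Date. The stand-in data frame below is synthetic and only imitates that layout.

```r
# Synthetic stand-in for the imported station data; the DATE column name
# is an assumption about the real file.
temp <- data.frame(
  DATE = seq(as.Date("1939-01-01"), as.Date("1939-12-31"), by = "day"),
  TMAX = NA_real_
)

n_rows <- nrow(temp)        # number of daily records
period <- range(temp$DATE)  # earliest and latest observation dates

# Days missing from the record: compare against a complete daily calendar
all_days  <- seq(period[1], period[2], by = "day")
n_missing <- length(setdiff(as.character(all_days), as.character(temp$DATE)))

cat("Rows:", n_rows, "\n")
cat("Period:", format(period[1]), "to", format(period[2]), "\n")
cat("Missing days:", n_missing, "\n")
```

Comparing the row count with the length of the full calendar between the first and last dates is a quick way to see whether the reported period and the number of rows are consistent with each other.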
Task 2: Model planning (10 points)
Careful planning is essential for a successful modelling effort. Please answer the following planning questions.
1. Model planning:
· How will the final model be used? (1 point)
· How will it be relevant to the emergency services demand? (1 point)
· Who are the potential users of your model? (1 point)
Solution: The final model can be used to study the regular climatic conditions in our selected region. Daily rainfall, precipitation rate and wind speed can help us predict the monsoon as a function of time, since the data is available as a time series covering more than 10 years. Secondly, in terms of snowfall, we can predict the months with heavy snowfall as opposed to light snow. This can help us to allocate the resources of household needs and medical...