Python Assignment, analysing data

Question

Certainly no one who spent the last summer in Canberra can forget the days when the city was blanketed in smoke from the surrounding bushfires, despite all that has happened since. During this time, we learned about the Air Quality Index (AQI), and in particular the PM2.5 index, which measures the quantity of small dust particles in the air.

The ACT government provides on-line AQI data from three monitoring stations in the ACT, located in the suburbs Civic, Florey (to the west) and Monash (to the south). AQI readings for the last 24 hours can be viewed on the web site, and historical data, going as far back as 2013, can be downloaded. There are also other web sites that provide live or historical AQI data from other parts of Australia and around the world.

In this assignment, you will write a Python program to analyse historical data from the ACT monitoring stations, to answer some questions about air quality in Canberra.

Data and files provided

There are many different ways of calculating an index of air quality, used in different countries around the world (see, for example, the Wikipedia page “Air quality index”). The Australian system considers seven different pollutants: carbon monoxide, nitrogen dioxide, ozone, sulphur dioxide, lead, and two sizes of dust particles, PM10 and PM2.5. The measured concentration of each pollutant is linearly scaled into an index, based on what is considered an acceptable standard, so that the indices for all pollutants use the same scale. An index value of 0 means the air is clear of the pollutant, while an index value of 100 means the standard level has been reached; therefore, an index value of 100 or above is considered “poor” air quality, and an index value of 200 (twice the acceptable standard) or above is considered “hazardous”. The reported index is based on the average scaled measurement over an interval of time, which is different for different pollutants.
For dust particles (PM10 and PM2.5) it is over the last 24 hours.

Wikipedia, “Air quality index”: https://en.wikipedia.org/wiki/Air_quality_index

The data files provided by ACT Health do not record all seven pollutants. The data files are in comma-separated value (CSV) format. To simplify the assignment, we have split them into one file for each monitoring station:

• aqi_data_civic.csv
• aqi_data_florey.csv
• aqi_data_monash.csv

Each file follows the same format. The first line of the file is a header, which gives the names of the columns. Each following line is a data entry, and contains the index values recorded at a particular date and time. The columns are:

• Name: The name of the monitoring station. This will be the same for all entries in each file, since we have split the data up by station.
• GPS: The location of the monitoring station. (The coordinates appear to be occasionally wrong, but this does not affect the assignment since we will not be using them.)
• DateTime: The date and time of the entry, in the format DD/MM/YYYY HH:MM:SS AM/PM. All entries should be on whole hours, meaning the minutes and seconds are zero.
• NO2: The nitrogen dioxide measurement.
• O3_1hr: The ozone measurement (last hour).
• O3_4hr: 4-hour rolling average of the ozone measurement.
• CO: The carbon monoxide measurement (8-hour rolling average).
• PM10: The PM10 particles measurement (24-hour rolling average).
• PM2.5: The PM2.5 particles measurement (24-hour rolling average).
• AQI_CO, AQI_NO2, AQI_O3_1hr, AQI_O3_4hr, AQI_PM10 and AQI_PM2.5: The air quality index values corresponding to the measurements.
• AQI_Site: The combined air quality index for the site at the time. This should equal the highest of the index values for the measured pollutants.
• Date: The date of the entry, in the format day month year (the name of the month, rather than the number).
• Time: The time of the entry, in 24-hour format (i.e., HH:MM).

Two important facts to note:

• The data is not complete.
There are dates/hours for which there is no entry, and even when there is an entry, some of the pollutant measurements or corresponding index values may be missing. Missing values are indicated by empty fields in the CSV file.
• Entries in the CSV file are not ordered by date and time. In fact, they appear in no particular order.

Questions for analysis (code)

In this assignment, we will consider only the air quality index value for PM2.5 particles (column AQI_PM2.5).

A template file for the assignment code is provided here:

• Assignment.py

In this file, there is only one function that you must implement: analyse(path_to_file). The function takes a single argument, which is the complete path (file name and optionally a directory) to the data file that it should read and analyse. You can assume that the path will be a string. The function should print out the results of the analysis. It does not have to return any value. The specific questions that your analysis should answer are described below. The following are some general requirements and things to keep in mind:

• You do not have to solve all the questions, but you can only gain marks for the ones that you have attempted (see marking criteria below for details on how we will mark your submission).
• Although we do not specify the exact format in which you should print the results of the analysis, you should make it easy for the user (and marker) to see what is being shown. Ease of reading the output of your program is part of the marking criteria.
• Although there is only one function in the assignment template that you must implement, you can define other functions and use them in your solution. Indeed, good code organisation, including appropriate use of functional decomposition, is part of the marking criteria.

Question 1

An air quality index of 100 or above is considered poor.
For each year that is present in the data file, count the number of days that have at least one entry with an AQI PM2.5 of 100 or above. Print the results with one line per year, for example like this:

```
analysing data from file aqi_data_civic.csv
Question 1:
2014 had 0 days with an AQI PM2.5 of 100 or above
2015 had 0 days with an AQI PM2.5 of 100 or above
2016 had 1 days with an AQI PM2.5 of 100 or above
2017 had 2 days with an AQI PM2.5 of 100 or above
2018 had 2 days with an AQI PM2.5 of 100 or above
2019 had 36 days with an AQI PM2.5 of 100 or above
2020 had 25 days with an AQI PM2.5 of 100 or above
```

Question 2

In 2020, so far, January has been by far the worst month for air quality, because of the bushfires. But is that normally the case? We want to find out which month of the year is most frequently the worst.

To answer this question, we first have to define what it means for a month to be “worst”. We will use the same approach as in Question 1, and count the number of days in each month that have at least one reading equal to or above a threshold value; the month that has the highest number of such days is the one we consider the worst. For this question, we will use the threshold value 33, because an index below 33 is considered “good” air quality.

For each year that is present in the data file, determine which month has the most days that have at least one reading of 33 or above; then for each month, determine how many years it was the worst. Print the results for each month, excluding those that are never the worst. For example, like this:

```
analysing data from file aqi_data_civic.csv
Question 1:
...
Question 2:
December was the worst month in 2 years (2018, 2019)
January was the worst month in 1 years (2020)
April was the worst month in 1 years (2016)
May was the worst month in 1 years (2017)
June was the worst month in 1 years (2017)
July was the worst month in 1 years (2015)
```

Note that you will have to decide how to handle ties (if two or more months in one year had the same number of days with a reading of 33 or more). However, a month that has zero days with a reading of 33 or higher should not be considered “worst”. Remember that you should document your handling of ties (and any other interpretations or design decisions that you make) in your code, using comments and/or docstrings as appropriate.

Question 3

The highest AQI PM2.5 value in the ACT data files, 5185, was recorded at the Monash monitoring station on 2020-01-01 at 8pm. However, this is actually the average value over the last 24 hours (i.e., from 8pm the previous day). We want to calculate the actual value at that point in time. The PM2.5 index is a rolling average, calculated over the last 24 hours. This means that if I[T] is the actual index value at time T, then the index value recorded in the data file is

X[T] = (I[T] + I[T-1] + ... + I[T-23]) / 24

This implies that if the index changes from X[T] at time T to X[T+1] at time T+1, then 24 * (X[T+1] - X[T]) = I[T+1] - I[T-23]. Also note that an AQI value (point value or average) cannot be negative, since a value of zero means the pollutant is absent from the air.

It may not be possible to calculate the maximum point value, because the record is not complete and because in general averaging is not a reversible operation. However, even if we cannot obtain an exact answer, we would like to obtain some approximation, for example in the form of an interval in which we think the true highest reading lies.

Print the highest AQI value found in the
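The rolling-average relation in Question 3 can be turned into a simple interval for a point value. The sketch below is one possible approach, not the assignment's prescribed method. It assumes two recorded averages from consecutive hours are available (which the incomplete data does not guarantee), and the function names `rolling_average` and `point_value_bounds` are hypothetical. The lower bound uses I[T] = 24 * (X[T] - X[T-1]) + I[T-24] with I[T-24] >= 0; the upper bound uses the fact that every term in the average is non-negative, so I[T] <= 24 * X[T].

```python
def rolling_average(point_values, window=24):
    """Return the trailing `window`-hour averages X[T] for every T with a full window."""
    return [
        sum(point_values[t - window + 1 : t + 1]) / window
        for t in range(window - 1, len(point_values))
    ]


def point_value_bounds(x_prev, x_curr, window=24):
    """Bound the point value I[T] from the recorded averages X[T-1] and X[T].

    Lower bound: I[T] = window * (X[T] - X[T-1]) + I[T-window], and I[T-window] >= 0.
    Upper bound: all terms of the average are >= 0, so I[T] <= window * X[T].
    """
    lower = max(0.0, window * (x_curr - x_prev))
    upper = window * x_curr
    return lower, upper
```

As a check on made-up numbers, 24 hours at a point value of 10 followed by one hour at 250 give recorded averages of 10 and 20, and `point_value_bounds(10, 20)` returns an interval that contains 250. The interval can be wide, which reflects that averaging is not reversible.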

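For Question 1, the counting step could be sketched roughly as follows. This is an illustration under stated assumptions rather than a reference solution: it assumes the DateTime column parses with the pattern DD/MM/YYYY HH:MM:SS AM/PM as described above, it treats a year as "present" if any entry falls in it (even one with a missing AQI_PM2.5 value), and the function name `count_days_at_or_above` is hypothetical. Because entries are unordered, distinct days are collected in sets rather than relying on row order.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def count_days_at_or_above(csv_file, threshold, column="AQI_PM2.5"):
    """Map each year to the number of distinct days with a reading >= threshold.

    `csv_file` is any open text file (or file-like object) with a header row.
    """
    all_years = set()
    days_by_year = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # Assumed format from the column description: DD/MM/YYYY HH:MM:SS AM/PM.
        stamp = datetime.strptime(row["DateTime"].strip(), "%d/%m/%Y %I:%M:%S %p")
        all_years.add(stamp.year)
        value = (row.get(column) or "").strip()
        if not value:
            continue  # missing readings are empty fields; skip them
        if float(value) >= threshold:
            days_by_year[stamp.year].add(stamp.date())
    # Years with entries but no qualifying day are reported with a count of 0.
    return {year: len(days_by_year[year]) for year in sorted(all_years)}


# Tiny in-memory sample (made-up rows, not real readings) to show the call.
sample = io.StringIO(
    "Name,DateTime,AQI_PM2.5\n"
    "Civic,01/01/2020 08:00:00 PM,150\n"
    "Civic,01/01/2020 09:00:00 PM,120\n"
    "Civic,02/01/2020 01:00:00 AM,\n"
    "Civic,31/12/2019 11:00:00 PM,100\n"
)
print(count_days_at_or_above(sample, threshold=100))  # {2019: 1, 2020: 1}
```

The same function with a threshold of 33, grouped by (year, month) instead of year, would give the per-month counts that Question 2 needs; how ties are broken remains a design decision to document.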