In this project you will make calculations and draw conclusions from real data collected by Statistics Canada on the prevalence of diabetes in Canada and in its four most populous provinces (Ontario, Quebec, British Columbia and Alberta). The data file contains the percentage of the population with diabetes in each of the four provinces and in the country as a whole (excluding the territories), for the age groups 35 years and over. The data were collected between 2015 and 2021.

You can find the data file at this link: https://www.cs.ryerson.ca/~cps118/data/statscan_diabetes.csv

The file format is CSV (comma-separated values). For this project, the relevant columns are REF_DATE (year), GEO (country/province), Age group (35-49 years, 50-64 years, 65 years and over), Sex (Females/Males), and VALUE (the percentage of the population that has diabetes). You must import all columns into MATLAB, but only the relevant columns will be used in the calculations. It is up to you to extract the data from the file and bring it into MATLAB. This operation must be done with MATLAB! Do not copy and enter the data by hand! Data not imported with MATLAB will not be considered.

Note: Some data may be missing because they were not collected by Statistics Canada. When imported, they appear as NaN. You must ignore these missing data in your calculations (do not substitute 0 for them, because that would change the results!). You can read about techniques for dealing with missing data in MATLAB here: https://www.mathworks.com/help/matlab/data_analysis/missing-data-in-matlab.html

Important notes about the report and its submission:

I. All computations and plots are to be done with MATLAB only.
II. You are to write a report. Your report must have an introduction that describes the purpose of the report and how it is presented.
III. The report must be detailed, well presented and attractive. Don't be afraid to use colours to emphasize parts of the report.
Be creative in the use of tables, graphs and images. Points will be awarded for the accuracy of the computations, the appearance, the ease of reading (use font sizes that are easy to read, and adequate line spacing and margins), and the quality of the English. The report consists of the answers to each of the program requirements (the actual outputs of the program, as cut-and-paste text or screenshots) and two short conclusion paragraphs. The first paragraph explains why the results make sense from a scientific point of view, and the second discusses the MATLAB operations you used to answer the question. The report should be approximately 13 pages long (including numerical results, plots and the conclusion paragraphs, so about one page per question). The page count is only a guideline; you will not be penalized if your report is longer or shorter. Be original! Plagiarism will be dealt with severely, to the full extent of Ryerson's academic integrity regulations. The Turnitin system will be used to help the markers assess originality.
IV. Your report must have a conclusion. In it, you must describe your experience doing this project and how you would do things differently if you had to do it again.
V. Your report must have a cover page that clearly shows your names and section numbers.
VI. Your submission must meet the following requirements:

Required elements: The entire project must be presented as one single script file. Divide your code into sections, one for each question, and add comments to identify which question each section answers. To create a section, add a comment line that begins with %%. To learn more about sections and how to create and use them, search for 'Create and Run Sections in Code' in the MATLAB help. All computations must be done in MATLAB using the imported data file. Nothing can be done by hand.

1. Determine the following averages of the percentage of the population diagnosed with diabetes:
a. Provincial averages (Ontario, Quebec, British Columbia, Alberta): one average per province (for all years and age groups).
b. One national average for all years and age groups.
c. Yearly averages (2015, 2016, 2017, 2018, 2019, 2020, 2021): one average per year (all age groups together) for each province and for the whole country.
d. The average percentage of diabetes for each age group (35-49, 50-64, 65+): one average per age group (all years) for each province and for the whole country.
2. Determine which province has the highest percentage of diabetes (all years and age groups together, as calculated in 1a) and which province has the lowest.
3. Indicate the provinces whose diabetes percentages are above the national average and the provinces whose percentages are below it.
4. Indicate the year and province with the highest percentage of diabetes. Do the same for the lowest percentage. In case of a tie, you may list multiple years and provinces.
5. Make a graph (line plot) of the diabetes percentages for the years 2015 to 2021 (all age groups together). Make a single graph with the four provinces and the national average (5 lines). Use different line