Sudipta answered on Apr 16 2021
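The notebook below filters every column of `data_dict` to a single date. It then asks why a **try-except** is needed and why mean and median mortality differ. The block below is a minimal sketch of both points. It repeats the filtering pattern on a small hypothetical dictionary named `toy`, whose columns and values are invented for illustration and do not come from the OWID file. Converting a text column such as `location` or `date` to float raises `ValueError`, so the `except` branch keeps that column as strings. In this sketch the conversion is written with `.astype(np.float64)`.

```python
import numpy as np

# Hypothetical toy data shaped like data_dict: one NumPy array per column.
toy = {
    "location": np.array(["A", "B", "C", "A", "B", "C"]),
    "date": np.array(["2021-03-08"] * 3 + ["2021-03-09"] * 3),
    "total_deaths_per_million": np.array([1.0, 2.0, 90.0, 1.5, 2.5, 96.5]),
}

required_date = "2021-03-09"
# Boolean mask: True for rows whose date matches required_date
mask = toy["date"] == required_date

filtered = {}
for key, column in toy.items():
    try:
        # Numeric columns convert cleanly to float64
        filtered[key] = column[mask].astype(np.float64)
    except ValueError:
        # Text such as "A" or "2021-03-09" cannot be parsed as a float,
        # so the column is kept as its original string values
        filtered[key] = column[mask]

deaths = filtered["total_deaths_per_million"]
print(filtered["location"])
print("Mean:", deaths.mean(), "Median:", np.median(deaths))
```

In this toy example, the one large value (96.5) pulls the mean up to 33.5 while the median stays at 2.5. This is the kind of right skew that makes mean mortality exceed median mortality when a few countries have very high values.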
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#\n",
"# Import packages needed for processing\n",
"#\n",
"import numpy as np # import the numpy package\n",
"import csv # this package needed for processing csv file\n",
"from collections import Counter # this is for dictionary construction with counting functionality\n",
"import matplotlib.pyplot as plt # this is for plotting and other descriptive statistics\n",
"import datetime # this package is for handling time\n",
"#\n",
"# If you need to add any additional packages, add them below this line\n",
"#\n",
"\n",
"import pandas as pd\n",
"import timeit\n",
"import matplotlib as mpl\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: you must place your data file in the same folder as your Python notebook."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Declare the path to the data file\n",
"DATA_FILE = \"owid-covid-data.csv\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**. The following code segment loads the data from the .csv file using the csv package. It has been provided for you to get started. Please **do not** change this code or the variable names, as we will need these variables to complete subsequent tasks. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dict_keys(['iso_code', 'continent', 'location', 'date', 'total_cases', 'new_cases', 'new_cases_smoothed', 'total_deaths', 'new_deaths', 'new_deaths_smoothed', 'total_cases_per_million', 'new_cases_per_million', 'new_cases_smoothed_per_million', 'total_deaths_per_million', 'new_deaths_per_million', 'new_deaths_smoothed_per_million', 'reproduction_rate', 'icu_patients', 'icu_patients_per_million', 'hosp_patients', 'hosp_patients_per_million', 'weekly_icu_admissions', 'weekly_icu_admissions_per_million', 'weekly_hosp_admissions', 'weekly_hosp_admissions_per_million', 'new_tests', 'total_tests', 'total_tests_per_thousand', 'new_tests_per_thousand', 'new_tests_smoothed', 'new_tests_smoothed_per_thousand', 'positive_rate', 'tests_per_case', 'tests_units', 'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated', 'new_vaccinations', 'new_vaccinations_smoothed', 'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred', 'people_fully_vaccinated_per_hundred', 'new_vaccinations_smoothed_per_million', 'stringency_index', 'population', 'population_density', 'median_age', 'aged_65_older', 'aged_70_older', 'gdp_per_capita', 'extreme_poverty', 'cardiovasc_death_rate', 'diabetes_prevalence', 'female_smokers', 'male_smokers', 'handwashing_facilities', 'hospital_beds_per_thousand', 'life_expectancy', 'human_development_index'])\n"
]
}
],
"source": [
"# Load CSV file using DictReader\n",
"input_file = csv.DictReader(open(DATA_FILE))\n",
"fieldnames = input_file.fieldnames\n",
"data_dict = {fn: [] for fn in fieldnames}\n",
"print(data_dict.keys())\n",
"for line in input_file:\n",
"    for k, v in line.items():\n",
"        if (v == ''): #quick fix for missing values\n",
"            v=0\n",
"        try:\n",
"            data_dict[k].append(int(v))\n",
"        except ValueError:\n",
"            try:\n",
"                data_dict[k].append(float(v))\n",
"            except ValueError:\n",
"                data_dict[k].append(v)\n",
"    \n",
"for k, v in data_dict.items():\n",
"    data_dict[k] = np.array(v)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## **Instruction 1**\n",
"We first examine the variables and the overall data size. The tasks:\n",
"\n",
"1. Write your code to print the type for the following variables (**4 marks**):\n",
" * input_file\n",
" * data_dict\n",
" * data_dict['iso_code']\n",
" * data_dict['reproduction_rate']\n",
"\n",
"\n",
"2. Write your code to print out the number of data records (**1 mark**):\n",
"\n",
"[**Total mark: 5**]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'csv.DictReader'>\n",
"<class 'dict'>\n",
"<class 'numpy.ndarray'>\n",
"<class 'numpy.ndarray'>\n"
]
}
],
"source": [
"# INSERT YOUR CODE HERE\n",
"#\n",
"# 1. Write your code to print the type for the following variables:\n",
"# * input_file\n",
"# * data_dict\n",
"# * data_dict['iso_code']\n",
"# * data_dict['reproduction_rate']\n",
"#\n",
"# type() returns the class of each object, and print() displays it\n",
"print(type(input_file))\n",
"print(type(data_dict))\n",
"print(type(data_dict['iso_code']))\n",
"print(type(data_dict['reproduction_rate']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# INSERT YOUR CODE HERE\n",
"#\n",
"# 2. Write your code to print out the number of data records\n",
"#\n",
"# Each column of data_dict holds one value per record, so the length of any\n",
"# column is the record count. Counting the lines of the file would also count\n",
"# the header line, giving one more than the number of records.\n",
"print(len(data_dict['iso_code']))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Instruction 2\n",
"The following code segment extracts the data for a recent date, from which we calculate some basic statistics. The data is stored in the **data_filtered_by_date** dictionary.\n",
"\n",
"**Your task:**\n",
"1. Learn how the code that extracts the data for all countries at a required date works. Why is the **try-except** construction required in this code? Provide the answer. (**2 marks**)\n",
"2. Write your code to find and print the minimum and the maximum values of mortality per million in the **total_deaths_per_million** column. (**1 mark**)\n",
"3. Write your code to find and print the minimum and the maximum values of cases per million in the **total_cases_per_million** column. (**1 mark**)\n",
"4. Write your code to find and print the mean and median mortality per million, and the standard deviation, from the **total_deaths_per_million** column. (**2 marks**)\n",
"5. Write your code to construct a box plot for **total_deaths_per_million**. (**2 marks**) \n",
"6. You will find that the mean and median values for mortality are quite different. Briefly describe what the difference between the mean and median mortality per million means. Why are they different? (**2 marks**)\n",
"\n",
"**[Total mark: 10]**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# THIS PART EXTRACTS THE DATA FOR ALL COUNTRIES AT A REQUIRED DATE.\n",
"# LEARN HOW IT WORKS.\n",
"# data_filtered_by_date DICTIONARY CONTAINS ALL DATA FOR ALL COUNTRIES AT required_date\n",
"\n",
"required_date = '2021-03-09'\n",
"\n",
"index_for_date = (data_dict['date'] == required_date)\n",
"data_filtered_by_date = {}\n",
"for key in data_dict.keys():\n",
"    try:\n",
"        data_filtered_by_date[key] = np.float64(data_dict[key][index_for_date])\n",
"    except ValueError:\n",
"        data_filtered_by_date[key] = data_dict[key][index_for_date]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 1. Why **try-except** construction is required in this code? (2 marks)\n",
"# INSERT YOUR ANSWER HERE\n",
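"# Some columns (e.g. iso_code, continent, location, date, tests_units) hold text.\n",
"# Converting text such as 'AFG' or '2021-03-09' to float raises ValueError, so\n",
"# without try-except the loop would stop at the first text column. The except\n",
"# branch catches the error and keeps those columns as the original string arrays,\n",
"# while numeric columns are stored as float arrays."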
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# INSERT YOUR CODE HERE\n",
"#\n",
"# 2. Write your code to find and print the minimum and the maximum values of mortality \n",
"# per million (key name 'total_deaths_per_million'). (1 mark)\n",
"#\n",
"# Full dataset (all dates) as a DataFrame, for cells that use df\n",
"df = pd.read_csv(DATA_FILE)\n",
"# The statistics must use only required_date, so read from data_filtered_by_date.\n",
"# Note: the loading code stored missing values as 0, so they appear as 0 here.\n",
"deaths_per_million = data_filtered_by_date['total_deaths_per_million']\n",
"print(\"Maximum: \", deaths_per_million.max())\n",
"print(\"Minimum: \", deaths_per_million.min())\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# INSERT YOUR CODE HERE\n",
"#\n",
"# 3. Write your code to find and print the minimum and the maximum values of cases per million \n",
"# in the total_cases_per_million column. (1 mark)\n",
"#\n",
"# Use only the rows for required_date (missing values were stored as 0)\n",
"cases_per_million = data_filtered_by_date['total_cases_per_million']\n",
"print(\"Maximum: \", cases_per_million.max())\n",
"print(\"Minimum: \", cases_per_million.min())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# INSERT YOUR CODE HERE\n",
"#\n",
"# 4. Write your code to find and print the mean and median mortality per million, and the standard \n",
"# deviation from the total_deaths_per_million column. (2 marks)\n",
"#\n",
"# Use only the rows for required_date (missing values were stored as 0)\n",
"deaths_per_million = data_filtered_by_date['total_deaths_per_million']\n",
"print(\"Mean: \", deaths_per_million.mean())\n",
"print(\"Median: \", np.median(deaths_per_million))\n",
"# ddof=1 gives the sample standard deviation, matching pandas' default\n",
"print(\"Standard Deviation: \", deaths_per_million.std(ddof=1))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\dives\\anaconda3\\lib\\site-packages\\seaborn\\_decorators.py:36: FutureWarning: Pass the following variable as a keyword arg: x. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.\n",
" warnings.warn(\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"