Q1: function: read_poy_data (3 pts)
Define a function read_poy_data with three parameters (file, start_year, and end_year) that accomplishes the following, using pandas functions/methods:
- uses pandas to read the file in
- filters to only include the rows where the 'POY' column in the resulting DataFrame has the value 'Y' (Note: this keeps only the rows for Time covers that were "Person of the Year")
- filters to include only rows whose 'Year' value falls between the start_year and end_year parameters (inclusive). By default, start_year should be the integer 1923 and end_year should be the integer 2021.
- returns the resulting DataFrame from the function.
Notes:
- to test out your function here, you'll need to import pandas as pd first (outside your function).
- the column names should not be changed from what they are in the original file
Suggested smoke tests:
- Executing the function: read_poy_data(file = 'testdata.csv') should return a pandas DataFrame with 2 rows and 6 columns.
- read_poy_data('testdata.csv', start_year = 2010) should return a pandas DataFrame with 1 row and 6 columns.
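The two filtering steps described in Q1 can be tried out on toy data first. The following is a minimal sketch, not the graded solution: the in-memory CSV, its rows, the 'Occupation' values, and the variable names are made up for illustration. Only the 'POY' and 'Year' column names come from the task description.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file on disk.
csv_text = io.StringIO(
    "Year,POY,Occupation\n"
    "1961,Y,Science\n"
    "1975,N,Politics\n"
    "2017,Y,Business\n"
)

covers = pd.read_csv(csv_text)

# Boolean mask: True for rows whose 'POY' value is 'Y'.
is_poy = covers["POY"] == "Y"

# Series.between includes both endpoints by default.
in_range = covers["Year"].between(2010, 2021)

# Combine the two masks with & and use the result to select rows.
filtered = covers[is_poy & in_range]
print(filtered.shape)  # (1, 3)
```

Applying the masks one after the other (first 'POY', then 'Year') gives the same rows; combining them with & just does it in one step.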
Q2: function: calculate_stats (3 pts)
Now, define a function calculate_stats that takes in two parameters: df (the DataFrame it will operate on) and col_name (the name of the column that we want to calculate information about).
This function should:
- Calculate value_counts() on the col_name column of the DataFrame, using the normalize=True parameter in the value_counts() method
- Extract the top 5 results from step 1.
- returns the results from step 2 from the function
Suggested smoke tests: Executing the function as follows (where df is the output after having run read_poy_data() on 'testdata.csv'):
calculate_stats(df, 'Occupation') should return:
Business    0.5
Science     0.5
Name: Occupation, dtype: float64
and calculate_stats(df, 'Year') should return:
2017    0.5
1961    0.5
Name: Year, dtype: float64
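To see what normalize=True changes, here is a small sketch on a made-up DataFrame (the data and variable names are hypothetical, and this is not the graded function itself). With normalize=True, value_counts() returns proportions that sum to 1 instead of raw counts, sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical toy data; only the 'Occupation' column name echoes the task.
toy = pd.DataFrame(
    {"Occupation": ["Business", "Science", "Science", "Politics"]}
)

# Proportion of rows taken up by each distinct value, most frequent first.
proportions = toy["Occupation"].value_counts(normalize=True)

# head(5) keeps at most the first five entries; here there are only three.
top = proportions.head(5)
print(top)
```

Depending on the installed pandas version, the Name: line in the printed output may differ from the expected output above: pandas 2.0 and later label the result proportion rather than using the column name.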
Q3: module time_covers.py (5 pts)
Here, we'll move your functions from Part 1 into our time_covers.py module and get that module all ready to go!
time_covers.py has two import statements at the top and a single function, generate_plot. The code in generate_plot functions; however, you'll notice that the code style isn't great. You'll fix that in just a second!
To make this module more complete and polished, carry out the following steps:
- Copy the read_poy_data and calculate_stats functions from Q1 and Q2 (respectively) into the time_covers module (the module will have three functions total, including generate_plot)
- Edit all three functions for Code Style, as discussed in class
- Add helpful code comments throughout all three functions
- Add numpy style docs to all three functions, as discussed in class
Note: Nothing has to be done in the notebook for this question. Everything will happen in time_covers.py
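As a reminder of the general layout of a numpy-style docstring, here is a sketch on an unrelated, hypothetical function (fahrenheit_to_celsius is not part of the assignment). The conventions covered in class take precedence if they differ from this example.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius.

    Parameters
    ----------
    temp_f : float
        Temperature in degrees Fahrenheit.

    Returns
    -------
    temp_c : float
        Temperature in degrees Celsius.
    """
    # Remove the 32-degree offset, then rescale the size of one degree.
    temp_c = (temp_f - 32) * 5 / 9
    return temp_c
```

The docstring opens with a one-line summary, followed by Parameters and Returns sections whose headings are underlined with dashes.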
Q4: test function: test_read_poy_data (3 pts)
Now it's your turn to write your own test function. Add a test function test_read_poy_data() to test_time_covers.py that 1) includes at least three assert statements and 2) tests the functionality of the read_poy_data function.
Be sure to also include any necessary import statements at the top of the test file for all of the tests in this file to execute.
Notes:
- this will likely use the 'testdata.csv' file provided
- nothing has to be done in the notebook here; however, feel free to test out your work below
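For the general shape of a pytest-style test with several assert statements, here is a sketch built around a hypothetical helper (keep_flagged and its toy data are invented for illustration and are not part of time_covers.py). It checks the return type, the shape, and the contents of the result, which is one reasonable way to split up three assertions.

```python
import pandas as pd


def keep_flagged(df):
    """Hypothetical helper: return only the rows where 'flag' is 'Y'."""
    return df[df["flag"] == "Y"]


def test_keep_flagged():
    # Small, known input so the expected output can be worked out by hand.
    toy = pd.DataFrame({"flag": ["Y", "N", "Y"], "value": [1, 2, 3]})
    result = keep_flagged(toy)

    # 1) the function returns a DataFrame
    assert isinstance(result, pd.DataFrame)
    # 2) two of the three rows survive and both columns are kept
    assert result.shape == (2, 2)
    # 3) every remaining row has the expected flag
    assert (result["flag"] == "Y").all()
```

pytest collects functions whose names start with test_, so the function name matters as much as the assertions inside it.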
Q6: pytest (2 pts)
In the cell below, execute pytest on your test file.
Q7: module import (1.1 pts)
At this point you have a module with three functions and a test file with two functions, on which you've (hopefully successfully) executed pytest and have passing tests.
Now, it's time to put it all together and use it!
Below, import your time_covers module, so that when you execute the five cells with code provided below, they execute without error, with the final three cells producing plots from your data.
Note: You will likely need to restart your kernel before the import will work.