

Q1: function: read_poy_data (3 pts)


Define a function read_poy_data with three parameters: file, start_year, and end_year that accomplishes the following, using pandas functions/methods:



  1. uses pandas to read the file in

  2. filters to include only the rows from the file where the 'POY' column in the file DataFrame has the value 'Y' (Note: this keeps only the rows for Time covers that were "Person of the Year")

  3. filters to include 'Year' data from the start_year to end_year parameters (inclusive). By default, start_year should be the integer 1923 and end_year should be 2021.


  4. returns the resulting DataFrame from the function.
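The four steps above can be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes the file is a CSV that pd.read_csv can parse and that the 'Year' column is numeric, neither of which the prompt states outright.

```python
import pandas as pd


def read_poy_data(file, start_year=1923, end_year=2021):
    # Step 1: read the file into a DataFrame (assumption: CSV format)
    df = pd.read_csv(file)

    # Step 2: keep only the rows flagged as Person of the Year
    df = df[df['POY'] == 'Y']

    # Step 3: keep the rows whose Year is in [start_year, end_year];
    # Series.between includes both endpoints by default
    df = df[df['Year'].between(start_year, end_year)]

    # Step 4: return the filtered DataFrame, column names unchanged
    return df
```

Using between keeps the inclusive bounds explicit; a pair of >= and <= comparisons joined with & would behave the same way.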


Notes:



  • to test out your function here, you'll need to import pandas as pd first (outside your function).

  • the column names should not be changed from what they are in the original file



Suggested smoke tests:



  • Executing the function:



    • read_poy_data(file = 'testdata.csv') should return a pandas DataFrame with 2 rows and 6 columns.


    • read_poy_data('testdata.csv', start_year = 2010) should return a pandas DataFrame with 1 row and 6 columns.





Q2: function: calculate_stats (3 pts)


Now, define a function calculate_stats that takes in two parameters: df (the DataFrame it will operate on) and col_name (the name of the column that we want to calculate information about).


This function should:



  1. Calculate value_counts() on the col_name column of the DataFrame, using the normalize=True parameter in the value_counts() method

  2. Extract the top 5 results from step 1.


  3. Return the results from step 2 from the function
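One way these three steps might look in code is shown below. It is a sketch under the assumption that "top 5" means the first five entries of the value_counts() result, which pandas sorts in descending order by default.

```python
import pandas as pd


def calculate_stats(df, col_name):
    # Step 1: proportion of rows taken by each distinct value
    proportions = df[col_name].value_counts(normalize=True)

    # Step 2: value_counts sorts descending, so head(5) is the top 5
    top_five = proportions.head(5)

    # Step 3: return the resulting Series
    return top_five
```

In pandas 2.0 and later the returned Series is named proportion rather than after the column, so the printed Name: line may differ from the expected output quoted in the smoke tests.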



Suggested smoke tests: Executing the function as follows (where df is the output after having run read_poy_data() on 'testdata.csv'): calculate_stats(df, 'Occupation') should return:



Business    0.5
Science     0.5
Name: Occupation, dtype: float64

and calculate_stats(df, 'Year') should return:



2017    0.5
1961    0.5
Name: Year, dtype: float64



Q3: module time_covers.py (5 pts)


Here, we'll move your functions from Part 1 into our time_covers.py module and get that module all ready to go!



time_covers.py has two import statements at the top and a single function, generate_plot. The code in generate_plot functions; however, you'll notice that the code style isn't great. You'll fix that in just a second!


To make this module more complete and polished, carry out the following steps:



  1. Copy the read_poy_data and calculate_stats functions from Q1 and Q2 (respectively) into the time_covers module (the module will have three functions total, including generate_plot)

  2. Edit all three functions for Code Style, as discussed in class

  3. Add helpful code comments throughout all three functions

  4. Add numpy style docs to all three functions, as discussed in class


Note: Nothing has to be done in the notebook for this question. Everything will happen in time_covers.py.
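For reference, a numpy style docstring generally has a one-line summary followed by underlined Parameters and Returns sections. The sketch below shows that layout on a hypothetical helper, filter_year_range, which is not part of the assignment and is used here only for illustration; the conventions discussed in class may differ in detail.

```python
import pandas as pd


def filter_year_range(df, start_year, end_year):
    """Keep the rows whose 'Year' value lies in an inclusive range.

    Parameters
    ----------
    df : pandas.DataFrame
        Data containing a numeric 'Year' column.
    start_year : int
        First year to keep.
    end_year : int
        Last year to keep.

    Returns
    -------
    pandas.DataFrame
        The rows of `df` with start_year <= Year <= end_year.
    """
    # Series.between includes both endpoints by default
    return df[df['Year'].between(start_year, end_year)]
```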


Q4: test function: test_read_poy_data (3 pts)


Now it's your turn to write your own test function. Add a test function test_read_poy_data() to test_time_covers.py that 1) includes at least three assert statements and 2) tests the functionality of the read_poy_data function.


Be sure to also include any necessary import statements at the top of the test file for all of the tests in this file to execute.


Notes:






  • this will likely use the 'testdata.csv' file provided

  • nothing has to be done in the notebook here; however, feel free to test out your work below


Q6: pytest (2 pts)


In the cell below, execute pytest on your test file.


Q7: module import (1.1 pts)


At this point you have a module with three functions and a test file with two functions, on which you've (hopefully successfully) executed pytest and have passing tests.


Now, it's time to put it all together and use it!


Below, import your time_covers module, so that when you execute the five cells with code provided below, they execute without error, with the final three cells producing plots from your data.


Note: You will likely need to restart your kernel before the import will work.



Dec 06, 2021