It looks like a lot more than it is due to test code being included. It's only 6 problems.
Problem 0: Soccer Guru

Version 1.5

Soccer season is on and teams need to start preparing for the World Cup 2022. We need your help as a Soccer Guru to analyse different statistics and come up with insights to help the teams prepare better.

This problem tests your understanding of Pandas and SQL concepts.

Important note. Due to a limitation in Vocareum's software stack, this notebook is set to use the Python 3.5 kernel (rather than a more up-to-date 3.6 or 3.7 kernel). If you are developing on your local machine and are using a different version of Python, you may need to adapt your solution before submitting to the autograder.

Exercise 0 (0 points). Run the code cell below to load the data, which is a SQLite3 database containing results and fixtures of various soccer matches that have been played around the globe since 1980.

Observe that the code loads all rows from the table, soccer_results, contained in the database file, prob0.db.

You do not need to do anything for this problem other than run the next two code cells and familiarize yourself with the resulting dataframe, which is stored in the variable df.

    import sqlite3 as db
    import pandas as pd
    from datetime import datetime
    from collections import defaultdict
    disk_engine = db.connect('file:prob0.db?mode=ro', uri=True)

    def load_data():
        df = pd.read_sql_query("SELECT * FROM soccer_results", disk_engine)
        return df

    # Test: Exercise 0 (exposed)
    df = load_data()
    assert df.shape[0] == 22851, "Row counts do not match. Try loading the data again"
    assert df.shape[1] == 9, "You don't have all the columns. Try loading the data again"
    print("\n(Passed!)")

    df.head()

(Passed!)

Out[2]:
             date   home_team     away_team  home_score  away_score tournament        city     country neutral
    0  1994-01-02    Barbados       Grenada           0           0   Friendly  Bridgetown    Barbados   FALSE
    1  1994-01-02       Ghana         Egypt           2           1   Friendly       Accra       Ghana   FALSE
    2  1994-01-05        Mali  Burkina Faso           1           1   Friendly      Bamako        Mali   FALSE
    3  1994-01-09  Mauritania          Mali           1           3   Friendly  Nouakchott  Mauritania   FALSE
    4  1994-01-11    Thailand       Nigeria           1           1   Friendly     Bangkok    Thailand   FALSE

Each row of this dataframe is a game, which is played between a "home team" (column home_team) and an "away team" (away_team). The number of goals scored by each team appears in the home_score and away_score columns, respectively.

Exercise 1 (1 point): Write an SQL query to find the ten (10) teams that have the highest average away-scores since the year 2000. Your query should satisfy the following criteria:

· It should return two columns:
    · team: The name of the team
    · ave_goals: The team's average number of goals in "away" games. An "away game" is one in which the team's name appears in away_team and the game takes place at a "non-neutral site" (neutral value equals FALSE).
· It should only include teams that have played at least 30 away matches.
· It should round the average goals value (ave_goals) to three decimal places.
· It should only return the top 10 teams in descending order by average away-goals.
· It should only consider games played since 2000 (including the year 2000).

Store your query string as the variable, query_top10_away, below. The test cell will run this query string against the input dataframe, df, defined above and return the result in a dataframe named offensive_teams. (See the test cell.)

Note. The following exercises have hidden test cases and you'll be awarded full points for passing both the exposed and hidden test cases.

    query_top10_away = ''  # Write your query here!
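Each criterion in Exercise 1 corresponds to a standard SQL clause: WHERE for the site and date filters, GROUP BY with AVG for the per-team average, HAVING for the minimum match count, and ORDER BY with LIMIT for the top-N cut. The following is a minimal sketch of that mapping, not a checked solution. It runs against a made-up in-memory table rather than prob0.db, assumes that date is stored as 'YYYY-MM-DD' text and neutral as the text 'TRUE'/'FALSE', and uses a hypothetical helper, away_scoring_query, so that the thresholds can be lowered for the toy data.

```python
import sqlite3

# Toy stand-in for the soccer_results table: made-up rows, NOT the real data.
# Assumption: dates are 'YYYY-MM-DD' text and neutral is the text 'TRUE'/'FALSE'.
toy_rows = [
    ("2001-03-01", "A", "B", 1, 3, "FALSE"),
    ("2002-05-10", "C", "B", 0, 2, "FALSE"),
    ("2003-06-11", "B", "A", 2, 2, "FALSE"),
    ("2004-07-12", "C", "A", 1, 1, "FALSE"),
    ("1999-01-01", "A", "B", 0, 9, "FALSE"),  # before 2000: filtered out
    ("2005-08-13", "A", "B", 0, 9, "TRUE"),   # neutral site: filtered out
    ("2006-09-14", "A", "C", 4, 0, "FALSE"),  # C's only away game
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE soccer_results (date TEXT, home_team TEXT, away_team TEXT, "
    "home_score INTEGER, away_score INTEGER, neutral TEXT)"
)
conn.executemany("INSERT INTO soccer_results VALUES (?, ?, ?, ?, ?, ?)", toy_rows)

def away_scoring_query(min_matches, top_n):
    """Hypothetical helper: builds the query with adjustable thresholds."""
    return """
        SELECT away_team AS team,
               ROUND(AVG(away_score), 3) AS ave_goals
        FROM soccer_results
        WHERE neutral = 'FALSE'          -- non-neutral site only
          AND date >= '2000-01-01'       -- year 2000 onward
        GROUP BY away_team
        HAVING COUNT(*) >= {}            -- minimum number of away matches
        ORDER BY ave_goals DESC
        LIMIT {}
    """.format(int(min_matches), int(top_n))

# The exercise asks for at least 30 matches; the toy table uses 2 instead.
toy_result = conn.execute(away_scoring_query(min_matches=2, top_n=10)).fetchall()
print(toy_result)
```

On the toy rows this prints [('B', 2.5), ('A', 1.5)]: team C is dropped by the HAVING clause, and the 1999 and neutral-site games never reach the average. In the notebook, the same query text with thresholds of 30 and 10 would presumably be stored in query_top10_away and run through pd.read_sql_query, provided the real table stores date and neutral the way this sketch assumes.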
    ### query_top10_away = "SELECT away_team AS 'team', round(away_score, 3) FROM (SELECT soccer_results WHERE fff) LIMIT 10"
    ### print(query_top10_away)
    SELECT team, ave_goals

    # Test: Exercise 1 (exposed)
    offensive_teams = pd.read_sql_query(query_top10_away, disk_engine)
    df_cols = offensive_teams.columns.tolist()
    df_cols.sort()
    desired_cols = ['team', 'ave_goals']
    desired_cols.sort()
    print(offensive_teams.head(10))
    assert offensive_teams.shape[0] == 10, "Expected 10 rows but returned dataframe has {}".format(offensive_teams.shape[0])
    assert offensive_teams.shape[1] == 2, "Expected 2 columns but returned dataframe has {}".format(offensive_teams.shape[1])
    assert df_cols == desired_cols, "Column names should be: {}. Returned dataframe has: {}".format(desired_cols, df_cols)
    tolerance = .001
    team_4 = offensive_teams.iloc[3].team
    team_4_ave = offensive_teams.iloc[3].ave_goals
    desired_team_4_ave = 1.763
    assert (team_4 == "England" and abs(team_4_ave - 1.763) <= .001), "Fourth entry is {} with average of {}. Got {} with average of {}".format("England", 1.76, team_4, team_4_ave)

    print("\n(Passed!)")

    print("""
    In addition to the tests above, this cell will include some hidden tests.
    You will only know the result when you submit your solution to the
    autograder.
    """)

    ###
    ### AUTOGRADER TEST - DO NOT REMOVE
    ###

Exercise 2 (2 points): Suppose we are now interested in the top 10 teams having the best goal differential, between the years 2012 and 2018 (both inclusive). A team's goal differential is the difference between the total number of goals it scored and the total number it conceded across all games (in the requested years).

Complete the function, best_goal_differential(), below, so that it returns a pandas dataframe containing the top 10 teams by goal differential, sorted in descending order of differential. The dataframe should have two columns: team, which holds the team's name, and differential, which holds its overall goal differential.

As a sanity check, you should find that Brazil is the number one team, with a differential of 152 during the selected time period of 2012-2018 (inclusive). It should be the first row of the returned dataframe.

    def best_goal_differential():
        ###
        ### YOUR CODE HERE
        ###

    # Test: Exercise 2 (exposed)
    diff_df = best_goal_differential()
    df_cols = diff_df.columns.tolist()
    df_cols.sort()
    desired_cols = ['team', 'differential']
    desired_cols.sort()

    assert isinstance(diff_df, pd.DataFrame), "Dataframe object not returned"
    assert diff_df.shape[0] == 10, "Expected 10 rows but returned dataframe has {}".format(diff_df.shape[0])
    assert diff_df.shape[1] == 2, "Expected 2 columns but returned dataframe has {}".format(diff_df.shape[1])
    assert df_cols == desired_cols, "Column names should be: {}. Returned dataframe has: {}".format(desired_cols, df_cols)

    best_team = diff_df.iloc[0].team
    best_diff = diff_df.iloc[0].differential
    assert (best_team == "Brazil" and best_diff == 152), "{} has best differential of {}. Got team {} having best differential of {}".format("Brazil", 152, best_team, best_diff)

    print("\n(Passed!)")

    print("""
    In addition to the tests above, this cell will include some hidden tests.
    You will only know the result when you submit your solution to the
    autograder.
    """)

    ###
    ### AUTOGRADER TEST - DO NOT REMOVE
    ###

Exercise 3 (1 point). Complete the function, determine_winners(game_df), below. It should determine the winner of each soccer game.

In particular, the function should take in a dataframe like df from above. It should return a new dataframe consisting of all the columns from that dataframe plus a new column called winner, holding the name of the winning team. If there is no winner for a particular game (i.e., the score is tied), then the winner column should contain the string, 'Draw'. Lastly, the rows of the output should be in the same order as the input dataframe.

You can use any dataframe manipulation techniques you want for this question (i.e., pandas methods or SQL queries, as you prefer).

You'll need the output dataframe from this exercise for the subsequent exercises, so don't skip this one!

    def determine_winners(game_df):
        ###
        ### YOUR CODE HERE
        ###

    # Test: Exercise 3 (exposed)
    game_df = pd.read_sql_query("SELECT * FROM soccer_results", disk_engine)
    winners_df = determine_winners(game_df)

    game_winner = winners_df.iloc[1].winner
    assert game_winner == "Ghana", "Expected Ghana to be winner. Got {}".format(game_winner)

    game_winner = winners_df.iloc[2].winner
    assert game_winner == "Draw", "Match was Draw. Got {}".format(game_winner)

    game_winner = winners_df.iloc[3].winner
    assert game_winner == "Mali", "Expected Mali to be winner. Got {}".format(game_winner)

    print("\n(Passed!)")

    print("""
    In addition to the tests above, this cell will include some hidden tests.
    You will only know the result when you submit your solution to the
    autograder.
    """)

    ###
    ### AUTOGRADER TEST - DO NOT REMOVE
    ###

Exercise 4 (3 points): Given a team, its home advantage ratio is the number of home games it has won divided by the number of home games it has played. For this exercise, we'll try to answer the question, how important is the home advantage in soccer? Its importance is factored into draws for competitions: for example, teams want to play the second leg at home in matches of great importance, such as tournament knockouts. (This exercise has a pre-requisite of finishing Exercise 3, as we'll be using the resulting dataframe from that exercise in this one.)

Complete the function, calc_home_advantage(winners_df), below, so that it returns the top 5 countries, among those that have played at least 50 home games, having the highest home advantage ratio. It should return a dataframe with two columns, team and ratio, holding the name of the team and its home advantage ratio, respectively. The ratio should be rounded to three decimal places. The rows should be sorted in descending order of ratio. If there are two teams with the same winning ratio, the teams should appear in alphabetical order by name.

Note 0. As with our definition of away-games, a team plays a home game if it is the home team (home_team) and the field is non-neutral (i.e., neutral is FALSE).

Note 1. You should find, for example, that Brazil is the number two team, with a home advantage ratio of 0.773.

    def calc_home_advantage(winners_df):
        ###
        ### YOUR CODE HERE
        ###

    # Test: Exercise 4 (exposed)
    from IPython.display import display

    win_perc = calc_home_advantage(winners_df)

    print("The solution, according to you:")
    display(win_perc)

    df_cols = win_perc.columns.tolist()
    df_cols.sort()
    desired_cols = ['team', 'ratio']
    desired_cols.sort()

    assert win_perc.shape[0] == 5, "Expected 5 rows, got {}".format(win_perc.shape[0])
    assert win_perc.shape[1] == 2, "Expected 2 columns, got {}".format(win_perc.shape[1])
    assert df_cols == desired_cols, "Expected {} columns but got {} columns".format(desired_cols, df_cols)

    tolerance = .001
    sec_team = win_perc.iloc[1].team
    sec_perc = win_perc.iloc[1].ratio

    assert (sec_team == "Brazil" and abs(sec_perc - .773) <= tolerance), "Second team should be {} with ratio of {}. Got {} with ratio of {}".format("Brazil", .773, sec_team, sec_perc)

    print("\n(Passed!)")

    print("""
    In addition to the tests above, this cell will include some hidden tests.
    You will only know the result when you submit your solution to the
    autograder.
    """)

    ###
    ### AUTOGRADER TEST - DO NOT REMOVE
    ###

Exercise 5 (3 points). Now that we've seen how much the home advantage matters, let us see how results have looked in previous tournaments, for the specific case of the FIFA World Cup matches.

In particular, complete the function, points_table(winners_df, wc_year), below, so that it does the following:

· It should take as input a dataframe, winners_df, having a "winner" column like that produced in Exercise 3, as well as a target year, wc_year.
· It should consider only games in