Hello I have a data science assignment that requires you to explore a range of data sets and analyse...

Question

Hello I have a data science assignment that requires you to explore a range of data sets and analyse it by answering a range of questions. The assignment must be completed using Jupyter Notebook and coded in Python. The notebook will need to be neatly presented including a thorough analysis for each question. The questions and information are provided in the Jupyter notebook. I have attached the jupyter notebook and the datafiles. Please do not import any other libraries.

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sport Vouchers Program Analysis\n", "\n", "The goal of this Portfolio task is to explore data from the Federal Government Sport Vouchers program - this is a\n", "program that provides up to two $100 vouchers for kids to participate in organised sport. Here's the [NSW Active Kids page](https://www.service.nsw.gov.au/transaction/apply-active-kids-voucher), there are similar schemes in other states - this data is from South Australia.\n", "\n", "This is an exercise in exploring data and communicating the insights you can gain from it. The source data comes\n", "from the `data.gov.au` website and provides details of all Sport Vouchers that have been redeemed since February in SA 2015 as part of the Sport Voucher program: [Sports Vouchers Data](https://data.gov.au/dataset/ds-sa-14daba50-04ff-46c6-8468-9fa593b9f100/details). This download is provided for you as `sportsvouchersclaimed.csv`.\n", "\n", "To augment this data you can also make use of [ABS SEIFA data by LGA](http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ABS_SEIFA_LGA#) which shows a few measures of Socioeconomic Advantage and Disadvantage for every Local Government Area. This data is provided for you as `ABS_SEIFA_LGA.csv`. This could enable you to answer questions about whether the voucher program is used equally by parents in low, middle and high socioeconomic areas. You might be interested in this if you were concerned that this kind of program might just benifit parents who are already advantaged (they might already be paying for sport so this program wouldn't be helping much).\n", "\n", "Questions:\n", "* Describe the distribution of vouchers by: LGA, Sport - which regions/sports stand out? \n", "* Are some sports more popular in different parts of the state?\n", "* Are any electorates over/under represented in their use of vouchers?\n", "* Is there a relationship between any of the SEIFA measures and voucher use in an LGA?\n", "\n", "A challenge in this task is to display a useful summary of the data given that there are a large number of LGAs and sports involved. Try to avoid long lists and large tables. Think about what plots and tables communicate the main points of your findings. \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# read the sports vouchers data\n", "sa_vouchers = pd.read_csv(\"files/sportsvouchersclaimed.csv\")\n", "sa_vouchers.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The SEIFA data includes row for each Local Government Area (LGA) but the names of the LGAs have a letter or letters in brackets after the name. To allow us to match this up with the voucher data we remove this and convert to uppercase. \n", "\n", "For each LGA the data includes a number of measures all of which could be useful in your exploration. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# read the SEIFA data, create an LGA column by removing the letters in brackets and converting to uppercase\n", "seifa = pd.read_csv('files/ABS_SEIFA_LGA.csv')\n", "lga = seifa[\"Local Government Areas - 2011\"].str.replace(' \$[ACSRCDMT]+\$', '').str.upper()\n", "seifa['LGA'] = lga\n", "seifa.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since there are many rows per LGA we need to use `pivot_table` to create a new data frame with one row per LGA. Here\n", "is an example of doing this to create a table with the different SCORE measures and the population (URP) field. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LGA_scores = seifa[seifa.MEASURE == 'SCORE'].pivot_table(index=\"LGA\", columns=[\"INDEX_TYPE\"], values=\"Value\")\n", "LGA_scores.head()\n", "LGA_pop = seifa[seifa.MEASURE == 'URP'].pivot_table(index=\"LGA\", columns=[\"INDEX_TYPE\"], values=\"Value\")\n", "LGA_scores['Population'] = LGA_pop.IEO\n", "LGA_scores.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This data frame can then be joined with the vouchers data fram to create one master data frame containing both the voucher data and the SEIFA measures." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sa_vouchers_scores = sa_vouchers.join(LGA_scores, on='Participant_LGA')\n", "sa_vouchers_scores.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Challenge - Queensland\n", "\n", "_Note: this is an extra task that you might take on to get a better grade for your portfolio. You can get a good pass grade without doing this._ \n", "\n", "Queensland has a similar program called [Get Started](https://data.gov.au/dataset/ds-qld-3118838a-d425-48fa-bfc9-bc615ddae44e/details?q=get%20started%20vouchers) and we can retrieve data from their program in a similar format. \n", "\n", "The file [round1-redeemed_get_started_vouchers.csv](files/round1-redeemed_get_started_vouchers.csv) contains records of the vouchers issued in Queensland. The date of this data is not included but the program started in 2015 so it is probably from around then. \n", "\n", "The data includes the LGA of the individual but the name of the activity is slightly different. To do a comparable analysis you would need to map the activity names onto those from South Australia. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "qld_vouchers = pd.read_csv('files/round1-redeemed_get_started_vouchers.csv')\n", "qld_vouchers.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Join the QLD data with the LGA data as before to get population and SIEFA data integrated\n", "qld_vouchers['LGA'] = qld_vouchers['Club Local Government Area Name'].str.replace(' \$[RC]+\$', '').str.upper()\n", "qld_vouchers_scores = qld_vouchers.join(LGA_scores, on='LGA')\n", "qld_vouchers_scores.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 } absseifalga-puje0z22.csv LGA_2011,Local Government Areas - 2011,INDEX_TYPE,Index Type,MEASURE,Measure,Value 10050,Albury (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,SCORE,Score,967 10050,Albury (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MINS,Minimum score for SA1s in area,628 10050,Albury (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MAXS,Maximum score for SA1s in area,1169 10050,Albury (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,URP,Usual resident population,47851 10110,Armidale Dumaresq (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,SCORE,Score,985 10110,Armidale Dumaresq (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MINS,Minimum score for SA1s in area,727 10110,Armidale Dumaresq (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MAXS,Maximum score for SA1s in area,1120 10110,Armidale Dumaresq (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,URP,Usual resident population,24122 10150,Ashfield (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,SCORE,Score,1031 10150,Ashfield (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MINS,Minimum score for SA1s in area,926 10150,Ashfield (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MAXS,Maximum score for SA1s in area,1124 10150,Ashfield (A),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,URP,Usual resident population,41216 10200,Auburn (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,SCORE,Score,944 10200,Auburn (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MINS,Minimum score for SA1s in area,759 10200,Auburn (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,MAXS,Maximum score for SA1s in area,1183 10200,Auburn (C),IRSAD,Index of Relative Socio-economic Advantage and Disadvantage,URP,Usual resident population

portfolio-2-m40q3al3-w44ezv1t.ipynb data-sets-25xaki0k-qxoov0ca.zip

Mohd · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sport Vouchers Program Analysis
",
    "
",
    "The goal of this Portfolio task is to explore data from the Federal Government Sport Vouchers program - this is a
",
    "program that provides up to two $100 vouchers for kids to participate in organised sport. Here's the [NSW Active Kids page](https://www.service.nsw.gov.au/transaction/apply-active-kids-voucher), there are similar schemes in other states - this data is from South Australia.
",
    "
",
    "This is an exercise in exploring data and communicating the insights you can gain from it.  The source data comes
",
    "from the `data.gov.au` website and provides details of all Sport Vouchers that have been redeemed since February  in SA 2015 as part of the Sport Voucher program:  [Sports Vouchers Data](https://data.gov.au/dataset/ds-sa-14daba50-04ff-46c6-8468-9fa593b9f100/details).  This download is provided for you as `sportsvouchersclaimed.csv`.
",
    "
",
    "To augment this data you can also make use of [ABS SEIFA data by LGA](http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ABS_SEIFA_LGA#) which shows a few measures of Socioeconomic Advantage and Disadvantage for every Local Government Area. This data is provided for you as `ABS_SEIFA_LGA.csv`. This could enable you to answer questions about whether the voucher program is used equally by parents in low, middle and high socioeconomic areas.   You might be interested in this if you were concerned that this kind of program might just benifit parents who are already advantaged (they might already be paying for sport so this program wouldn't be helping much).
",
    "
",
    "Questions:
",
    "* Describe the distribution of vouchers by: LGA, Sport - which regions/sports stand out? 
",
    "* Are some sports more popular in different parts of the state?
",
    "* Are any electorates over/under represented in their use of vouchers?
",
    "* Is there a relationship between any of the SEIFA measures and voucher use in an LGA?
",
    "
",
    "A challenge in this task is to display a useful summary of the data given that there are a large number of LGAs and sports involved.  Try to avoid long lists and large tables. Think about what plots and tables communicate the main points of your findings. 
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd
",
    "import numpy as np
",
    "import matplotlib.pyplot as plt
",
    "import seaborn as sns
",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'C:\\Users\\91798\\Downloads'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwd
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "
",
       "
",
       "    .dataframe tbody tr th:only-of-type {
",
       "        vertical-align: middle;
",
       "    }
",
       "
",
       "    .dataframe tbody tr th {
",
       "        vertical-align: top;
",
       "    }
",
       "
",
       "    .dataframe thead th {
",
       "        text-align: right;
",
       "    }
",
       "
",
       "
",
       "  
",
       "    
",
       "      
",
       "      Participant_ID
",
       "      Participant_Suburb
",
       "      Club_Previous_Member
",
       "      Participant_ATSI_Indicator
",
       "      Participant_CALD_Indicator
",
       "      Participant_Sex
",
       "      Voucher_Value
",
       "      Club_Name
",
       "      Voucher_Sport
",
       "      Participant_LGA
",
       "      Participant_Electorate
",
       "      Voucher_Completion_Date
",
       "      Voucher_Claim_Year
",
       "    
",
       "  
",
       "  
",
       "    
",
       "      0
",
       "      1000002
",
       "      ABERFOYLE PARK
",
       "      Registered Previously
",
       "      Not ATSI
",
       "      Not CALD
",
       "      Male
",
       "      50
",
       "      Flagstaff Hill Football Club
",
       "      Australian Rules
",
       "      ONKAPARINGA
",
       "      FISHER
",
       "      26/6/15 1:19
",
       "      2015
",
       "    
",
       "    
",
       "      1
",
       "      1000002
",
       "      ABERFOYLE PARK
",
       "      Registered Previously
",
       "      Not ATSI
",
       "      Not CALD
",
       "      Male
",
       "      50
",
       "      Flagstaff Hill Football Club
",
       "      Australian Rules
",
       "      ONKAPARINGA
",
       "      FISHER
",
       "      29/6/16 23:55
",
       "      2016
",
       "    
",
       "    
",
       "      2
",
       "      1000004
",
       "      ABERFOYLE PARK
",
       "      Not Registered Previously
",
       "      Not ATSI
",
       "      Not CALD
",
       "      Female
",
       "      50
",
       "      Hub Gymnastics Club
",
       "      Gymnastics
",
       "      ONKAPARINGA
",
       "      FISHER
",
       "      21/4/16 3:37
",
       "      2016
",
       "    
",
       "    
",
       "      3
",
       "      1000004
",
       "      ABERFOYLE PARK
",
       "      Registered Previously
",
       "      Not ATSI
",
       "      Not CALD
",
       "      Female
",
       "      50
",
       "      Hub Gymnastics Club
",
       "      Gymnastics
",
       "      ONKAPARINGA
",
       "      FISHER
",
       "      30/3/17 0:47
",
       "      2017
",
       "    
",
       "    
",
       "      4
",
       "      1000005
",
       "      ABERFOYLE PARK
",
       "      Registered Previously
",
       "      Not ATSI
",
       "      Not CALD
",
       "      Female
",
       "      50
",
       "      Blackwood Community Recreation Association
",
       "      Gymnastics
",
       "      ONKAPARINGA
",
       "      FISHER
",
       "      20/7/15 2:50
",
       "      2015
",
       "    
",
       "  
",
       "
",
       ""
      ],
      "text/plain": [
       "   Participant_ID Participant_Suburb       Club_Previous_Member  \
",
       "0         1000002     ABERFOYLE PARK      Registered Previously   
",
       "1         1000002     ABERFOYLE PARK      Registered Previously   
",
       "2         1000004     ABERFOYLE PARK  Not Registered Previously   
",
       "3         1000004     ABERFOYLE PARK      Registered Previously   
",
       "4         1000005     ABERFOYLE PARK      Registered Previously   
",
       "
",
       "  Participant_ATSI_Indicator Participant_CALD_Indicator Participant_Sex  \
",
       "0                   Not ATSI                   Not CALD            Male   
",
       "1                   Not ATSI                   Not CALD            Male   
",
       "2                   Not ATSI                   Not CALD          Female   
",
       "3                   Not ATSI                   Not CALD          Female   
",
       "4                   Not ATSI                   Not CALD          Female   
",
       "
",
       "   Voucher_Value                                   Club_Name  \
",
       "0             50                Flagstaff Hill Football Club   
",
       "1             50                Flagstaff Hill Football Club   
",
       "2             50                         Hub Gymnastics Club   
",
       "3             50                         Hub Gymnastics Club   
",
       "4             50  Blackwood Community Recreation Association   
",
       "
",
       "      Voucher_Sport Participant_LGA Participant_Electorate  \
",
       "0  Australian Rules     ONKAPARINGA                 FISHER   
",
       "1  Australian Rules     ONKAPARINGA                 FISHER   
",
       "2        Gymnastics     ONKAPARINGA                 FISHER   
",
       "3        Gymnastics     ONKAPARINGA                 FISHER   
",
       "4        Gymnastics     ONKAPARINGA                 FISHER   
",
       "
",
       "  Voucher_Completion_Date  Voucher_Claim_Year  
",
       "0            26/6/15 1:19                2015  
",
       "1           29/6/16 23:55                2016  
",
       "2            21/4/16 3:37                2016  
",
       "3            30/3/17 0:47                2017  
",
       "4            20/7/15 2:50                2015  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# read the sports vouchers data
",
    "sa_vouchers = pd.read_csv("sportsvouchersclaimed.csv")
",
    "sa_vouchers.head()
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Describe the distribution of vouchers by: LGA, Sport - which regions/sports stand out?
",
    "#Are some sports more popular in different parts of the state?
",
    "#Region Wise
",
    "#Highest average voucher value {Adelade,BURNSIDE}
",
    "#LOwest average voucher value{Streaky Way,Wuddina}
",
    "#MInimum voucher value by Sport
",
    "#{Ice Scatting, Shooting, Cycling}
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.set_style('darkgrid')
",
    "sns.barplot(x='Participant_LGA',y='Voucher_Value',data=sa_vouchers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       ""
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png":

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sport Vouchers Program Analysis\n", "\n", "The goal of this Portfolio task is to explore data from the Federal Government Sport...

Answer To: { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sport Vouchers Program...

Answer To This Question Is Available To Download

Related Questions & Answers

Submit New Assignment

	Participant_ID	Participant_Suburb	Club_Previous_Member	Participant_ATSI_Indicator	Participant_CALD_Indicator	Participant_Sex	Voucher_Value	Club_Name	Voucher_Sport	Participant_LGA	Participant_Electorate	Voucher_Completion_Date	Voucher_Claim_Year
0	1000002	ABERFOYLE PARK	Registered Previously	Not ATSI	Not CALD	Male	50	Flagstaff Hill Football Club	Australian Rules	ONKAPARINGA	FISHER	26/6/15 1:19	2015
1	1000002	ABERFOYLE PARK	Registered Previously	Not ATSI	Not CALD	Male	50	Flagstaff Hill Football Club	Australian Rules	ONKAPARINGA	FISHER	29/6/16 23:55	2016
2	1000004	ABERFOYLE PARK	Not Registered Previously	Not ATSI	Not CALD	Female	50	Hub Gymnastics Club	Gymnastics	ONKAPARINGA	FISHER	21/4/16 3:37	2016
3	1000004	ABERFOYLE PARK	Registered Previously	Not ATSI	Not CALD	Female	50	Hub Gymnastics Club	Gymnastics	ONKAPARINGA	FISHER	30/3/17 0:47	2017
4	1000005	ABERFOYLE PARK	Registered Previously	Not ATSI	Not CALD	Female	50	Blackwood Community Recreation Association	Gymnastics	ONKAPARINGA	FISHER	20/7/15 2:50	2015