Requirements: has to be done in Jupyter Notebook. You are required to use the dataset contained within the file “us-names-by-decade.csv”, which contains the following features: Gender - Gender of...


Requirements


This has to be done in a Jupyter Notebook.


You are required to use the dataset contained within the file “us-names-by-decade.csv”, which contains the following features:

  •  Gender – Gender of the individual: (M) Male, (F) Female
  •  Name – First name of the individual
  •  Decade – 10-year period, e.g. 1990 = 1990-1999
  •  Count – Number of individuals who were given the name in the Name feature during the decade in the Decade feature, e.g. “F”, “Olivia”, “2010”, “69799” = the number of females named Olivia in the years 2010-2019 was 69799

and then perform the following analysis:






  1. You are then required to explain what you plan on doing with the data, e.g. why you chose the specific visualizations, etc. This must be detailed in the markup of the Jupyter Notebook and include the rationale for your choice.
  2. Generate a plot that details the top 5 names, by count, for each of the decades.
  3. Plot a graph depicting the distribution of the female names in the 1980 decade.
  4. Find out and visualize which decade had the MOST names.
  5. Observe and visualize the average number of names per decade.

No additional output will be graded.

You must complete ALL data exploration PROGRAMMATICALLY, using no tool other than Python.
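As a rough illustration of how tasks 2 to 5 could be approached with pandas, here is a minimal sketch. It is not the graded solution. It assumes the lowercase column names `gender`, `name`, `decade` and `count`, and it runs on a tiny made-up sample rather than the real file. The helper function names are hypothetical. "Most names" is read here as the number of distinct names in a decade, and "average" as the mean count per row in a decade. Both readings are assumptions, since the task wording allows others.

```python
import pandas as pd

# Tiny made-up sample with the same four columns as us-names-by-decade.csv.
# The counts are illustrative only and are not taken from the real file.
sample = pd.DataFrame(
    {
        "gender": ["F", "M", "F", "M", "F", "F", "M"],
        "name": ["Jessica", "Michael", "Jennifer", "Matthew",
                 "Sophia", "Emma", "Jacob"],
        "decade": [1980, 1980, 1980, 1980, 2010, 2010, 2010],
        "count": [400, 600, 300, 200, 90, 70, 80],
    }
)


def top_n_per_decade(df, n=5):
    """Task 2: the n rows with the largest count in each decade."""
    ranked = df.sort_values(["decade", "count"], ascending=[True, False])
    return ranked.groupby("decade").head(n)


def female_counts(df, decade):
    """Task 3: counts of the female names in one decade, indexed by name."""
    mask = (df["gender"] == "F") & (df["decade"] == decade)
    return df.loc[mask].set_index("name")["count"]


def names_per_decade(df):
    """Task 4 (assumed reading): number of distinct names in each decade."""
    return df.groupby("decade")["name"].nunique()


def mean_count_per_decade(df):
    """Task 5 (assumed reading): mean count per row in each decade."""
    return df.groupby("decade")["count"].mean()


top2 = top_n_per_decade(sample, n=2)
f1980 = female_counts(sample, 1980)
n_names = names_per_decade(sample)
busiest_decade = n_names.idxmax()  # decade with the most distinct names
avg = mean_count_per_decade(sample)
```

With matplotlib installed, each resulting Series can be drawn directly, for example `n_names.plot(kind="bar")` for task 4 or `f1980.plot(kind="hist")` for task 3. On the real data, replace `sample` with the frame returned by `pd.read_csv("us-names-by-decade.csv")`.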







Answered 2 days after Mar 05, 2021


Neha answered on Mar 07 2021
150 Votes
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"filename = \"us-names-by-decade.csv\"\n",
"df = pd.read_csv(filename)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" gender name decade count\n",
"0 F Sophia 2010 85720\n",
"1 M Jacob 2010 79359\n",
"2 F Isabella 2010 79238\n",
"3 F Emma 2010 77736\n",
"4 M Mason 2010 70808"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" gender name decade count\n",
"0 F Sophia 2010 85720\n",
"1 M Jacob 2010 79359\n",
"2 F Isabella 2010 79238\n",
"3 F Emma 2010 77736\n",
"4 M Mason 2010 70808\n",
"14151 M Jacob 2000 273591\n",
"14152 M Michael 2000 250318\n",
"14153 M Joshua 2000 231729\n",
"14154 F Emily 2000 223565\n",
"14155 M Matthew 2000 221369\n",
"32013 M Michael 1990 462265\n",
"32014 M Christopher 1990 360170\n",
"32015 M Matthew 1990 351569\n",
"32016 M Joshua 1990 329072\n",
"32017 F Jessica 1990 303053\n",
"47733 M Michael 1980 663445\n",
"47734 M Christopher 1980 554725\n",
"47735 F Jessica 1980 469415\n",
"47736 M Matthew 1980 458831\n",
"47737 F Jennifer 1980 440818\n",
"60168 M Michael 1970 707704\n",
"60169 F Jennifer 1970 581756\n",
"60170 M Christopher 1970 475681\n",
"60171 M Jason 1970 462926\n",
"60172 M David 1970 445967\n",
"69919 M Michael 1960 833395\n",
"69920 M David 1960 734176\n",
"69921 M John 1960 713636\n",
"69922 M James 1960 684985\n",
"69923 M Robert 1960 650985\n",
"77302 M James 1950 843189\n",
"77303 M Michael 1950 836913\n",
"77304 M Robert 1950 829819\n",
"77305 M John 1950 797331\n",
"77306 M David 1950 769391\n",
"84003 M James 1940 795557\n",
"84004 M Robert 1940 757894\n",
"84005 M John 1940 711411\n",
"84006 F Mary 1940 639971\n",
"84007 M William 1940 556286\n",
"89923 M Robert 1930 590599\n",
"89924 F Mary 1930 572868\n",
"89925 M James 1930 547275\n",
"89926 M John 1930 487777\n",
"89927 M William 1930 416559\n",
"95806 F Mary 1920 701709\n",
"95807 M Robert 1920 576322\n",
"95808 M John 1920 564033\n",
"95809 M James 1920 515296\n",
"95810 M William 1920 512373\n",
"102705 F Mary 1910 478634\n",
"102706 M John 1910 376321\n",
"102707 M William 1910 303027\n",
"102708 M James 1910 275075\n",
"102709 F Helen 1910 248150"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"df_dec_group = df.groupby(\"decade\", sort= True)\n",
"df_dec_group.head()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[['Mary', 'John', 'William', 'James', 'Helen'], ['Mary', 'Robert', 'John', 'James', 'William'], ['Robert', 'Mary', 'James', 'John', 'William'], ['James', 'Robert', 'John', 'Mary', 'William'], ['James', 'Michael', 'Robert', 'John', 'David'], ['Michael', 'David', 'John', 'James', 'Robert'], ['Michael', 'Jennifer', 'Christopher', 'Jason', 'David'], ['Michael', 'Christopher', 'Jessica', 'Matthew', 'Jennifer'], ['Michael', 'Christopher', 'Matthew', 'Joshua', 'Jessica'], ['Jacob', 'Michael', 'Joshua', 'Emily', 'Matthew'], ['Sophia', 'Jacob', 'Isabella', 'Emma', 'Mason']]\n",
"[[478634, 376321, 303027, 275075, 248150], [701709, 576322, 564033, 515296, 512373], [590599, 572868, 547275, 487777, 416559], [795557, 757894, 711411, 639971, 556286], [843189, 836913, 829819, 797331, 769391], [833395, 734176, 713636, 684985, 650985], [707704, 581756, 475681, 462926, 445967], [663445, 554725, 469415, 458831, 440818], [462265, 360170, 351569, 329072, 303053], [273591, 250318, 231729, 223565, 221369], [85720, 79359, 79238, 77736, 70808]]\n"
]
}
],
"source": [
"# Collect the five most common names, and their counts, for each decade\n",
"names = []\n",
"lst = []\n",
"for i in range(1910, 2020, 10):\n",
"    df_split = df_dec_group.get_group(i).nlargest(5, \"count\")\n",
"    names.append(list(df_split['name']))\n",
"    lst.append(list(df_split['count']))\n",
"\n",
"print(names)\n",
"print(lst)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Mary', 'Mary', 'Robert', 'James', 'James', 'Michael', 'Michael', 'Michael', 'Michael', 'Jacob', 'Sophia', 'John', 'Robert', 'Mary', 'Robert', 'Michael', 'David', 'Jennifer', 'Christopher', 'Christopher', 'Michael', 'Jacob', 'William', 'John', 'James', 'John', 'Robert', 'John', 'Christopher', 'Jessica', 'Matthew', 'Joshua', 'Isabella', 'James', 'James', 'John', 'Mary', 'John', 'James', 'Jason', 'Matthew', 'Joshua', 'Emily', 'Emma', 'Helen', 'William', 'William', 'William', 'David', 'Robert', 'David', 'Jennifer', 'Jessica', 'Matthew', 'Mason']\n"
]
}
],
"source": [
"# Flatten the nested names list rank by rank: every decade's top name first,\n",
"# then every decade's second name, and so on\n",
"flatten_names = []\n",
"for j in range(len(names[0])):\n",
"    for i in range(len(names)):\n",
"        flatten_names.append(names[i][j])\n",
"print(flatten_names)"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# x holds the decades; y1..y5 hold the counts of the 1st..5th most common name in each decade\n",
"x = np.array(list(range(1910, 2020, 10)))\n",
"y1 = [lst[i][0] for i in range(0, 11)]\n",
"y2 = [lst[i][1] for i in range(0, 11)]\n",
"y3 = [lst[i][2] for i in range(0, 11)]\n",
"y4 = [lst[i][3] for i in range(0, 11)]\n",
"y5 = [lst[i][4] for i in range(0, 11)]\n",
"\n",
"width = 1.5"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"data": {
"image/png":...