Using Python 3 and pandas, I am trying to read the attached CSV, find the most popular genres, compute the correlation between release year and revenue and between budget and revenue, and use matplotlib to display these relationships.

Answered Same Day Apr 14, 2021


Pushpendra answered on Apr 15 2021
163 Votes
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aim 1:**\n",
"To read the provided CSV and inspect the data."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
],
"text/plain": [
" id imdb_id popularity budget revenue original_title \\\n",
"0 135397 tt0369610 32.985763 150000000 1513528810 Jurassic World \n",
"\n",
" cast \\\n",
"0 Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi... \n",
"\n",
" homepage director tagline ... \\\n",
"0 http://www.jurassicworld.com/ Colin Trevorrow The park is open. ... \n",
"\n",
" overview runtime \\\n",
"0 Twenty-two years after the events of Jurassic ... 124 \n",
"\n",
" genres \\\n",
"0 Action|Adventure|Science Fiction|Thriller \n",
"\n",
" production_companies release_date vote_count \\\n",
"0 Universal Studios|Amblin Entertainment|Legenda... 6/9/15 5562 \n",
"\n",
" vote_average release_year budget_adj revenue_adj \n",
"0 6.5 2015 1.379999e+08 1.392446e+09 \n",
"\n",
"[1 rows x 21 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"df=pd.read_csv('tmdb-movies-fn2tqcxx.csv')\n",
"df.head(1)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 10866 entries, 0 to 10865\n",
"Data columns (total 21 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 id 10866 non-null int64 \n",
" 1 imdb_id 10856 non-null object \n",
" 2 popularity 10866 non-null float64\n",
" 3 budget 10866 non-null int64 \n",
" 4 revenue 10866 non-null int64 \n",
" 5 original_title 10866 non-null object \n",
" 6 cast 10790 non-null object \n",
" 7 homepage 2936 non-null object \n",
" 8 director 10822 non-null object \n",
" 9 tagline 8042 non-null object \n",
" 10 keywords 9373 non-null object \n",
" 11 overview 10862 non-null object \n",
" 12 runtime 10866 non-null int64 \n",
" 13 genres 10843 non-null object \n",
" 14 production_companies 9836 non-null object \n",
" 15 release_date 10866 non-null object \n",
" 16 vote_count 10866 non-null int64 \n",
" 17 vote_average 10866 non-null float64\n",
" 18 release_year 10866 non-null int64 \n",
" 19 budget_adj 10866 non-null float64\n",
" 20 revenue_adj 10866 non-null float64\n",
"dtypes: float64(4), int64(6), object(11)\n",
"memory usage: 1.7+ MB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(10866, 21)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape  # number of rows and columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aim 2:**\n",
"To find the most popular genre."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" original_title genres\n",
"0 Jurassic World Action|Adventure|Science Fiction|Thriller\n",
"1 Mad Max: Fury Road Action|Adventure|Science Fiction|Thriller\n",
"2 Insurgent Adventure|Science Fiction|Thriller\n",
"3 Star Wars: The Force Awakens Action|Adventure|Science Fiction|Fantasy\n",
"4 Furious 7 Action|Crime|Thriller\n",
"... ... ...\n",
"10861 The Endless Summer Documentary\n",
"10862 Grand Prix Action|Adventure|Drama\n",
"10863 Beregis Avtomobilya Mystery|Comedy\n",
"10864 What's Up, Tiger Lily? Action|Comedy\n",
"10865 Manos: The Hands of Fate Horror\n",
"\n",
"[10866 rows x 2 columns]\n"
]
}
],
"source": [
"print(df[['original_title', 'genres']])"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"original_title \n",
"Jurassic World 0 Action\n",
" 1 Adventure\n",
" 2 Science Fiction\n",
" 3 Thriller\n",
"Mad Max: Fury Road 0 Action\n",
" ... \n",
"Beregis Avtomobilya 0 Mystery\n",
" 1 Comedy\n",
"What's Up, Tiger Lily? 0 Action\n",
" 1 Comedy\n",
"Manos: The Hands of Fate 0 Horror\n",
"Length: 26960, dtype: object"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cleaned = df.set_index('original_title').genres.str.split('|', expand=True).stack()\n",
"cleaned"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
],
"text/plain": [
" g_Action g_Adventure g_Animation \\\n",
"original_title \n",
"$5 a Day 0 0 0 \n",
"$9.99 0 0 1 \n",
"'71 1 0 0 \n",
"(500) Days of Summer 0 0 0 \n",
"(T)Raumschiff Surprise - Periode 1 0 0 0 \n",
"... ... ... ... \n",
"ì˜í˜•ì œ 0 0 0 \n",
"ì‹ ì˜ í•œ 수 1 0 0 \n",
"í¬í™” ì†ìœ¼ë¡œ 0 0 0 \n",
"형사 Duelist 1 0 0 \n",
"í•˜ìš¸ë§ 0 0 0 \n",
"\n",
" g_Comedy g_Crime g_Documentary g_Drama \\\n",
"original_title \n",
"$5 a Day 1 0 0 1 \n",
"$9.99 0 0 0 1 \n",
"'71 0 0 0 1 \n",
"(500) Days of Summer 1 0 0 1 \n",
"(T)Raumschiff Surprise - Periode 1 1 0 0 0 \n",
"... ... ... ... ... \n",
"ì˜í˜•ì œ 0 0 0 1 \n",
"ì‹ ì˜ í•œ 수 0 1 0 1 \n",
"í¬í™” ì†ìœ¼ë¡œ 0 0 0 0 \n",
"형사 Duelist 0 0 0 0 \n",
"í•˜ìš¸ë§ 0 0 0 0 \n",
"\n",
" g_Family g_Fantasy g_Foreign g_History \\\n",
"original_title \n",
"$5 a Day 0 0 0 0 \n",
"$9.99 0 0 0 0 \n",
"'71 0 0 0 0 \n",
"(500) Days of Summer 0 0 0 0 \n",
"(T)Raumschiff Surprise - Periode 1 0 0 0 0 \n",
"... ... ... ... ... \n",
"ì˜í˜•ì œ 0 0 1 0 \n",
"ì‹ ì˜ í•œ 수 0 0 0 0 \n",
"í¬í™” ì†ìœ¼ë¡œ 0 0 0 0 \n",
"형사 Duelist 0 0 0 0 \n",
"í•˜ìš¸ë§ 0 0 1 0 \n",
"\n",
" g_Horror g_Music g_Mystery g_Romance \\\n",
"original_title \n",
"$5 a Day 0 0 0 0 \n",
"$9.99 0 0 0 0 \n",
"'71 0 0 0 0 \n",
"(500) Days of Summer 0 0 0 1 \n",
"(T)Raumschiff Surprise - Periode 1 0 0 0 0 \n",
"... ... ... ... ... \n",
"ì˜í˜•ì œ 0 0 0 0 \n",
"ì‹ ì˜ í•œ 수 0 0 0 0 \n",
"í¬í™” ì†ìœ¼ë¡œ 0 0 0 0 \n",
"형사 Duelist 0 0 0 0 \n",
"í•˜ìš¸ë§ 0 0 1 0 \n",
"\n",
" g_Science Fiction g_TV Movie g_Thriller \\\n",
"original_title \n",
"$5 a Day 0 0 0 \n",
"$9.99 0 0 0 \n",
"'71 0 0 1 \n",
"(500) Days of Summer 0 0 0 \n",
"(T)Raumschiff Surprise - Periode 1 1 0 0 \n",
"... ... ... ... \n",
"ì˜í˜•ì œ 0 0 1 \n",
"ì‹ ì˜ í•œ 수 0 0 0 \n",
"í¬í™” ì†ìœ¼ë¡œ 0 0 0 \n",
"형사 Duelist 0 0 0 \n",
"í•˜ìš¸ë§ 0 0 1 \n",
"\n",
" g_War g_Western \n",
"original_title \n",
"$5 a Day 0 0 \n",
"$9.99 0 0 \n",
"'71 1 0 \n",
"(500) Days of Summer 0 0 \n",
"(T)Raumschiff Surprise - Periode 1 0 0 \n",
"... ... ... \n",
"ì˜í˜•ì œ 0 0 \n",
"ì‹ ì˜ í•œ 수 0 0 \n",
"í¬í™” ì†ìœ¼ë¡œ 1 0 \n",
"형사 Duelist 0 0 \n",
"í•˜ìš¸ë§ 0 0 \n",
"\n",
"[10548 rows x 20 columns]"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1=pd.get_dummies(cleaned, prefix='g').groupby(level=0).sum()\n",
"df1"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"g_Action 2385\n",
"g_Adventure 1471\n",
"g_Animation 699\n",
"g_Comedy 3793\n",
"g_Crime 1355\n",
"g_Documentary 520\n",
"g_Drama 4761\n",
"g_Family 1231\n",
"g_Fantasy 916\n",
"g_Foreign 188\n",
"g_History 334\n",
"g_Horror 1637\n",
"g_Music 408\n",
"g_Mystery 810\n",
"g_Romance 1712\n",
"g_Science Fiction 1230\n",
"g_TV Movie 167\n",
"g_Thriller 2908\n",
"g_War 270\n",
"g_Western 165\n",
"dtype: int64"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2=df1.sum()\n",
"df2"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4761"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.max()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Thus, the most popular genre is \"Drama\", with a count of 4761.**"
]
},
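The genre count above can also be sketched without indexing by title. In the notebook, `groupby(level=0)` on `original_title` merges films that share a title (hence 10548 rows from 10866), although the per-genre totals are unaffected. What follows is a minimal sketch, not the answerer's method: it assumes the same pipe-separated `genres` column, uses a tiny made-up frame in place of the CSV, and `genre_counts` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in for the CSV (assumption: same pipe-separated `genres` layout).
toy = pd.DataFrame({
    "original_title": ["A", "B", "B", "C"],
    "genres": ["Action|Drama", "Drama", "Comedy|Drama", None],
})


def genre_counts(frame: pd.DataFrame) -> pd.Series:
    """Count how many rows list each genre (hypothetical helper)."""
    return (
        frame["genres"]
        .dropna()          # rows with no genre contribute nothing
        .str.split("|")    # "Action|Drama" -> ["Action", "Drama"]
        .explode()         # one row per (film, genre) pair
        .value_counts()    # sorted from most to least frequent
    )


counts = genre_counts(toy)
print(counts.idxmax(), counts.max())
```

On the notebook's own `df2`, calling `df2.idxmax()` next to `df2.max()` would return the label of the top genre (`g_Drama`) rather than only its count.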
{
"cell_type": "markdown",
"metadata": {},
"source": [
" **Aim 3:**\n",
"To find the correlation between release year and revenue and to plot it using matplotlib."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
],
"text/plain": [
" revenue release_year\n",
"0 1513528810 2015\n",
"1 378436354 2015\n",
"2 295238201 2015\n",
"3 2068178225 2015\n",
"4 1506249360 2015\n",
"... ... ...\n",
"10861 0 1966\n",
"10862 0 1966\n",
"10863 0 1966\n",
"10864 0 1966\n",
"10865 0 1966\n",
"\n",
"[10866 rows x 2 columns]"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df4=df[['revenue', 'release_year']]\n",
"df4"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png":...
SOLUTION.PDF

Answer To This Question Is Available To Download
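The notebook dump breaks off before the Aim 3 correlation and plot are visible. A minimal sketch of how those steps might look, under stated assumptions: the toy numbers are invented stand-ins for the CSV and say nothing about the real data, zeros in `budget` and `revenue` are treated as missing values (an assumption, suggested by the rows showing a revenue of 0), and `pairwise_corr` and the output filename are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented toy data using the CSV's column names (not real figures).
toy = pd.DataFrame({
    "release_year": [1990, 1995, 2000, 2005, 2010, 2015],
    "budget": [10, 0, 30, 40, 50, 60],
    "revenue": [25, 40, 0, 90, 120, 150],
})


def pairwise_corr(frame, x, y):
    """Pearson correlation of x and y, treating zeros as missing (assumption)."""
    pair = frame[[x, y]].replace(0, np.nan).dropna()
    return pair[x].corr(pair[y])


year_rev = pairwise_corr(toy, "release_year", "revenue")
budget_rev = pairwise_corr(toy, "budget", "revenue")
print(f"release_year vs revenue: {year_rev:.3f}")
print(f"budget vs revenue: {budget_rev:.3f}")

# One scatter plot per relationship, side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, x in zip(axes, ["release_year", "budget"]):
    ax.scatter(toy[x], toy["revenue"], alpha=0.6)
    ax.set_xlabel(x)
    ax.set_ylabel("revenue")
    ax.set_title(f"{x} vs revenue")
fig.tight_layout()
fig.savefig("relationships.png")  # hypothetical output filename
```

With the real file, `toy` would be replaced by the `df` loaded with `pd.read_csv`. Whether zeros should be dropped depends on how the dataset encodes unknown budgets and revenues, so that step is worth checking before trusting the coefficients.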
