CRS_DEP_TIME,CARRIER,DEP_TIME,DEST,DISTANCE,FL_DATE,FL_NUM,ORIGIN,Weather,DAY_WEEK,DAY_OF_MONTH,TAIL_NUM,Flight Status 1455,OH,1455,JFK,184,01/01/2004,5935,BWI,0,4,1,N940CA,ontime...

I need help with the Python code for these problems. There are three problems I need to complete in Python. Screenshots of the three problems are attached, along with the corresponding Excel data sets.


CRS_DEP_TIME,CARRIER,DEP_TIME,DEST,DISTANCE,FL_DATE,FL_NUM,ORIGIN,Weather,DAY_WEEK,DAY_OF_MONTH,TAIL_NUM,Flight Status
1455,OH,1455,JFK,184,01/01/2004,5935,BWI,0,4,1,N940CA,ontime
1640,DH,1640,JFK,213,01/01/2004,6155,DCA,0,4,1,N405FJ,ontime
1245,DH,1245,LGA,229,01/01/2004,7208,IAD,0,4,1,N695BR,ontime
1715,DH,1709,LGA,229,01/01/2004,7215,IAD,0,4,1,N662BR,ontime
1039,DH,1035,LGA,229,01/01/2004,7792,IAD,0,4,1,N698BR,ontime
840,DH,839,JFK,228,01/01/2004,7800,IAD,0,4,1,N687BR,ontime
1240,DH,1243,JFK,228,01/01/2004,7806,IAD,0,4,1,N321UE,ontime
1645,DH,1644,JFK,228,01/01/2004,7810,IAD,0,4,1,N301UE,ontime
1715,DH,1710,JFK,228,01/01/2004,7812,IAD,0,4,1,N328UE,ontime
2120,DH,2129,JFK,228,01/01/2004,7814,IAD,0,4,1,N685BR,ontime
2120,DH,2114,LGA,229,01/01/2004,7924,IAD,0,4,1,N645BR,ontime
1455,DL,1458,JFK,213,01/01/2004,746,DCA,0,4,1,N918DE,ontime
930,DL,932,LGA,214,01/01/2004,1746,DCA,0,4,1,N242DL,ontime
1230,DL,1228,LGA,214,01/01/2004,1752,DCA,0,4,1,N241DL,ontime
1430,DL,1429,LGA,214,01/01/2004,1756,DCA,0,4,1,N242DL,ontime
1730,DL,1728,LGA,214,01/01/2004,1762,DCA,0,4,1,N241DL,ontime
2030,DL,2029,LGA,214,01/01/2004,1768,DCA,0,4,1,N242DL,ontime
1530,MQ,1525,JFK,213,01/01/2004,4752,DCA,0,4,1,N709MQ,ontime
600,MQ,556,JFK,213,01/01/2004,4760,DCA,0,4,1,N717MQ,ontime
1830,MQ,1822,JFK,213,01/01/2004,4784,DCA,0,4,1,N707MQ,ontime
900,MQ,853,LGA,214,01/01/2004,4956,DCA,0,4,1,N737MQ,ontime
1300,MQ,1254,LGA,214,01/01/2004,4964,DCA,0,4,1,N717MQ,ontime
1400,MQ,1356,LGA,214,01/01/2004,4966,DCA,0,4,1,N726MQ,ontime
1500,MQ,1452,LGA,214,01/01/2004,4968,DCA,0,4,1,N724MQ,ontime
1900,MQ,1853,LGA,214,01/01/2004,4976,DCA,0,4,1,N724MQ,ontime
850,UA,841,LGA,229,01/01/2004,846,IAD,0,4,1,N513UA,ontime
900,US,858,LGA,214,01/01/2004,2164,DCA,0,4,1,N709UW,ontime
1100,US,1056,LGA,214,01/01/2004,2168,DCA,0,4,1,N748UW,ontime
1300,US,1253,LGA,214,01/01/2004,2172,DCA,0,4,1,N709UW,ontime
1500,US,1458,LGA,214,01/01/2004,2176,DCA,0,4,1,N748UW,ontime
1700,US,1655,LGA,214,01/01/2004,2180,DCA,0,4,1,N709UW,ontime
2100,US,2055,LGA,214,01/01/2004,2188,DCA,0,4,1,N709UW,ontime
1455,RU,1452,EWR,169,01/01/2004,2403,BWI,0,4,1,N14916,ontime
1720,RU,1710,EWR,169,01/01/2004,2675,BWI,0,4,1,N16954,ontime
1030,RU,1030,EWR,169,01/01/2004,2303,BWI,0,4,1,N26549,ontime
700,RU,656,EWR,169,01/01/2004,2703,BWI,0,4,1,N16954,ontime
1300,CO,1256,EWR,199,01/01/2004,808,DCA,0,4,1,N18611,ontime
1730,CO,1726,EWR,199,01/01/2004,814,DCA,0,4,1,N19357,ontime
840,DH,840,EWR,213,01/01/2004,7299,IAD,0,4,1,N691BR,ontime
1710,DH,1704,EWR,213,01/01/2004,7302,IAD,0,4,1,N691BR,ontime
1245,DH,1245,EWR,213,01/01/2004,7303,IAD,0,4,1,N697BR,ontime
2120,DH,2118,EWR,213,01/01/2004,7304,IAD,0,4,1,N699BR,ontime
1700,RU,1651,EWR,213,01/01/2004,2497,IAD,0,4,1,N12540,ontime
1900,RU,1850,EWR,213,01/01/2004,2385,IAD,0,4,1,N16149,ontime
1525,RU,1521,EWR,199,01/01/2004,2261,DCA,0,4,1,N12564,ontime
1900,RU,1855,EWR,199,01/01/2004,2336,DCA,0,4,1,N21537,ontime
1400,RU,1357,EWR,199,01/01/2004,2216,DCA,0,4,1,N15983,ontime
1515,RU,1508,EWR,213,01/01/2004,2156,IAD,0,4,1,N16149,ontime
1300,RU,1255,EWR,213,01/01/2004,2664,IAD,0,4,1,N12519,ontime
1630,RU,1625,EWR,199,01/01/2004,2181,DCA,0,4,1,N19966,ontime
1455,OH,1455,JFK,184,01/02/2004,5935,BWI,0,5,2,N995CA,ontime
1640,DH,1641,JFK,213,01/02/2004,6155,DCA,0,5,2,N415FJ,ontime
1245,DH,1249,LGA,229,01/02/2004,7208,IAD,0,5,2,N688BR,ontime
1455,DH,1531,LGA,229,01/02/2004,7211,IAD,0,5,2,N665BR,delayed
1715,DH,1712,LGA,229,01/02/2004,7215,IAD,0,5,2,N639BR,ontime
640,DH,645,LGA,229,01/02/2004,7790,IAD,0,5,2,N686BR,ontime
1039,DH,1236,LGA,229,01/02/2004,7792,IAD,0,5,2,N665BR,delayed
840,DH,859,JFK,228,01/02/2004,7800,IAD,0,5,2,N645BR,ontime
1240,DH,1232,JFK,228,01/02/2004,7806,IAD,0,5,2,N332UE,ontime
1455,DH,1455,JFK,228,01/02/2004,7808,IAD,0,5,2,N324UE,ontime
1645,DH,1645,JFK,228,01/02/2004,7810,IAD,0,5,2,N305UE,ontime
1715,DH,1716,JFK,228,01/02/2004,7812,IAD,0,5,2,N322UE,ontime
2120,DH,2305,JFK,228,01/02/2004,7814,IAD,0,5,2,N657BR,delayed
1610,DH,1605,JFK,228,01/02/2004,7816,IAD,0,5,2,N315UE,ontime
2120,DH,2118,LGA,229,01/02/2004,7924,IAD,0,5,2,N709BR,ontime
1455,DL,1458,JFK,213,01/02/2004,746,DCA,0,5,2,N964DL,ontime
930,DL,930,LGA,214,01/02/2004,1746,DCA,0,5,2,N241DL,ontime
1230,DL,1230,LGA,214,01/02/2004,1752,DCA,0,5,2,N225DL,ontime
1430,DL,1427,LGA,214,01/02/2004,1756,DCA,0,5,2,N241DL,ontime
1730,DL,1730,LGA,214,01/02/2004,1762,DCA,0,5,2,N225DL,ontime
2030,DL,2028,LGA,214,01/02/2004,1768,DCA,0,5,2,N241DL,ontime
1530,MQ,1522,JFK,213,01/02/2004,4752,DCA,0,5,2,N720MQ,ontime
600,MQ,552,JFK,213,01/02/2004,4760,DCA,0,5,2,N736MQ,ontime
1830,MQ,1847,JFK,213,01/02/2004,4784,DCA,0,5,2,N727MQ,ontime
900,MQ,852,LGA,214,01/02/2004,4956,DCA,0,5,2,N713MQ,ontime
1100,MQ,1053,LGA,214,01/02/2004,4960,DCA,0,5,2,N708MQ,ontime
1300,MQ,1258,LGA,214,01/02/2004,4964,DCA,0,5,2,N713MQ,ontime
1400,MQ,1402,LGA,214,01/02/2004,4966,DCA,0,5,2,N718MQ,ontime
1500,MQ,1456,LGA,214,01/02/2004,4968,DCA,0,5,2,N708MQ,ontime
850,UA,850,LGA,229,01/02/2004,846,IAD,0,5,2,N556UA,ontime
700,US,657,LGA,214,01/02/2004,2160,DCA,0,5,2,N710UW,ontime
900,US,857,LGA,214,01/02/2004,2164,DCA,0,5,2,N736UW,ontime
1100,US,1058,LGA,214,01/02/2004,2168,DCA,0,5,2,N710UW,ontime
1300,US,1258,LGA,214,01/02/2004,2172,DCA,0,5,2,N736UW,ontime
1500,US,1458,LGA,214,01/02/2004,2176,DCA,0,5,2,N710UW,ontime
1700,US,1655,LGA,214,01/02/2004,2180,DCA,0,5,2,N736UW,ontime
1900,US,1855,LGA,214,01/02/2004,2184,DCA,0,5,2,N710UW,ontime
2100,US,2056,LGA,214,01/02/2004,2188,DCA,0,5,2,N736UW,ontime
1720,RU,1715,EWR,169,01/02/2004,2675,BWI,0,5,2,N19966,ontime
1030,RU,1030,EWR,169,01/02/2004,2303,BWI,0,5,2,N12540,ontime
700,RU,656,EWR,169,01/02/2004,2703,BWI,0,5,2,N16961,ontime
1455,RU,1456,EWR,169,01/02/2004,2403,BWI,0,5,2,N12946,ontime
1730,CO,1727,EWR,199,01/02/2004,814,DCA,0,5,2,N14342,ontime
1300,CO,1301,EWR,199,01/02/2004,808,DCA,0,5,2,N14664,ontime
759,CO,754,EWR,199,01/02/2004,806,DCA,0,5,2,N11641,ontime
840,DH,837,EWR,213,01/02/2004,7299,IAD,0,5,2,N679BR,ontime
1245,DH,1350,EWR,213,01/02/2004,7303,IAD,0,5,2,N686BR,delayed
1430,DH,1512,EWR,213,01/02/2004,7307,IAD,0,5,2,N309UE,delayed
630,DH,629,EWR,213,01/02/2004,7371,IAD,0,5,2,N312UE,ontime
1630,RU,1625,EWR,199,01/02/2004,2181,DCA,0,5,2,N14977,ontime
700,RU,655,EWR,213,01/02/2004,2855,IAD,0,5,2,N13990,ontime
900,RU,858,EWR,199,01/02/2004,2582,DCA,0,5,2,N14907,ontime
700,RU,657,EWR,199,01/02/2004,2761,DCA,0,5,2,N13997,ontime
1700,RU,1650,EWR,213,01/02/2004,2497,IAD,0,5,2,N12528,ontime
1900,RU,1856,EWR,213,01/02/2004,2385,IAD,0,5,2,N11107,ontime
1300,RU,1253,EWR,213,01/02/2004,2692,IAD,0,5,2,N14505,ontime
900,RU,854,EWR,213,01/02/2004,3276,IAD,0,5,2,N16151,ontime
1900,RU,1858,EWR,199,01/02/2004,2336,DCA,0,5,2,N15985,ontime
2100,RU,2050,EWR,199,01/02/2004,2879,DCA,0,5,2,N17108,ontime
1400,RU,1358,EWR,199,01/02/2004,2216,DCA,0,5,2,N13118,ontime
1515,RU,1510,EWR,213,01/02/2004,2156,IAD,0,5,2,N11107,delayed
1525,RU,1519,EWR,199,01/02/2004,2261,DCA,0,5,2,N15574,ontime
1245,DH,1243,LGA,229,01/03/2004,7208,IAD,0,6,3,N688BR,ontime
1715,DH,1738,LGA,229,01/03/2004,7215,IAD,0,6,3,N639BR,ontime
640,DH,640,LGA,229,01/03/2004,7790,IAD,0,6,3,N696BR,ontime
1039,DH,1030,LGA,229,01/03/2004,7792,IAD,0,6,3,N696BR,ontime
840,DH,855,JFK,228,01/03/2004,7800,IAD,0,6,3,N709BR,ontime
1240,DH,1237,JFK,228,01/03/2004,7806,IAD,0,6,3,N327UE,ontime
1455,DH,1455,JFK,228,01/03/2004,7808,IAD,0,6,3,N309UE,ontime
1645,DH,1654,JFK,228,01/03/2004,7810,IAD,0,6,3,N311UE,ontime
1715,DH,1741,JFK,228,01/03/2004,7812,IAD,0,6,3,N327UE,ontime
2120,DH,2213,JFK,228,01/03/2004,7814,IAD,0,6,3,N655BR,delayed
1610,DH,1604,JFK,228,01/03/2004,7816,IAD,0,6,3,N329UE,ontime
2120,DH,2138,LGA,229,01/03/2004,7924,IAD,0,6,3,N688BR,ontime
1455,DL,1505,JFK,213,01/03/2004,746,DCA,0,6,3,N997DL,delayed
830,DL,828,LGA,214,01/03/2004,1744,DCA,0,6,3,N225DL,ontime
1030,DL,1030,LGA,214,01/03/2004,1748,DCA,0,6,3,N242DL,ontime
1230,DL,1230,LGA,214,01/03/2004,1752,DCA,0,6,3,N225DL,ontime
1430,DL,1428,LGA,214,01/03/2004,1756,DCA,0,6,3,N242DL,ontime
1630,DL,1629,LGA,214,01/03/2004,1760,DCA,0,6,3,N225DL,ontime
1830,DL,1829,LGA,214,01/03/2004,1764,DCA,0,6,3,N242DL,ontime
2030,DL,2024,LGA,214,01/03/2004,1768,DCA,0,6,3,N225DL,ontime
1530,MQ,1600,JFK,213,01/03/2004,4752,DCA,0,6,3,N734MQ,delayed
600,MQ,555,JFK,213,01/03/2004,4760,DCA,0,6,3,N712MQ,ontime
1830,MQ,1829,JFK,213,01/03/2004,4784,DCA,0,6,3,N709MQ,ontime
900,MQ,855,LGA,214,01/03/2004,4956,DCA,0,6,3,N739MQ,ontime
1300,MQ,1254,LGA,214,01/03/2004,4964,DCA,0,6,3,N739MQ,ontime
850,UA,849,LGA,229,01/03/2004,846,IAD,0,6,3,N567UA,ontime
700,US,655,LGA,214,01/03/2004,2160,DCA,0,6,3,N760UW,ontime
900,US,858,LGA,214,01/03/2004,2164,DCA,0,6,3,N710UW,ontime
1100,US,1059,LGA,214,01/03/2004,2168,DCA,0,6,3,N760UW,ontime
1300,US,1256,LGA,214,01/03/2004,2172,DCA,0,6,3,N710UW,ontime
1500,US,1500,LGA,214,01/03/2004,2176,DCA,0,6,3,N760UW,ontime
1700,US,1658,LGA,214,01/03/2004,2180,DCA,0,6,3,N710UW,ontime
1900,US,1857,LGA,214,01/03/2004,2184,DCA,0,6,3,N760UW,ontime
1720,RU,1714,EWR,169,01/03/2004,2675,BWI,0,6,3,N15574,ontime
700,RU,655,EWR,169,01/03/2004,2703,BWI,0,6,3,N11536,ontime
1030,RU,1026,EWR,169,01/03/2004,2303,BWI,0,6,3,N14907,ontime
1455,RU,1448,EWR,169,01/03/2004,2267,BWI,0,6,3,N14974,ontime
1300,CO,1255,EWR,199,01/03/2004,808,DCA,0,6,3,N11612,ontime
840,DH,857,EWR,213,01/03/2004,7299,IAD,0,6,3,N693BR,delayed
1710,DH,1705,EWR,213,01/03/2004,7302,IAD,0,6,3,N693BR,ontime
1245,DH,1329,EWR,213,01/03/2004,7303,IAD,0,6
Answered 5 days after May 13, 2021


Vicky answered on May 18 2021
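The notebook that follows prepares the eBay auctions data in three steps: it splits columns by dtype, caps numeric outliers at the 1st and 99th percentiles, and dummy-encodes the categorical columns. Here is a minimal sketch of that same preparation on a tiny made-up frame. The frame `toy`, the helper `cap`, and all values are hypothetical; only the column names are borrowed from the notebook.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the notebook, values are made up.
toy = pd.DataFrame({
    "Category": ["Books", "Toys", "Books", "Music", "Toys", "Music"],
    "endDay": ["Mon", "Tue", "Mon", "Sat", "Sat", "Tue"],
    "OpenPrice": [0.01, 4.50, 9.99, 1.23, 24.95, 999.0],
    "Competitive?": [0, 1, 1, 0, 1, 0],
})
target = "Competitive?"

# Step 1: split columns by dtype rather than comparing dtype names by hand.
num_cols = toy.select_dtypes(include="number").columns.drop(target)
cat_cols = toy.select_dtypes(exclude="number").columns


def cap(s, lo=0.01, hi=0.99):
    """Clip a numeric Series to its [lo, hi] quantile range."""
    lower, upper = s.quantile([lo, hi])
    return s.clip(lower=lower, upper=upper)


# Step 2: cap outliers in the numeric predictors only (the target is left alone).
capped = toy[num_cols].apply(cap)

# Step 3: one-hot encode, dropping the first level of each column as the baseline.
dummies = pd.get_dummies(toy[cat_cols], columns=list(cat_cols), drop_first=True, dtype=int)

prepared = pd.concat([capped, dummies, toy[target]], axis=1)
print(prepared)
```

Unlike the notebook, this sketch keeps the target out of the capping step. Capping a 0/1 column at its percentiles does nothing useful, so excluding it is a small design choice, not a correction of the results.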
{
"cells": [
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import itertools\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.tree import DecisionTreeClassifier"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
CategorycurrencysellerRatingDurationendDayClosePriceOpenPriceCompetitive?
0Music/Movie/GameUS32495Mon0.010.010
1Music/Movie/GameUS32495Mon0.010.010
2Music/Movie/GameUS32495Mon0.010.010
3Music/Movie/GameUS32495Mon0.010.010
4Music/Movie/GameUS32495Mon0.010.010
\n",
"
"
],
"text/plain": [
" Category currency sellerRating Duration endDay ClosePrice \\\n",
"0 Music/Movie/Game US 3249 5 Mon 0.01 \n",
"1 Music/Movie/Game US 3249 5 Mon 0.01 \n",
"2 Music/Movie/Game US 3249 5 Mon 0.01 \n",
"3 Music/Movie/Game US 3249 5 Mon 0.01 \n",
"4 Music/Movie/Game US 3249 5 Mon 0.01 \n",
"\n",
" OpenPrice Competitive? \n",
"0 0.01 0 \n",
"1 0.01 0 \n",
"2 0.01 0 \n",
"3 0.01 0 \n",
"4 0.01 0 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ebay = pd.read_csv('ebayauctions.csv')\n",
"ebay.head()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 1972 entries, 0 to 1971\n",
"Data columns (total 8 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Category 1972 non-null object \n",
" 1 currency 1972 non-null object \n",
" 2 sellerRating 1972 non-null int64 \n",
" 3 Duration 1972 non-null int64 \n"
,
" 4 endDay 1972 non-null object \n",
" 5 ClosePrice 1972 non-null float64\n",
" 6 OpenPrice 1972 non-null float64\n",
" 7 Competitive? 1972 non-null int64 \n",
"dtypes: float64(2), int64(3), object(3)\n",
"memory usage: 123.4+ KB\n"
]
}
],
"source": [
"ebay.info()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Convert variable Duration to categorical variable\n",
"ebay['Duration'] = ebay['Duration'].astype('category')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 1972 entries, 0 to 1971\n",
"Data columns (total 8 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Category 1972 non-null object \n",
" 1 currency 1972 non-null object \n",
" 2 sellerRating 1972 non-null int64 \n",
" 3 Duration 1972 non-null category\n",
" 4 endDay 1972 non-null object \n",
" 5 ClosePrice 1972 non-null float64 \n",
" 6 OpenPrice 1972 non-null float64 \n",
" 7 Competitive? 1972 non-null int64 \n",
"dtypes: category(1), float64(2), int64(2), object(3)\n",
"memory usage: 110.1+ KB\n"
]
}
],
"source": [
"ebay.info()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"numeric_var_names=[key for key in dict(ebay.dtypes) if dict(ebay.dtypes)[key] in ['float64', 'int64', 'float32', 'int32']]\n",
"cat_var_names=[key for key in dict(ebay.dtypes) if dict(ebay.dtypes)[key] in ['object', 'O']]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['sellerRating', 'ClosePrice', 'OpenPrice', 'Competitive?']"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numeric_var_names"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Category', 'currency', 'endDay']"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cat_var_names"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
NNMISSSUMMEANMEDIANSTDVARMINP1P5P10P25P50P75P90P95P99MAX
sellerRating1972.00.07020790.03560.2383371853.0005973.0110333.567686e+070.004.7150.00112.100595.00001853.0003380.005702.80022501.000027132.000037727.0
ClosePrice1972.00.071877.636.4490879.99589.4938588.009151e+030.010.011.232.2414.90759.99528.0080.999153.2785507.1867999.0
OpenPrice1972.00.025490.612.9262684.50038.8561491.509800e+030.010.010.010.9901.23004.5009.9924.95049.9900132.5800999.0
Competitive?1972.00.01066.00.5405681.0000.4984782.484802e-010.000.000.000.0000.00001.0001.001.0001.00001.00001.0
\n",
"
"
],
"text/plain": [
" N NMISS SUM MEAN MEDIAN STD \\\n",
"sellerRating 1972.0 0.0 7020790.0 3560.238337 1853.000 5973.011033 \n",
"ClosePrice 1972.0 0.0 71877.6 36.449087 9.995 89.493858 \n",
"OpenPrice 1972.0 0.0 25490.6 12.926268 4.500 38.856149 \n",
"Competitive? 1972.0 0.0 1066.0 0.540568 1.000 0.498478 \n",
"\n",
" VAR MIN P1 P5 P10 P25 P50 \\\n",
"sellerRating 3.567686e+07 0.00 4.71 50.00 112.100 595.0000 1853.000 \n",
"ClosePrice 8.009151e+03 0.01 0.01 1.23 2.241 4.9075 9.995 \n",
"OpenPrice 1.509800e+03 0.01 0.01 0.01 0.990 1.2300 4.500 \n",
"Competitive? 2.484802e-01 0.00 0.00 0.00 0.000 0.0000 1.000 \n",
"\n",
" P75 P90 P95 P99 MAX \n",
"sellerRating 3380.00 5702.800 22501.0000 27132.0000 37727.0 \n",
"ClosePrice 28.00 80.999 153.2785 507.1867 999.0 \n",
"OpenPrice 9.99 24.950 49.9900 132.5800 999.0 \n",
"Competitive? 1.00 1.000 1.0000 1.0000 1.0 "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Creating Data audit Report\n",
"def var_summary(x):\n",
" return pd.Series([x.count(), x.isnull().sum(), x.sum(), x.mean(), x.median(), x.std(), x.var(), x.min(), x.dropna().quantile(0.01), x.dropna().quantile(0.05),x.dropna().quantile(0.10),x.dropna().quantile(0.25),x.dropna().quantile(0.50),x.dropna().quantile(0.75), x.dropna().quantile(0.90),x.dropna().quantile(0.95), x.dropna().quantile(0.99),x.max()], \n",
" index=['N', 'NMISS', 'SUM', 'MEAN','MEDIAN', 'STD', 'VAR', 'MIN', 'P1' , 'P5' ,'P10' ,'P25' ,'P50' ,'P75' ,'P90' ,'P95' ,'P99' ,'MAX'])\n",
"\n",
"ebay_num=ebay[numeric_var_names]\n",
"num_summary=ebay_num.apply(lambda x: var_summary(x)).T\n",
"num_summary"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"#Handling Outliers\n",
"def outlier_capping(x):\n",
" x = x.clip(upper=x.quantile(0.99))\n",
" x = x.clip(lower=x.quantile(0.01))\n",
" return x\n",
"\n",
"ebay_num=ebay_num.apply(outlier_capping)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"# An utility function to create dummy variable\n",
"def create_dummies( df, colname ):\n",
" col_dummies = pd.get_dummies(df[colname], prefix=colname, drop_first=True)\n",
" df = pd.concat([df, col_dummies], axis=1)\n",
" df.drop( colname, axis = 1, inplace = True )\n",
" return(df)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
Category_AutomotiveCategory_BooksCategory_Business/IndustrialCategory_Clothing/AccessoriesCategory_Coins/StampsCategory_CollectiblesCategory_ComputerCategory_ElectronicsCategory_EverythingElseCategory_Health/Beauty...Category_SportingGoodsCategory_Toys/Hobbiescurrency_GBPcurrency_USendDay_MonendDay_SatendDay_SunendDay_ThuendDay_TueendDay_Wed
00000000000...0001100000
10000000000...0001100000
20000000000...0001100000
30000000000...0001100000
40000000000...0001100000
\n",
"

5 rows × 25 columns

\n",
"
"
],
"text/plain": [
" Category_Automotive Category_Books Category_Business/Industrial \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"\n",
" Category_Clothing/Accessories Category_Coins/Stamps \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" Category_Collectibles Category_Computer Category_Electronics \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"\n",
" Category_EverythingElse Category_Health/Beauty ... \\\n",
"0 0 0 ... \n",
"1 0 0 ... \n",
"2 0 0 ... \n",
"3 0 0 ... \n",
"4 0 0 ... \n",
"\n",
" Category_SportingGoods Category_Toys/Hobbies currency_GBP currency_US \\\n",
"0 0 0 0 1 \n",
"1 0 0 0 1 \n",
"2 0 0 0 1 \n",
"3 0 0 0 1 \n",
"4 0 0 0 1 \n",
"\n",
" endDay_Mon endDay_Sat endDay_Sun endDay_Thu endDay_Tue endDay_Wed \n",
"0 1 0 0 0 0 0 \n",
"1 1 0 0 0 0 0 \n",
"2 1 0 0 0 0 0 \n",
"3 1 0 0 0 0 0 \n",
"4 1 0 0 0 0 0 \n",
"\n",
"[5 rows x 25 columns]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#for c_feature in categorical_features\n",
"ebay_cat=ebay[cat_var_names]\n",
"for c_feature in ['Category', 'currency', 'endDay']:\n",
" ebay_cat = create_dummies(ebay_cat,c_feature)\n",
"ebay_cat.head()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
sellerRatingClosePriceOpenPriceCompetitive?Category_AutomotiveCategory_BooksCategory_Business/IndustrialCategory_Clothing/AccessoriesCategory_Coins/StampsCategory_Collectibles...Category_SportingGoodsCategory_Toys/Hobbiescurrency_GBPcurrency_USendDay_MonendDay_SatendDay_SunendDay_ThuendDay_TueendDay_Wed
03249.00.010.010000000...0001100000
13249.00.010.010000000...0001100000
23249.00.010.010000000...0001100000
33249.00.010.010000000...0001100000
43249.00.010.010000000...0001100000
\n",
"

5 rows × 29 columns

\n",
"
"
],
"text/plain": [
" sellerRating ClosePrice OpenPrice Competitive? Category_Automotive \\\n",
"0 3249.0 0.01 0.01 0 0 \n",
"1 3249.0 0.01 0.01 0 0 \n",
"2 3249.0 0.01 0.01 0 0 \n",
"3 3249.0 0.01 0.01 0 0 \n",
"4 3249.0 0.01 0.01 0 0 \n",
"\n",
" Category_Books Category_Business/Industrial \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" Category_Clothing/Accessories Category_Coins/Stamps \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" Category_Collectibles ... Category_SportingGoods Category_Toys/Hobbies \\\n",
"0 0 ... 0 0 \n",
"1 0 ... 0 0 \n",
"2 0 ... 0 0 \n",
"3 0 ... 0 0 \n",
"4 0 ... 0 0 \n",
"\n",
" currency_GBP currency_US endDay_Mon endDay_Sat endDay_Sun endDay_Thu \\\n",
"0 0 1 1 0 0 0 \n",
"1 0 1 1 0 0 0 \n",
"2 0 1 1 0 0 0 \n",
"3 0 1 1 0 0 0 \n",
"4 0 1 1 0 0 0 \n",
"\n",
" endDay_Tue endDay_Wed \n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
"[5 rows x 29 columns]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ebay = pd.concat([ebay_num, ebay_cat], axis=1)\n",
"ebay.head()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"# split the ebay into training (60%) and testing (40%)\n",
"\n",
"X = pd.concat([ebay.iloc[:,0:3],ebay.iloc[:,4:]],axis=1)\n",
"y = ebay['Competitive?']\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
sellerRatingClosePriceOpenPriceCategory_AutomotiveCategory_BooksCategory_Business/IndustrialCategory_Clothing/AccessoriesCategory_Coins/StampsCategory_CollectiblesCategory_Computer...Category_SportingGoodsCategory_Toys/Hobbiescurrency_GBPcurrency_USendDay_MonendDay_SatendDay_SunendDay_ThuendDay_TueendDay_Wed
857286.033.821.230000000...0000010000
7111634.05.534.900000000...0000000001
1245595.0110.531.230001000...0000000100
1078266.062.091.230000000...0000000100
13612220.0115.270.990000010...0001010000
..................................................................
11301215.07.997.990000000...0001100000
12942169.09.999.990000000...0001001000
8605701.033.831.230010000...0000100000
14593090.023.3511.050000000...0100000010
112650.07.997.990000000...0001010000
\n",
"

1183 rows × 28 columns

\n",
"
"
],
"text/plain": [
" sellerRating ClosePrice OpenPrice Category_Automotive \\\n",
"857 286.0 33.82 1.23 0 \n",
"711 1634.0 5.53 4.90 0 \n",
"1245 595.0 110.53 1.23 0 \n",
"1078 266.0 62.09 1.23 0 \n",
"1361 2220.0 115.27 0.99 0 \n",
"... ... ... ... ... \n",
"1130 1215.0 7.99 7.99 0 \n",
"1294 2169.0 9.99 9.99 0 \n",
"860 5701.0 33.83 1.23 0 \n",
"1459 3090.0 23.35 11.05 0 \n",
"1126 50.0 7.99 7.99 0 \n",
"\n",
" Category_Books Category_Business/Industrial \\\n",
"857 0 0 \n",
"711 0 0 \n",
"1245 0 0 \n",
"1078 0 0 \n",
"1361 0 0 \n",
"... ... ... \n",
"1130 0 0 \n",
"1294 0 0 \n",
"860 0 1 \n",
"1459 0 0 \n",
"1126 0 0 \n",
"\n",
" Category_Clothing/Accessories Category_Coins/Stamps \\\n",
"857 0 0 \n",
"711 0 0 \n",
"1245 1 0 \n",
"1078 0 0 \n",
"1361 0 0 \n",
"... ... ... \n",
"1130 0 0 \n",
"1294 0 0 \n",
"860 0 0 \n",
"1459 0 0 \n",
"1126 0 0 \n",
"\n",
" Category_Collectibles Category_Computer ... Category_SportingGoods \\\n",
"857 0 0 ... 0 \n",
"711 0 0 ... 0 \n",
"1245 0 0 ... 0 \n",
"1078 0 0 ... 0 \n",
"1361 1 0 ... 0 \n",
"... ... ... ... ... \n",
"1130 0 0 ... 0 \n",
"1294 0 0 ... 0 \n",
"860 0 0 ... 0 \n",
"1459 0 0 ... 0 \n",
"1126 0 0 ... 0 \n",
"\n",
" Category_Toys/Hobbies currency_GBP currency_US endDay_Mon \\\n",
"857 0 0 0 0 \n",
"711 0 0 0 0 \n",
"1245 0 0 0 0 \n",
"1078 0 0 0 0 \n",
"1361 0 0 1 0 \n",
"... ... ... ... ... \n",
"1130 0 0 1 1 \n",
"1294 0 0 1 0 \n",
"860 0 0 0 1 \n",
"1459 1 0 0 0 \n",
"1126 0 0 1 0 \n",
"\n",
" endDay_Sat endDay_Sun endDay_Thu endDay_Tue endDay_Wed \n",
"857 1 0 0 0 0 \n",
"711 0 0 0 0 1 \n",
"1245 0 0 1 0 0 \n",
"1078 0 0 1 0 0 \n",
"1361 1 0 0 0 0 \n",
"... ... ... ... ... ... \n",
"1130 0 0 0 0 0 \n",
"1294 0 1 0 0 0 \n",
"860 0 0 0 0 0 \n",
"1459 0 0 0 1 0 \n",
"1126 1 0 0 0 0 \n",
"\n",
"[1183 rows x 28 columns]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier(max_depth=7, min_samples_leaf=50)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# fit Decision Tree Classifier\n",
"treeclf = DecisionTreeClassifier(min_samples_leaf=50, max_depth=7)\n",
"treeclf.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries for graph viz\n",
"from IPython.display import Image \n",
"from sklearn.tree import export_graphviz\n",
"import pydotplus"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"image/png":...