Data Mining (50 points) Deliverable: 1. Implement and answer the questions related to data mining (reading, exploring, statistically analyzing the data specified in the document below). Read through...


Note: Only provide the code & output for the Questions (marked in Red) & not all the other exercises...these are only meant for your practice & will help you answer the questions asked.


Data Mining (50 points)

Deliverable:
1. Implement and answer the questions related to data mining (reading, exploring, and statistically analyzing the data specified in the document below). Read through the document and run the code it specifies. Executing the statements sequentially will help you answer the questions.
2. This assignment will help you understand the concepts related to the materials posted in Module 3 (Data Mining).
3. If you use Jupyter, submit the notebook or a PDF of the code and its outputs as they appear in Jupyter. If you use an environment other than Jupyter, create a Word/PDF document with the code used to answer each question and the output for each. Clearly indicate which question is being answered, either in the document or as comments in the Jupyter notebook.
4. The questions that need to be answered and submitted are marked in red in the document.

Lab Instructions

How to run Python?
You can use the Python environment on your machine if you already have one, and install packages as and when needed. Otherwise, you can run Python code in Jupyter notebooks (Skills Networks Lab by IBM, at https://labs.cognitiveclass.ai). This lets you log in to a cloud environment that supports R, Scala, Python, Swift, and many more languages using your social accounts (Google, Facebook, and so on). Select JupyterLab once logged in, and then select Python.

Which version is used in these labs: Python 2 or Python 3?
All of these labs use Python 3.

What are "Jupyter Notebooks"?
Jupyter Notebooks are a popular data science tool that lets data scientists write code, see the results, and describe what's happening, all in a single document.

Data Analysis with Python

1. Introduction

Data Acquisition
Datasets come in various formats, such as .csv, .json, and .xlsx. A dataset can be stored in different places: on your local machine or online.
In this section, you will learn how to load a dataset into a Jupyter notebook. In our case, the Automobile Dataset is an online source in CSV (comma-separated values) format. Let's use this dataset as an example to practice reading data.
• data source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
• data type: csv

The Pandas library is a useful tool that enables us to read various datasets into a data frame. Our Jupyter notebook platforms have Pandas built in, so all we need to do is import it without installing anything.

    # import pandas library
    import pandas as pd

Read Data
We use the pandas.read_csv() function to read the CSV file. Inside the parentheses, we put the file path in quotation marks, so that pandas reads the file into a data frame from that address. The file path can be either a URL or a local file address. Because the data does not include headers, we add the argument header=None to read_csv(), so that pandas does not automatically use the first row as a header. You can assign the dataset to any variable you create.

    # Import pandas library
    import pandas as pd
    # Read the online file from the URL provided above and assign it to the variable "df"
    other_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
    df = pd.read_csv(other_path, header=None)

After reading the dataset, we can use the dataframe.head(n) method to check the top n rows of the dataframe, where n is an integer. Conversely, dataframe.tail(n) shows the bottom n rows of the dataframe.

    # show the first 5 rows using dataframe.head() method
    print("The first 5 rows of the dataframe")
    df.head(5)

Question #1.1: (3 points) Check the bottom 10 rows of data frame "df".
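The reading steps above can be tried without network access. The following is a minimal sketch, assuming a hypothetical three-row sample (`sample`, `df_sample`) in the same headerless, comma-separated layout; it is not the full dataset.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same headerless, comma-separated layout
sample = io.StringIO(
    "3,?,alfa-romero,gas\n"
    "1,?,alfa-romero,gas\n"
    "2,164,audi,gas\n"
)

# header=None tells pandas that the first row is data, not column names
df_sample = pd.read_csv(sample, header=None)

first_two = df_sample.head(2)   # top 2 rows
last_one = df_sample.tail(1)    # bottom row

print(df_sample.shape)          # (3, 4)
print(list(df_sample.columns))  # [0, 1, 2, 3]
print(first_two)
print(last_one)
```

Because no header row is declared, pandas labels the columns with integers starting from 0, which is why headers are added by hand in the next step.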
Add Headers
Take a look at our dataset: pandas automatically set the header to integers starting from 0. To better describe our data we can introduce a header; the column information is available at: https://archive.ics.uci.edu/ml/datasets/Automobile

Thus, we have to add headers manually. First, we create a list "headers" that includes all column names in order. Then, we use dataframe.columns = headers to replace the headers with the list we created.

    # create headers list
    headers = ["symboling","normalized-losses","make","fuel-type","aspiration",
               "num-of-doors","body-style","drive-wheels","engine-location","wheel-base",
               "length","width","height","curb-weight","engine-type","num-of-cylinders",
               "engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
               "peak-rpm","city-mpg","highway-mpg","price"]
    print("headers\n", headers)

We replace the headers and recheck our data frame:

    df.columns = headers
    df.head(10)

We can drop missing values along the column "price" as follows:

    df.dropna(subset=["price"], axis=0)

Note that dropna() returns a new data frame rather than modifying df, and that it only removes NaN values. In the raw file, missing prices are recorded as "?", so they must be converted to NaN before this call removes any rows.

Now we have successfully read the raw dataset and added the correct headers to the data frame.

Question #1.2: (2 points) Find the names of the columns of the dataframe.

Save Dataset
Correspondingly, Pandas enables us to save the dataset to CSV with the dataframe.to_csv() method; put the file path and name, in quotation marks, inside the parentheses. For example, to save the dataframe df as automobile.csv on your local machine, you may use the syntax below:

    df.to_csv("automobile.csv", index=False)

We can also read and save other file formats, using functions similar to pd.read_csv() and df.to_csv(). The functions are listed in the following table:

Read/Save Other Data Formats

Data Format   Read              Save
csv           pd.read_csv()     df.to_csv()
json          pd.read_json()    df.to_json()
excel         pd.read_excel()   df.to_excel()
hdf           pd.read_hdf()     df.to_hdf()
sql           pd.read_sql()     df.to_sql()
...           ...               ...
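The header, dropna(), and to_csv() steps can be sketched on a small in-memory sample. The three-column data and the names `raw`, `df_small`, `unchanged`, `cleaned`, `buffer`, and `restored` are assumptions made for illustration; replacing "?" with NaN is one possible way to make dropna() effective, not necessarily the approach the assignment intends.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical sample: three columns only, with "?" marking a missing price
raw = io.StringIO(
    "alfa-romero,gas,13495\n"
    "audi,gas,?\n"
    "volvo,diesel,22470\n"
)
df_small = pd.read_csv(raw, header=None)
df_small.columns = ["make", "fuel-type", "price"]   # add headers manually

# dropna() only removes real NaN values, so the "?" row survives here;
# the result is assigned because dropna() returns a new frame
unchanged = df_small.dropna(subset=["price"], axis=0)

# Convert the placeholder to NaN first, then drop
cleaned = df_small.replace("?", np.nan).dropna(subset=["price"], axis=0)

# Round trip through CSV text (a buffer stands in for a file path here)
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(len(unchanged), len(cleaned))   # 3 2
print(list(restored.columns))         # ['make', 'fuel-type', 'price']
```

Passing index=False keeps the row index out of the saved file, so reading it back yields the same three columns.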
Basic Insight of Dataset
After reading data into a Pandas dataframe, it is time to explore the dataset. There are several ways to obtain essential insights into the data and better understand it.

Data Types
Data has a variety of types. The main types stored in Pandas dataframes are object, float, int, bool and datetime64. To learn more about each attribute, it is always good to know the data type of each column. In Pandas, df.dtypes returns a Series with the data type of each column.

    # check the data type of data frame "df" by .dtypes
    print(df.dtypes)

The result shows that "symboling" and "curb-weight" are int64, "normalized-losses" is object, "wheel-base" is float64, and so on. These data types can be changed; we will learn how to accomplish this in a later module.

Describe
To get a statistical summary of each column, such as the count, mean, and standard deviation, we use the describe method:

    dataframe.describe()

This method provides various summary statistics, excluding NaN (Not a Number) values.

    df.describe()

This shows the statistical summary of all numeric-typed (int, float) columns. For example, the attribute "symboling" has a count of 205, a mean of 0.83, a standard deviation of 1.25, a minimum of -2, a 25th percentile of 0, a 50th percentile of 1, a 75th percentile of 2, and a maximum of 3.

What if we would also like to check all the columns, including those of type object? Add the argument include="all" inside the parentheses. Let's try it again.

    # describe all the columns in "df"
    df.describe(include = "all")

Now it provides the statistical summary of all the columns, including object-typed attributes. We can see how many unique values there are, which value is the most frequent (top), and the frequency of that top value in the object-typed columns.
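To see the difference between describe() and describe(include="all") on something small, here is a minimal sketch. The four-row frame `df_demo` is a made-up example, not the assignment data.

```python
import pandas as pd

# Hypothetical mini frame: one integer, one float, and one text column
df_demo = pd.DataFrame({
    "symboling": [3, 1, 2, 2],
    "wheel-base": [88.6, 94.5, 99.8, 99.4],
    "make": ["alfa-romero", "alfa-romero", "audi", "audi"],
})

print(df_demo.dtypes)  # one data type per column

numeric_summary = df_demo.describe()            # numeric columns only
full_summary = df_demo.describe(include="all")  # adds unique/top/freq rows

print(list(numeric_summary.columns))             # ['symboling', 'wheel-base']
print(numeric_summary.loc["mean", "symboling"])  # 2.0
print(full_summary.loc["unique", "make"])        # 2
print(full_summary.loc["mean", "make"])          # NaN: no mean for a text column
```

The NaN in the last line is the same effect described for the full dataset: statistics that do not apply to a column type are left empty.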
Some values in the table above show as "NaN" because those statistics are not available for that column type.

Question #1.3: (5 points) You can select the columns of a data frame by indicating the name of each column. For example, you can select three columns as follows:

    dataframe[['column 1', 'column 2', 'column 3']]

where "column" is the name of the column. You can apply the method ".describe()" to get the statistics of those columns as follows:

    dataframe[['column 1', 'column 2', 'column 3']].describe()

Apply the method ".describe()" to the columns 'length' and 'compression-ratio'.

Info
Another method you can use to check your dataset is:

    dataframe.info()

It provides a concise summary of your DataFrame.

    # look at the info of "df"
    df.info()

Here we can see information about our dataframe: the index range, each column's name, non-null count, and data type, and the memory usage. It also shows that the whole data frame has 205 rows and 26 columns.

2. Data Wrangling

Data wrangling is the process of converting data from its initial format to a format that may be better for analysis.

What is the fuel consumption (L/100km) rate for the diesel car?

Import data
You can find the "Automobile Data Set" at the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data.

    # Import pandas
    import pandas as pd
    import matplotlib.pylab as plt

Reading the data set from the URL and adding the related headers.
    filename = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"

Python list headers containing the names of the headers:

    headers = ["symboling","normalized-losses","make","fuel-type","aspiration",
               "num-of-doors","body-style","drive-wheels","engine-location","wheel-base",
               "length","width","height","curb-weight","engine-type","num-of-cylinders",
               "engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
               "peak-rpm","city-mpg","highway-mpg","price"]

Use the Pandas method read_csv() to load the data from the web address. Set the parameter "names" equal to the Python list "headers".

    df = pd.read_csv(filename, names = headers)

Use the method head() to display the first five rows of the dataframe.

    # To see what the data set looks like, we'll use the head() method.
    df.head()
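Passing names= at read time, selecting columns with double brackets, and calling info() can be combined in one short sketch. The three-row sample and the names `raw`, `sample_headers`, `df_named`, and `subset_stats` are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample with three of the dataset's columns
raw = io.StringIO(
    "168.8,9.0,13495\n"
    "171.2,9.0,16500\n"
    "176.6,10.0,13950\n"
)
sample_headers = ["length", "compression-ratio", "price"]

# names= supplies the column labels; since no header row is declared,
# pandas treats the first line as data
df_named = pd.read_csv(raw, names=sample_headers)

# Double brackets select a list of columns and return a DataFrame
subset_stats = df_named[["length", "compression-ratio"]].describe()

print(df_named.shape)                      # (3, 3)
print(subset_stats.loc["max", "length"])   # 176.6
df_named.info()                            # column names, non-null counts, dtypes
```

Supplying names= in read_csv() gives the same result as reading with header=None and assigning df.columns afterwards, in a single step.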
Answered 2 days after Apr 20, 2021

Answer To: Data Mining (50 points) Deliverable: 1. Implement and answer the questions related to data mining...

Neha answered on Apr 22 2021
145 Votes
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The first 5 rows of the dataframe\n"
]
},
{
"data": {
"text/plain": [
" 0 1 2 3 4 5 6 7 8 9 ... \\\n",
"0 3 ? alfa-romero gas std two convertible rwd front 88.6 ... \n",
"1 3 ? alfa-romero gas std two convertible rwd front 88.6 ... \n",
"2 1 ? alfa-romero gas std two hatchback rwd front 94.5 ... \n",
"3 2 164 audi gas std four sedan fwd front 99.8 ... \n",
"4 2 164 audi gas std four sedan 4wd front 99.4 ... \n",
"\n",
" 16 17 18 19 20 21 22 23 24 25 \n",
"0 130 mpfi 3.47 2.68 9.0 111 5000 21 27 13495 \n",
"1 130 mpfi 3.47 2.68 9.0 111 5000 21 27 16500 \n",
"2 152 mpfi 2.68 3.47 9.0 154 5000 19 26 16500 \n",
"3 109 mpfi 3.19 3.40 10.0 102 5500 24 30 13950 \n",
"4 136 mpfi 3.19 3.40 8.0 115 5500 18 22 17450 \n",
"\n",
"[5 rows x 26 columns]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"# Read the online file from the URL provided above and assign it to the variable \"df\"\n",
"other_path = \"https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data\"\n",
"df = pd.read_csv(other_path, header=None)\n",
"# show the first 5 rows using dataframe.head() method\n",
"print(\"The first 5 rows of the dataframe\")\n",
"df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The bottom 10 rows of the dataframe\n"
]
},
{
"data": {
"text/plain": [
" 0 1 2 3 4 5 6 7 8 9 ... 16 \\\n",
"195 -1 74 volvo gas std four wagon rwd front 104.3 ... 141 \n",
"196 -2 103 volvo gas std four sedan rwd front 104.3 ... 141 \n",
"197 -1 74 volvo gas std four wagon rwd front 104.3 ... 141 \n",
"198 -2 103 volvo gas turbo four sedan rwd front 104.3 ... 130 \n",
"199 -1 74 volvo gas turbo four wagon rwd front 104.3 ... 130 \n",
"200 -1 95 volvo gas std four sedan rwd front 109.1 ... 141 \n",
"201 -1 95 volvo gas turbo four sedan rwd front 109.1 ... 141 \n",
"202 -1 95 volvo gas std four sedan rwd front 109.1 ... 173 \n",
"203 -1 95 volvo diesel turbo four sedan rwd front 109.1 ... 145 \n",
"204 -1 95 volvo gas turbo four sedan rwd front 109.1 ... 141 \n",
"\n",
" 17 18 19 20 21 22 23 24 25 \n",
"195 mpfi 3.78 3.15 9.5 114 5400 23 28 13415 \n",
"196 mpfi 3.78 3.15 9.5 114 5400 24 28 15985 \n",
"197 mpfi 3.78 3.15 9.5 114 5400 24 28 16515 \n",
"198 mpfi 3.62 3.15 7.5 162 5100 17 22 18420 \n",
"199 mpfi 3.62 3.15 7.5 162 5100 17 22 18950 \n",
"200 mpfi 3.78 3.15 9.5 114 5400 23 28 16845 \n",
"201 mpfi 3.78 3.15 8.7 160 5300 19 25 19045 \n",
"202 mpfi 3.58 2.87 8.8 134 5500 18 23 21485 \n",
"203 idi 3.01 3.40 23.0 106 4800 26 27 22470 \n",
"204 mpfi 3.78 3.15 9.5 114 5400 19 25 22625 \n",
"\n",
"[10 rows x 26 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show the bottom 10 rows using dataframe.tail() method\n",
"print(\"The bottom 10 rows of the dataframe\")\n",
"df.tail(10)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"headers\n",
" ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-ofdoors', 'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuelsystem', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highwaympg', 'price']\n"
]
}
],
"source": [
"# create headers list\n",
"headers = [\"symboling\",\"normalized-losses\",\"make\",\"fuel-type\",\"aspiration\", \"num-ofdoors\",\"body-style\", \"drive-wheels\",\"engine-location\",\"wheel-base\",\n",
"\"length\",\"width\",\"height\",\"curb-weight\",\"engine-type\", \"num-of-cylinders\", \"engine-size\",\"fuelsystem\",\"bore\",\"stroke\",\"compression-ratio\",\"horsepower\", \"peak-rpm\",\"city-mpg\",\"highwaympg\",\"price\"]\n",
"print(\"headers\\n\", headers)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" symboling normalized-losses make fuel-type aspiration \\\n",
"0 3 ? alfa-romero gas std \n",
"1 3 ? alfa-romero gas std \n",
"2 1 ? alfa-romero gas std \n",
"3 2 164 audi gas std \n",
"4 2 164 audi gas std \n",
".. ... ... ... ... ... \n",
"200 -1 95 volvo gas std \n",
"201 -1 95 volvo gas turbo \n",
"202 -1 95 volvo gas std \n",
"203 -1 95 volvo diesel turbo \n",
"204 -1 95 volvo gas turbo \n",
"\n",
" num-ofdoors body-style drive-wheels engine-location wheel-base ... \\\n",
"0 two convertible rwd front 88.6 ... \n",
"1 two convertible rwd front 88.6 ... \n",
"2 two hatchback rwd front 94.5 ... \n",
"3 four sedan fwd front 99.8 ... \n",
"4 four sedan 4wd front 99.4 ... \n",
".. ... ... ... ... ... ... \n",
"200 four sedan rwd front 109.1 ... \n",
"201 four sedan rwd front 109.1 ... \n",
"202 four sedan rwd front 109.1 ... \n",
"203 four sedan rwd front 109.1 ... \n",
"204 four sedan rwd front 109.1 ... \n",
"\n",
" engine-size fuelsystem bore stroke compression-ratio horsepower \\\n",
"0 130 mpfi 3.47 2.68 9.0 111 \n",
"1 130 mpfi 3.47 2.68 9.0 111 \n",
"2 152 mpfi 2.68 3.47 9.0 154 \n",
"3 109 mpfi 3.19 3.40 10.0 102 \n",
"4 136 mpfi 3.19 3.40 8.0 115 \n",
".. ... ... ... ... ... ... \n",
"200 141 mpfi 3.78 3.15 9.5 114 \n",
"201 141 mpfi 3.78 3.15 8.7 160 \n",
"202 173 mpfi 3.58 2.87 8.8 134 \n",
"203 145 idi 3.01 3.40 23.0 106 \n",
"204 141 mpfi 3.78 3.15 9.5 114 \n",
"\n",
" peak-rpm city-mpg highwaympg price \n",
"0 5000 21 27 13495 \n",
"1 5000 21 27 16500 \n",
"2 5000 19 26 16500 \n",
"3 5500 24 30 13950 \n",
"4 5500 18 22 17450 \n",
".. ... ... ... ... \n",
"200 5400 23 28 16845 \n",
"201 5300 19 25 19045 \n",
"202 5500 18 23 21485 \n",
"203 4800 26 27 22470 \n",
"204 5400 19 25 22625 \n",
"\n",
"[205 rows x 26 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns = headers\n",
"df.head(10)\n",
"df.dropna(subset=[\"price\"], axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',\n",
" 'num-ofdoors', 'body-style', 'drive-wheels', 'engine-location',\n",
" 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',\n",
" 'num-of-cylinders', 'engine-size', 'fuelsystem', 'bore', 'stroke',\n",
" 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highwaympg',\n",
" 'price'],\n",
" dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Find the name of the columns of the dataframe\n",
"df.columns"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"symboling int64\n",
"normalized-losses object\n",
"make object\n",
"fuel-type object\n",
"aspiration object\n",
"num-ofdoors object\n",
"body-style object\n",
"drive-wheels object\n",
"engine-location object\n",
"wheel-base float64\n",
"length float64\n",
"width float64\n",
"height float64\n",
"curb-weight int64\n",
"engine-type object\n",
"num-of-cylinders object\n",
"engine-size int64\n",
"fuelsystem object\n",
"bore object\n",
"stroke object\n",
"compression-ratio float64\n",
"horsepower object\n",
"peak-rpm object\n",
"city-mpg int64\n",
"highwaympg int64\n",
"price object\n",
"dtype: object"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"symboling int64\n",
"normalized-losses object\n",
"make object\n",
"fuel-type object\n",
"aspiration object\n",
"num-ofdoors object\n",
"body-style object\n",
"drive-wheels object\n",
"engine-location object\n",
"wheel-base float64\n",
"length float64\n",
"width float64\n",
"height float64\n",
"curb-weight int64\n",
"engine-type object\n",
"num-of-cylinders object\n",
"engine-size int64\n",
"fuelsystem object\n",
"bore object\n",
"stroke object\n",
"compression-ratio float64\n",
"horsepower object\n",
"peak-rpm object\n",
"city-mpg int64\n",
"highwaympg int64\n",
"price object\n",
"dtype: object\n"
]
}
],
"source": [
"# check the data type of data frame \"df\" by .dtypes\n",
"print(df.dtypes)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" symboling wheel-base length width height \\\n",
"count 205.000000 205.000000 205.000000 205.000000 205.000000 \n",
"mean 0.834146 98.756585 174.049268 65.907805 53.724878 \n",
"std 1.245307 6.021776 12.337289 2.145204 2.443522 \n",
"min -2.000000 86.600000 141.100000 60.300000 47.800000 \n",
"25% 0.000000 94.500000 166.300000 64.100000 52.000000 \n",
"50% 1.000000 97.000000 173.200000 65.500000 54.100000 \n",
"75% 2.000000 102.400000 183.100000 66.900000 55.500000 \n",
"max 3.000000 120.900000 208.100000 72.300000 59.800000 \n",
"\n",
" curb-weight engine-size compression-ratio city-mpg highwaympg \n",
"count 205.000000 205.000000 205.000000 205.000000 205.000000 \n",
"mean 2555.565854 126.907317 10.142537 25.219512 30.751220 \n",
"std 520.680204 41.642693 3.972040 6.542142 6.886443 \n",
"min 1488.000000 61.000000 7.000000 13.000000 16.000000 \n",
"25% 2145.000000 97.000000 8.600000 19.000000 25.000000 \n",
"50% 2414.000000 120.000000 9.000000 24.000000 30.000000 \n",
"75% 2935.000000 141.000000 9.400000 30.000000 34.000000 \n",
"max 4066.000000 326.000000 23.000000 49.000000 54.000000 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" ...
         symboling  normalized-losses    make  fuel-type  aspiration  num-ofdoors  body-style  drive-wheels  engine-location  wheel-base  ...
count   205.000000                205     205        205         205          205         205           205              205  205.000000  ...
unique         NaN                 52      22          2           2            3           5             3                2         NaN  ...
top            NaN                  ?  toyota        gas         std         four       sedan           fwd            front         NaN  ...
freq           NaN                 41      32        185         168          114          96           120              202         NaN  ...
mean      0.834146                NaN     NaN        NaN         NaN          NaN         NaN           NaN              NaN   98.756585  ...
std       1.245307                NaN     NaN        NaN         NaN          NaN         NaN           NaN              NaN    6.021776  ...
min      -2.000000                NaN     NaN        NaN         NaN          NaN         NaN           NaN              NaN   86.600000  ...
25%       0.000000                NaN     NaN        NaN         NaN          NaN         NaN           NaN              NaN   94.500000  ...

        engine-size  fuelsystem  bore  stroke  compression-ratio  horsepower  peak-rpm    city-mpg  highwaympg  price
count    205.000000         205   205     205         205.000000         205       205  205.000000  205.000000    205
unique          NaN           8    39      37                NaN          60        24         NaN         NaN    187
top             NaN        mpfi  3.62    3.40                NaN          68      5500         NaN         NaN      ?
freq            NaN          94    23      20                NaN          19        37         NaN         NaN      4
mean     126.907317         NaN   NaN     NaN          10.142537         NaN       NaN   25.219512   30.751220    NaN
std       41.642693         NaN   NaN     NaN           3.972040         NaN       NaN    6.542142    6.886443    NaN
min       61.000000         NaN   NaN     NaN           7.000000         NaN       NaN   13.000000   16.000000    NaN
25%       97.000000         NaN   NaN     NaN           8.600000         NaN       NaN   19.000000   25.000000    NaN

50%       1.000000                NaN     NaN        NaN         NaN          NaN         NaN