Note: Only provide the code & output for the Questions (marked in Red) & not all the other...

Question

Note: Only provide the code & output for the Questions (marked in Red) & not all the other exercises... these are only meant for your practice & will help you answer the questions asked.

Data Mining (50 points)

Deliverable:
1. Implement and answer the questions related to data mining (reading, exploring, and statistically analyzing the data specified in the document below). Read through the document and run the code it specifies. Executing the statements sequentially will help you answer the questions.
2. This assignment will help you understand the concepts related to the materials posted in Module 3 (Data Mining).
3. If you use Jupyter, submit the notebook or a PDF of the code and outputs as shown in Jupyter. If you use an environment other than Jupyter, create a Word/PDF document with the code used to answer each question and the output of each. Clearly indicate the question being answered, either in the document or as comments in the Jupyter notebook.
4. The questions that need to be answered and submitted are marked in red in the document.

Lab Instructions

How to run Python?
You can use the Python environment on your machine if you already have one and install packages as and when needed. Otherwise, you can run Python code in Jupyter notebooks on IBM's Skills Network Labs (https://labs.cognitiveclass.ai). This lets you log in to a cloud environment that supports R, Scala, Python, Swift and many more languages using your social accounts (Google, Facebook and so on). Once logged in, select JupyterLab and then select Python.

Which version is used in these labs: Python 2 or Python 3?
All of these labs use Python 3.

What are "Jupyter Notebooks"?
Jupyter Notebooks are a popular data science tool that lets data scientists write code, see the results, and describe what's happening, all in a single document.

Data Analysis with Python

1. Introduction

Data Acquisition
There are various formats for a dataset: .csv, .json, .xlsx, etc. The dataset can be stored in different places, on your local machine or sometimes online.
In this section, you will learn how to load a dataset into a Jupyter Notebook.
In our case, the Automobile Dataset is an online source in CSV (comma-separated values) format. Let's use this dataset as an example to practice reading data.
• data source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
• data type: csv

The Pandas library is a useful tool that lets us read various datasets into a data frame. Our Jupyter notebook platforms have Pandas built in, so all we need to do is import it; no installation is needed.

# import pandas library
import pandas as pd

Read Data
We use the pandas.read_csv() function to read the CSV file. Inside the parentheses we put the file path in quotation marks, so that pandas reads the file at that address into a data frame. The file path can be either a URL or a local file path.
Because the data does not include headers, we add the argument header=None inside the read_csv() method, so that pandas does not automatically use the first row as the header.
You can assign the dataset to any variable you create.

# Import pandas library
import pandas as pd
# Read the online file by the URL provided above, and assign it to variable "df"
other_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df = pd.read_csv(other_path, header=None)

After reading the dataset, we can use the dataframe.head(n) method to check the top n rows of the dataframe, where n is an integer. Conversely, dataframe.tail(n) shows the bottom n rows of the dataframe.
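To see why header=None matters, here is a minimal sketch that runs offline. It uses a tiny made-up CSV string in the same headerless style as imports-85.data, not the real file. The sketch reads the string once with the default header handling and once with header=None, then uses head() and tail().

```python
import io

import pandas as pd

# Tiny made-up CSV in the headerless style of imports-85.data ("?" = missing).
raw = "3,?,alfa-romero,gas\n2,164,audi,gas\n-1,95,volvo,diesel\n"

# Default behaviour: the first data row is (wrongly) used as the header.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every line is data and the columns are numbered 0..n-1.
no_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)       # (2, 4) -> one row was swallowed as the header
print(no_header.shape)          # (3, 4)
print(list(no_header.columns))  # [0, 1, 2, 3]

print(no_header.head(2))  # top 2 rows
print(no_header.tail(1))  # bottom row
```

On the real data, df.tail(10) from Question #1.1 works the same way as tail(1) here.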
# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe")
df.head(5)

Question #1.1: (3 points)
Check the bottom 10 rows of data frame "df".

Add Headers
Take a look at our dataset: pandas automatically labelled the columns with integers starting from 0.
To better describe our data, we can introduce a header. This information is available at: https://archive.ics.uci.edu/ml/datasets/Automobile
Thus, we have to add the headers manually.
First, we create a list "headers" that includes all column names in order. Then, we use dataframe.columns = headers to replace the headers with the list we created.

# create headers list
headers = ["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style",
           "drive-wheels","engine-location","wheel-base","length","width","height","curb-weight","engine-type",
           "num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
           "peak-rpm","city-mpg","highway-mpg","price"]
print("headers\n", headers)

We replace the headers and recheck our data frame:

df.columns = headers
df.head(10)

We can drop the rows with missing values in the "price" column as follows:

df.dropna(subset=["price"], axis=0)

Note that dropna() returns a new data frame and leaves df unchanged unless you assign the result back or pass inplace=True. Also, in this raw file missing values are written as "?", which pandas does not treat as NaN by default, so no rows are dropped at this stage.

Now we have successfully read the raw dataset and added the correct headers to the data frame.

Question #1.2: (2 points)
Find the names of the columns of the dataframe.

Save Dataset
Correspondingly, Pandas lets us save the dataset to CSV with the dataframe.to_csv() method. Add the file path and name in quotation marks inside the parentheses.
For example, to save the dataframe df as automobile.csv on your local machine, you may use the syntax below:

df.to_csv("automobile.csv", index=False)

We can also read and save other file formats with functions similar to pd.read_csv() and df.to_csv(). They are listed in the following table:

Read/Save Other Data Formats
Data Format | Read            | Save
csv         | pd.read_csv()   | df.to_csv()
json        | pd.read_json()  | df.to_json()
excel       | pd.read_excel() | df.to_excel()
hdf         | pd.read_hdf()   | df.to_hdf()
sql         | pd.read_sql()   | df.to_sql()
...         | ...             | ...

Basic Insight of Dataset
After reading data into a Pandas dataframe, it is time to explore the dataset.
There are several ways to obtain essential insights into the data that help us better understand our dataset.

Data Types
Data has a variety of types.
The main types stored in Pandas dataframes are object, float, int, bool and datetime64. To better learn about each attribute, it is always good to know the data type of each column. In Pandas:

df.dtypes

returns a Series with the data type of each column.

# check the data type of data frame "df" by .dtypes
print(df.dtypes)

As shown above, the data type of "symboling" and "curb-weight" is int64, "normalized-losses" is object, "wheel-base" is float64, and so on.
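The header, missing-value and data-type steps above can be sketched on a tiny made-up table that runs offline. The three column names are borrowed from the lab's header list, but the rows, including the "?" price for audi, are invented for illustration. The sketch shows three things: "?" is not treated as missing by default, dropna() only takes effect when its result is reassigned, and the price column needs converting before it becomes numeric.

```python
import io

import numpy as np
import pandas as pd

# Made-up rows, no header line, "?" marks a missing value.
raw = "3,alfa-romero,13495\n2,audi,?\n-1,volvo,22625\n"
df = pd.read_csv(io.StringIO(raw), header=None)

# Replace the integer column labels with descriptive names.
df.columns = ["symboling", "make", "price"]

# "?" is not a NaN marker for pandas, so "price" is read as strings (object)
# and dropna() finds nothing to drop yet.
print(df["price"].dtype)                         # object
print(len(df.dropna(subset=["price"], axis=0)))  # 3

# Turn "?" into NaN, drop rows with a missing price (reassigning the result,
# because dropna returns a new frame), then convert price to a number.
df = df.replace("?", np.nan)
df = df.dropna(subset=["price"], axis=0).assign(
    price=lambda d: pd.to_numeric(d["price"])
)

print(len(df))  # 2
print(df.dtypes)
```

Converting "?" to NaN before calling dropna() is one common choice. Reading the file with na_values="?" is another way to handle it.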
These data types can be changed; we will learn how to accomplish this in a later module.

Describe
To get a statistical summary of each column, such as the count, mean and standard deviation, we use the describe method:

dataframe.describe()

This method provides various summary statistics, excluding NaN (Not a Number) values.

df.describe()

This shows the statistical summary of all numeric-typed (int, float) columns.
For example, the attribute "symboling" has a count of 205. Its mean is 0.83, its standard deviation is 1.25, and its minimum is -2. The 25th, 50th and 75th percentiles are 0, 1 and 2, and the maximum is 3.
However, what if we would also like to check all the columns, including those of type object? You can add the argument include="all" inside the parentheses. Let's try it again.

# describe all the columns in "df"
df.describe(include="all")

Now the statistical summary covers all the columns, including object-typed attributes.
For the object-typed columns we can now see the number of unique values, the most frequent value (top), and how often that top value occurs (freq).
Some values in the table above show as "NaN" because those statistics do not apply to that column type.

Question #1.3: (5 points)
You can select columns of a data frame by giving the name of each column. For example, you can select three columns as follows:

dataframe[['column 1', 'column 2', 'column 3']]

where "column" is the name of a column. You can apply the method ".describe()" to get the statistics of those columns as follows:

dataframe[['column 1', 'column 2', 'column 3']].describe()

Apply the method ".describe()" to the columns 'length' and 'compression-ratio'.

Info
Another method you can use to check your dataset is:

dataframe.info()

It provides a concise summary of your DataFrame.
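Question #1.3's column selection can be tried on a small frame before running it on the real data, alongside describe(include="all") and info(). The values below are made up; only the column names "make", "length" and "compression-ratio" come from the lab.

```python
import pandas as pd

# Made-up values; only the column names mirror the lab's dataset.
df = pd.DataFrame({
    "make": ["audi", "audi", "volvo", "volvo"],
    "length": [176.6, 176.6, 188.8, 188.8],
    "compression-ratio": [10.0, 8.0, 9.5, 23.0],
})

# Double brackets select a list of columns and return a DataFrame,
# so .describe() can be chained directly onto the selection.
stats = df[["length", "compression-ratio"]].describe()
print(stats)

# include="all" adds count/unique/top/freq rows for object columns;
# statistics that do not apply to a column's type show up as NaN.
summary_all = df.describe(include="all")
print(summary_all)

# info() prints the row count, each column's non-null count and dtype,
# and the memory usage.
df.info()
```

On the lab's df, the same selection is df[['length', 'compression-ratio']].describe().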
# look at the info of "df"
df.info()

Here we can see a summary of our dataframe: each column's name, its number of non-null values and its data type, plus the memory usage. It also shows that the whole data frame has 205 rows and 26 columns in total.

2. Data Wrangling

Data wrangling is the process of converting data from its initial format to a format that may be better for analysis.

What is the fuel consumption (L/100km) rate for the diesel car?

Import data
You can find the "Automobile Data Set" at the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

# Import pandas
import pandas as pd
import matplotlib.pylab as plt

Read the data set from the URL and add the related headers.

filename = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"

Python list headers containing the names of the headers:

headers = ["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style",
           "drive-wheels","engine-location","wheel-base","length","width","height","curb-weight","engine-type",
           "num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
           "peak-rpm","city-mpg","highway-mpg","price"]

Use the Pandas method read_csv() to load the data from the web address. Set the parameter "names" equal to the Python list "headers".

df = pd.read_csv(filename, names = headers)

Use the method head() to display the first five rows of the dataframe.

# To see what the data set looks like, we'll use the head() method.
df.head()
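Passing names= labels the columns at read time, and it also means no line in the file is used as a header. This offline sketch uses three hand-typed rows and an illustrative five-column subset of the lab's header list, since the real file has 26 columns. It also shows read_csv's na_values="?" option, which is not used in the lab text. That option lets "?" be read as NaN directly.

```python
import io

import pandas as pd

# Hand-typed rows in the headerless, comma-separated style of imports-85.data.
raw = "3,?,alfa-romero,27,13495\n2,164,audi,30,13950\n-1,95,volvo,27,22470\n"

# Illustrative subset of the lab's header names (the real file has 26 columns).
headers = ["symboling", "normalized-losses", "make", "highway-mpg", "price"]

# names= supplies the column labels; every line is treated as data,
# just as with header=None.
df = pd.read_csv(io.StringIO(raw), names=headers)
print(df.head())
print(df["normalized-losses"].dtype)  # object, because of the "?"

# Optional: na_values="?" reads "?" as NaN up front, so the column parses as float.
df_na = pd.read_csv(io.StringIO(raw), names=headers, na_values="?")
print(df_na["normalized-losses"].isna().sum())  # 1
```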

Neha · Accepted Answer

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The first 5 rows of the dataframe\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "   0    1            2    3    4     5            6    7      8     9   ...  \\\n",
       "0   3    ?  alfa-romero  gas  std   two  convertible  rwd  front  88.6  ...   \n",
       "1   3    ?  alfa-romero  gas  std   two  convertible  rwd  front  88.6  ...   \n",
       "2   1    ?  alfa-romero  gas  std   two    hatchback  rwd  front  94.5  ...   \n",
       "3   2  164         audi  gas  std  four        sedan  fwd  front  99.8  ...   \n",
       "4   2  164         audi  gas  std  four        sedan  4wd  front  99.4  ...   \n",
       "\n",
       "    16    17    18    19    20   21    22  23  24     25  \n",
       "0  130  mpfi  3.47  2.68   9.0  111  5000  21  27  13495  \n",
       "1  130  mpfi  3.47  2.68   9.0  111  5000  21  27  16500  \n",
       "2  152  mpfi  2.68  3.47   9.0  154  5000  19  26  16500  \n",
       "3  109  mpfi  3.19  3.40  10.0  102  5500  24  30  13950  \n",
       "4  136  mpfi  3.19  3.40   8.0  115  5500  18  22  17450  \n",
       "\n",
       "[5 rows x 26 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "# Read the online file by the URL provided above, and assign it to variable \"df\"\n",
    "other_path = \"https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data\"\n",
    "df = pd.read_csv(other_path, header=None)\n",
    "# show the first 5 rows using dataframe.head() method\n",
    "print(\"The first 5 rows of the dataframe\")\n",
    "df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The bottom 10 rows of the dataframe\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "     0    1      2       3      4     5      6    7      8      9   ...   16  \\\n",
       "195  -1   74  volvo     gas    std  four  wagon  rwd  front  104.3  ...  141   \n",
       "196  -2  103  volvo     gas    std  four  sedan  rwd  front  104.3  ...  141   \n",
       "197  -1   74  volvo     gas    std  four  wagon  rwd  front  104.3  ...  141   \n",
       "198  -2  103  volvo     gas  turbo  four  sedan  rwd  front  104.3  ...  130   \n",
       "199  -1   74  volvo     gas  turbo  four  wagon  rwd  front  104.3  ...  130   \n",
       "200  -1   95  volvo     gas    std  four  sedan  rwd  front  109.1  ...  141   \n",
       "201  -1   95  volvo     gas  turbo  four  sedan  rwd  front  109.1  ...  141   \n",
       "202  -1   95  volvo     gas    std  four  sedan  rwd  front  109.1  ...  173   \n",
       "203  -1   95  volvo  diesel  turbo  four  sedan  rwd  front  109.1  ...  145   \n",
       "204  -1   95  volvo     gas  turbo  four  sedan  rwd  front  109.1  ...  141   \n",
       "\n",
       "       17    18    19    20   21    22  23  24     25  \n",
       "195  mpfi  3.78  3.15   9.5  114  5400  23  28  13415  \n",
       "196  mpfi  3.78  3.15   9.5  114  5400  24  28  15985  \n",
       "197  mpfi  3.78  3.15   9.5  114  5400  24  28  16515  \n",
       "198  mpfi  3.62  3.15   7.5  162  5100  17  22  18420  \n",
       "199  mpfi  3.62  3.15   7.5  162  5100  17  22  18950  \n",
       "200  mpfi  3.78  3.15   9.5  114  5400  23  28  16845  \n",
       "201  mpfi  3.78  3.15   8.7  160  5300  19  25  19045  \n",
       "202  mpfi  3.58  2.87   8.8  134  5500  18  23  21485  \n",
       "203   idi  3.01  3.40  23.0  106  4800  26  27  22470  \n",
       "204  mpfi  3.78  3.15   9.5  114  5400  19  25  22625  \n",
       "\n",
       "[10 rows x 26 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Question #1.1: check the bottom 10 rows of data frame \"df\"\n",
    "# show the bottom 10 rows using dataframe.tail() method\n",
    "print(\"The bottom 10 rows of the dataframe\")\n",
    "df.tail(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "headers\n",
      " ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors', 'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']\n"
     ]
    }
   ],
   "source": [
    "# create headers list\n",
    "headers = [\"symboling\",\"normalized-losses\",\"make\",\"fuel-type\",\"aspiration\", \"num-of-doors\",\"body-style\", \"drive-wheels\",\"engine-location\",\"wheel-base\",\n",
    "\"length\",\"width\",\"height\",\"curb-weight\",\"engine-type\", \"num-of-cylinders\", \"engine-size\",\"fuel-system\",\"bore\",\"stroke\",\"compression-ratio\",\"horsepower\", \"peak-rpm\",\"city-mpg\",\"highway-mpg\",\"price\"]\n",
    "print(\"headers\\n\", headers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "     symboling normalized-losses         make fuel-type aspiration  \
",
       "0            3                 ?  alfa-romero       gas        std   
",
       "1            3                 ?  alfa-romero       gas        std   
",
       "2            1                 ?  alfa-romero       gas        std   
",
       "3            2               164         audi       gas        std   
",
       "4            2               164         audi       gas        std   
",
       "..         ...               ...          ...       ...        ...   
",
       "200         -1                95        volvo       gas        std   
",
       "201         -1                95        volvo       gas      turbo   
",
       "202         -1                95        volvo       gas        std   
",
       "203         -1                95        volvo    diesel      turbo   
",
       "204         -1                95        volvo       gas      turbo   
",
       "
",
       "    num-ofdoors   body-style drive-wheels engine-location  wheel-base  ...  \
",
       "0           two  convertible          rwd           front        88.6  ...   
",
       "1           two  convertible          rwd           front        88.6  ...   
",
       "2           two    hatchback          rwd           front        94.5  ...   
",
       "3          four        sedan          fwd           front        99.8  ...   
",
       "4          four        sedan          4wd           front        99.4  ...   
",
       "..          ...          ...          ...             ...         ...  ...   
",
       "200        four        sedan          rwd           front       109.1  ...   
",
       "201        four        sedan          rwd           front       109.1  ...   
",
       "202        four        sedan          rwd           front       109.1  ...   
",
       "203        four        sedan          rwd           front       109.1  ...   
",
       "204        four        sedan          rwd           front       109.1  ...   
",
       "
",
       "     engine-size  fuelsystem  bore  stroke compression-ratio horsepower  \
",
       "0            130        mpfi  3.47    2.68               9.0        111   
",
       "1            130        mpfi  3.47    2.68               9.0        111   
",
       "2            152        mpfi  2.68    3.47               9.0        154   
",
       "3            109        mpfi  3.19    3.40              10.0        102   
",
       "4            136        mpfi  3.19    3.40               8.0        115   
",
       "..           ...         ...   ...     ...               ...        ...   
",
       "200          141        mpfi  3.78    3.15               9.5        114   
",
       "201          141        mpfi  3.78    3.15               8.7        160   
",
       "202          173        mpfi  3.58    2.87               8.8        134   
",
       "203          145         idi  3.01    3.40              23.0        106   
",
       "204          141        mpfi  3.78    3.15               9.5        114   
",
       "
",
       "     peak-rpm city-mpg highwaympg  price  
",
       "0        5000       21         27  13495  
",
       "1        5000       21         27  16500  
",
       "2        5000       19         26  16500  
",
       "3        5500       24         30  13950  
",
       "4        5500       18         22  17450  
",
       "..        ...      ...        ...    ...  
",
       "200      5400       23         28  16845  
",
       "201      5300       19         25  19045  
",
       "202      5500       18         23  21485  
",
       "203      4800       26         27  22470  
",
       "204      5400       19         25  22625  
",
       "
",
       "[205 rows x 26 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
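    "df.columns = headers
",
    "df.head(10)  # not displayed: a cell only shows the value of its last expression
",
    "# dropna() returns a new DataFrame and leaves df unchanged; assign the result (df = ...) to keep it.
",
    "# Missing values in this file are stored as '?' (see normalized-losses), not NaN, so nothing is dropped here.
",
    "df.dropna(subset=["price"], axis=0)"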
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
",
       "       'num-ofdoors', 'body-style', 'drive-wheels', 'engine-location',
",
       "       'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',
",
       "       'num-of-cylinders', 'engine-size', 'fuelsystem', 'bore', 'stroke',
",
       "       'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highwaympg',
",
       "       'price'],
",
       "      dtype='object')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the names of the columns of the DataFrame
",
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "symboling              int64
",
       "normalized-losses     object
",
       "make                  object
",
       "fuel-type             object
",
       "aspiration            object
",
       "num-ofdoors           object
",
       "body-style            object
",
       "drive-wheels          object
",
       "engine-location       object
",
       "wheel-base           float64
",
       "length               float64
",
       "width                float64
",
       "height               float64
",
       "curb-weight            int64
",
       "engine-type           object
",
       "num-of-cylinders      object
",
       "engine-size            int64
",
       "fuelsystem            object
",
       "bore                  object
",
       "stroke                object
",
       "compression-ratio    float64
",
       "horsepower            object
",
       "peak-rpm              object
",
       "city-mpg               int64
",
       "highwaympg             int64
",
       "price                 object
",
       "dtype: object"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "symboling              int64
",
      "normalized-losses     object
",
      "make                  object
",
      "fuel-type             object
",
      "aspiration            object
",
      "num-ofdoors           object
",
      "body-style            object
",
      "drive-wheels          object
",
      "engine-location       object
",
      "wheel-base           float64
",
      "length               float64
",
      "width                float64
",
      "height               float64
",
      "curb-weight            int64
",
      "engine-type           object
",
      "num-of-cylinders      object
",
      "engine-size            int64
",
      "fuelsystem            object
",
      "bore                  object
",
      "stroke                object
",
      "compression-ratio    float64
",
      "horsepower            object
",
      "peak-rpm              object
",
      "city-mpg               int64
",
      "highwaympg             int64
",
      "price                 object
",
      "dtype: object
"
     ]
    }
   ],
   "source": [
    "# Check the data type of each column of data frame "df" with .dtypes
",
    "print(df.dtypes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
",
       "
",
       "    .dataframe tbody tr th:only-of-type {
",
       "        vertical-align: middle;
",
       "    }
",
       "
",
       "    .dataframe tbody tr th {
",
       "        vertical-align: top;
",
       "    }
",
       "
",
       "    .dataframe thead th {
",
       "        text-align: right;
",
       "    }
",
       "
",
       "
",
       "  
",
       "    
",
       "      
",
       "      symboling
",
       "      wheel-base
",
       "      length
",
       "      width
",
       "      height
",
       "      curb-weight
",
       "      engine-size
",
       "      compression-ratio
",
       "      city-mpg
",
       "      highwaympg
",
       "    
",
       "  
",
       "  
",
       "    
",
       "      count
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "      205.000000
",
       "    
",
       "    
",
       "      mean
",
       "      0.834146
",
       "      98.756585
",
       "      174.049268
",
       "      65.907805
",
       "      53.724878
",
       "      2555.565854
",
       "      126.907317
",
       "      10.142537
",
       "      25.219512
",
       "      30.751220
",
       "    
",
       "    
",
       "      std
",
       "      1.245307
",
       "      6.021776
",
       "      12.337289
",
       "      2.145204
",
       "      2.443522
",
       "      520.680204
",
       "      41.642693
",
       "      3.972040
",
       "      6.542142
",
       "      6.886443
",
       "    
",
       "    
",
       "      min
",
       "      -2.000000
",
       "      86.600000
",
       "      141.100000
",
       "      60.300000
",
       "      47.800000
",
       "      1488.000000
",
       "      61.000000
",
       "      7.000000
",
       "      13.000000
",
       "      16.000000
",
       "    
",
       "    
",
       "      25%
",
       "      0.000000
",
       "      94.500000
",
       "      166.300000
",
       "      64.100000
",
       "      52.000000
",
       "      2145.000000
",
       "      97.000000
",
       "      8.600000
",
       "      19.000000
",
       "      25.000000
",
       "    
",
       "    
",
       "      50%
",
       "      1.000000
",
       "      97.000000
",
       "      173.200000
",
       "      65.500000
",
       "      54.100000
",
       "      2414.000000
",
       "      120.000000
",
       "      9.000000
",
       "      24.000000
",
       "      30.000000
",
       "    
",
       "    
",
       "      75%
",
       "      2.000000
",
       "      102.400000
",
       "      183.100000
",
       "      66.900000
",
       "      55.500000
",
       "      2935.000000
",
       "      141.000000
",
       "      9.400000
",
       "      30.000000
",
       "      34.000000
",
       "    
",
       "    
",
       "      max
",
       "      3.000000
",
       "      120.900000
",
       "      208.100000
",
       "      72.300000
",
       "      59.800000
",
       "      4066.000000
",
       "      326.000000
",
       "      23.000000
",
       "      49.000000
",
       "      54.000000
",
       "    
",
       "  
",
       "
",
       ""
      ],
      "text/plain": [
       "        symboling  wheel-base      length       width      height  \
",
       "count  205.000000  205.000000  205.000000  205.000000  205.000000   
",
       "mean     0.834146   98.756585  174.049268   65.907805   53.724878   
",
       "std      1.245307    6.021776   12.337289    2.145204    2.443522   
",
       "min     -2.000000   86.600000  141.100000   60.300000   47.800000   
",
       "25%      0.000000   94.500000  166.300000   64.100000   52.000000   
",
       "50%      1.000000   97.000000  173.200000   65.500000   54.100000   
",
       "75%      2.000000  102.400000  183.100000   66.900000   55.500000   
",
       "max      3.000000  120.900000  208.100000   72.300000   59.800000   
",
       "
",
       "       curb-weight  engine-size  compression-ratio    city-mpg  highwaympg  
",
       "count   205.000000   205.000000         205.000000  205.000000  205.000000  
",
       "mean   2555.565854   126.907317          10.142537   25.219512   30.751220  
",
       "std     520.680204    41.642693           3.972040    6.542142    6.886443  
",
       "min    1488.000000    61.000000           7.000000   13.000000   16.000000  
",
       "25%    2145.000000    97.000000           8.600000   19.000000   25.000000  
",
       "50%    2414.000000   120.000000           9.000000   24.000000   30.000000  
",
       "75%    2935.000000   141.000000           9.400000   30.000000   34.000000  
",
       "max    4066.000000   326.000000          23.000000   49.000000   54.000000  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
",
       "
",
       "    .dataframe tbody tr th:only-of-type {
",
       "        vertical-align: middle;
",
       "    }
",
       "
",
       "    .dataframe tbody tr th {
",
       "        vertical-align: top;
",
       "    }
",
       "
",
       "    .dataframe thead th {
",
       "        text-align: right;
",
       "    }
",
       "
",
       "
",
       "  
",
       "    
",
       "      
",
       "      symboling
",
       "      normalized-losses
",
       "      make
",
       "      fuel-type
",
       "      aspiration
",
       "      num-ofdoors
",
       "      body-style
",
       "      drive-wheels
",
       "      engine-location
",
       "      wheel-base
",
       "      ...
",
       "      engine-size
",
       "      fuelsystem
",
       "      bore
",
       "      stroke
",
       "      compression-ratio
",
       "      horsepower
",
       "      peak-rpm
",
       "      city-mpg
",

	0	1	2	3	4	5	6	7	8	9	...	16	17	18	19	20	21	22	23	24	25
0	3	?	alfa-romero	gas	std	two	convertible	rwd	front	88.6	...	130	mpfi	3.47	2.68	9.0	111	5000	21	27	13495
1	3	?	alfa-romero	gas	std	two	convertible	rwd	front	88.6	...	130	mpfi	3.47	2.68	9.0	111	5000	21	27	16500
2	1	?	alfa-romero	gas	std	two	hatchback	rwd	front	94.5	...	152	mpfi	2.68	3.47	9.0	154	5000	19	26	16500
3	2	164	audi	gas	std	four	sedan	fwd	front	99.8	...	109	mpfi	3.19	3.40	10.0	102	5500	24	30	13950
4	2	164	audi	gas	std	four	sedan	4wd	front	99.4	...	136	mpfi	3.19	3.40	8.0	115	5500	18	22	17450
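The table above still carries integer column labels 0 to 25, which is what pandas assigns when a file is read without a header row; the notebook then replaces them with `df.columns = headers`. A minimal sketch of that step follows. It assumes an in-memory stand-in built from the first five fields of rows 0 to 2 (the real file has 26 fields per row) and uses only the first five names from the notebook's `df.columns` output.

```python
import io

import pandas as pd

# Stand-in for the raw file: the first five fields of rows 0-2 shown above.
# The real file has 26 fields per row and no header line.
raw = io.StringIO(
    "3,?,alfa-romero,gas,std\n"
    "3,?,alfa-romero,gas,std\n"
    "1,?,alfa-romero,gas,std\n"
)

# header=None stops pandas from treating the first data row as column names,
# so the columns are labelled 0, 1, 2, ... as in the table above.
df = pd.read_csv(raw, header=None)
print(list(df.columns))  # [0, 1, 2, 3, 4]

# First five column names, as they appear in the notebook's df.columns output.
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration"]
df.columns = headers
print(df.head())
```

Passing `names=headers` to `read_csv` instead sets the column names while reading, which avoids the separate assignment.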

	0	1	2	3	4	5	6	7	8	9	...	16	17	18	19	20	21	22	23	24	25
195	-1	74	volvo	gas	std	four	wagon	rwd	front	104.3	...	141	mpfi	3.78	3.15	9.5	114	5400	23	28	13415
196	-2	103	volvo	gas	std	four	sedan	rwd	front	104.3	...	141	mpfi	3.78	3.15	9.5	114	5400	24	28	15985
197	-1	74	volvo	gas	std	four	wagon	rwd	front	104.3	...	141	mpfi	3.78	3.15	9.5	114	5400	24	28	16515
198	-2	103	volvo	gas	turbo	four	sedan	rwd	front	104.3	...	130	mpfi	3.62	3.15	7.5	162	5100	17	22	18420
199	-1	74	volvo	gas	turbo	four	wagon	rwd	front	104.3	...	130	mpfi	3.62	3.15	7.5	162	5100	17	22	18950
200	-1	95	volvo	gas	std	four	sedan	rwd	front	109.1	...	141	mpfi	3.78	3.15	9.5	114	5400	23	28	16845
201	-1	95	volvo	gas	turbo	four	sedan	rwd	front	109.1	...	141	mpfi	3.78	3.15	8.7	160	5300	19	25	19045
202	-1	95	volvo	gas	std	four	sedan	rwd	front	109.1	...	173	mpfi	3.58	2.87	8.8	134	5500	18	23	21485
203	-1	95	volvo	diesel	turbo	four	sedan	rwd	front	109.1	...	145	idi	3.01	3.40	23.0	106	4800	26	27	22470
204	-1	95	volvo	gas	turbo	four	sedan	rwd	front	109.1	...	141	mpfi	3.78	3.15	9.5	114	5400	19	25	22625
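The notebook's `df.dropna(subset=["price"], axis=0)` call still shows all 205 rows, and it does not change `df` because its result is never assigned. Missing entries in this file are the string `"?"` rather than NaN, so `dropna()` has nothing to find. The sketch below assumes a reduced frame built from rows 0 to 4 and four of the columns in the tables above; it replaces `"?"` with `np.nan` first and assigns the result back. It drops on `normalized-losses` because that is the column where these rows actually contain `"?"`.

```python
import numpy as np
import pandas as pd

# Rows 0-4 from the tables above, reduced to four columns.
# As in the raw file, missing values are the string "?", not NaN.
df = pd.DataFrame({
    "symboling": [3, 3, 1, 2, 2],
    "normalized-losses": ["?", "?", "?", "164", "164"],
    "make": ["alfa-romero", "alfa-romero", "alfa-romero", "audi", "audi"],
    "price": ["13495", "16500", "16500", "13950", "17450"],
})

# With no NaN present yet, dropna() removes nothing.
print(len(df.dropna(subset=["normalized-losses"])))  # 5

# Turn the "?" markers into real missing values.
df = df.replace("?", np.nan)

# dropna() returns a new DataFrame; assign it back to keep the result.
df = df.dropna(subset=["normalized-losses"], axis=0)

# Renumber the remaining rows 0..n-1 after dropping.
df = df.reset_index(drop=True)
print(df)
```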

	symboling	wheel-base	length	width	height	curb-weight	engine-size	compression-ratio	city-mpg	highwaympg
count	205.000000	205.000000	205.000000	205.000000	205.000000	205.000000	205.000000	205.000000	205.000000	205.000000
mean	0.834146	98.756585	174.049268	65.907805	53.724878	2555.565854	126.907317	10.142537	25.219512	30.751220
std	1.245307	6.021776	12.337289	2.145204	2.443522	520.680204	41.642693	3.972040	6.542142	6.886443
min	-2.000000	86.600000	141.100000	60.300000	47.800000	1488.000000	61.000000	7.000000	13.000000	16.000000
25%	0.000000	94.500000	166.300000	64.100000	52.000000	2145.000000	97.000000	8.600000	19.000000	25.000000
50%	1.000000	97.000000	173.200000	65.500000	54.100000	2414.000000	120.000000	9.000000	24.000000	30.000000
75%	2.000000	102.400000	183.100000	66.900000	55.500000	2935.000000	141.000000	9.400000	30.000000	34.000000
max	3.000000	120.900000	208.100000	72.300000	59.800000	4066.000000	326.000000	23.000000	49.000000	54.000000
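`df.describe()` summarizes only 10 of the 26 columns, because columns such as `normalized-losses`, `horsepower` and `price` have dtype `object` in the `df.dtypes` output. A minimal sketch of converting them with `pd.to_numeric(..., errors="coerce")` follows. It assumes a small frame built from rows 0 to 4 of the tables above; the choice of columns is illustrative.

```python
import pandas as pd

# Rows 0-4 from the tables above: one int64 column and three text columns.
df = pd.DataFrame({
    "normalized-losses": ["?", "?", "?", "164", "164"],
    "city-mpg": [21, 21, 19, 24, 18],
    "horsepower": ["111", "111", "154", "102", "115"],
    "price": ["13495", "16500", "16500", "13950", "17450"],
})

# While the other columns hold strings, only the int64 column is summarized.
print(list(df.describe().columns))  # ['city-mpg']

# Convert the text columns to numbers; errors="coerce" turns anything
# unparseable (such as "?") into NaN instead of raising an error.
for col in ["normalized-losses", "horsepower", "price"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df.describe())
```

Alternatively, `df.describe(include="all")` keeps object columns in the summary and reports `count`, `unique`, `top` and `freq` for them instead of numeric statistics.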