Machine Learning with Python: SIT720 Machine Learning Assessment Task 1: Problem solving task. ...

Question

SIT720 Machine Learning
Assessment Task 1: Problem solving task.
©Deakin University

This document supplies detailed information on Assessment Task 1 for this unit.

Key information
• Due: Monday 26 July 2021 by 8.00 pm (AEST)
• Weighting: 5%

Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

Unit Learning Outcome (ULO):
• ULO1 - Perform Python programming to solve a given problem.

Graduate Learning Outcome (GLO):
• GLO1 - through the assessment of student ability to use data acquisition techniques to obtain, manipulate and represent data.
• GLO2 - through the assessment of communicating the results in specific format.
• GLO3 - through student ability to use specific programming language and modules to obtain, pre-process, transform and analyse data.

Purpose
This assessment task is for students to apply Python programming skills for loading, visualising, manipulating and exporting data using various modules and packages.

Assessment 1
Total marks = 30 (15 * 2 = 30)

Submission Instructions
a) Submit your solution as a notebook file with the “.ipynb” extension.
b) Insert your Python code or text responses into a cell of your submitted file, preceded by the question, i.e., copy the question by adding a cell before the solution cell. If you need multiple cells for better presentation of the code or answer, add the question only before the first solution cell.
c) Your submitted code should be executable. If your code does not generate the submitted solution, then you will get zero for that part of the marks.
d) For answers regarding discussion or explanation, a maximum of five sentences is suggested.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific values that can be calculated from the data provided.
g) Submit your assignment after running each cell individually with the output.
h) The submitted notebook file name should be of the form “SIT720_A1_studentID.ipynb”. For example, if your student ID is 1234, then the submitted file name should be SIT720_A1_1234.ipynb.

Background
According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year and affecting the quality of life of a large number of people worldwide. Treating these types of disease first requires a proper diagnosis method to identify their occurrence and type.

Diagnosis of such diseases always involves a large number of parameters to help cardiologists identify them. Electrocardiography (ECG) is most commonly used to observe a patient's cardiac state. The following image shows different ECG waves (P, Q, R, S, T, etc.).

Fig: Electrocardiogram trace with respective biomarkers. [Image source: https://litfl.com/wp-content/uploads/2018/10/ECG-waves-segments-and-intervals-LITFL-ECG-library-3.jpg]

In this assignment, you will have a look at such a dataset containing different parameters along with the decision of the cardiologists about the level of heart disease of each sample. There will be a list of tasks to check your ability to use programming skills, basic logic, and reasoning.

Dataset
Dataset file name: A1_heart_disease_dataset.csv

Dataset description: The dataset contains different features along with the disease state.
It contains a total of 13 features and an additional disease state, 14 columns in total. It contains different types of data including int, float and string. Feature names, data types and values are described in the following section with their proper unit details. Data may contain 'null' or 'nan' values. Each observation is a datapoint along a row of the dataset. Patient and observation are used interchangeably in this case.

Features:
i. age (int): age of the patient in years
ii. sex (str): gender of the patient (M: male, F: female)
iii. cp (str): chest pain type (tap: typical angina, aap: atypical angina, nap: non-anginal pain, asp: asymptomatic pain)
iv. trestbps (float): resting blood pressure (in mm Hg on admission to the hospital)
v. chol (float): serum cholesterol in mg/dl
vi. fbs (bool): is fasting blood sugar higher than standard 120 mg/dl? (yes: if true; no: if false)
vii. restecg (int): resting electrocardiographic results (0: normal, 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2: showing probable or definite left ventricular hypertrophy by Estes' criteria)
viii. thalach (float): maximum heart rate achieved
ix. exang (bool): exercise induced angina (true: if yes; false: if no)
x. oldpeak (float): ST interval depression induced by exercise relative to rest
xi. slope (float): the slope of the peak exercise ST segment (1: up-sloping, 2: flat, 3: down-sloping)
xii. ca (int): number of major vessels (0-3) affected
xiii. thal (int): thalassemia state (3: normal; 6: fixed defect; 7: reversible defect)
xiv. state (int): heart disease risk state (0: no disease, 1-4: level of risk) [Decided by cardiologists]

Questions

1. Load the data from the supplied data file. Remove the observations/samples where the heart diseases are not diagnosed by the cardiologists. Print the data dimension before and after removing the observations/samples.

2. Continue from question 1. Display the number of rows and their indices that have missing data in one or more cells. Now, replace the missing data by the lowest value of the corresponding feature if it is a continuous variable. In case of a categorical variable, remove the sample. Print the median values of all features before and after replacing missing data.

3. Continue from question 2. Is there any change in data type? If yes, convert them back to appropriate data types. Print all variables with corresponding data type.

4. Continue from question 3. Print the total numbers and ratio of male and female patients who are at highest risk of heart disease.

5. Continue from question 3. Is there any association between heart rate and severity of heart disease? Explain your results from the given dataset.

6. Continue from question 3. Print the average cholesterol level for different numbers of blocked blood vessels across gender. Please report the pattern found in the result, if any.

7. Print the percentage of patients at risk of heart disease having abnormality in both ECG and blood sugar with asymptomatic chest pain.

8. Calculate and print the average blood pressure of all observations with non-flat ST slopes of ECG.

9. Create and print a dataframe of the heart rate, blood pressure and cholesterol levels for different age groups (based on 10 years interval).

10. Continue from question 3. Find the average cholesterol level across gender for each age group. Please explain the results.

11. Continue from question 3. Draw two scatter plots of cholesterol level, one against blood pressure and another against heart rate. Draw them in two subplots of the same plot.

12. Visualize the cholesterol level against the number of blood vessels blocked for male and female using a line plot. Explain the graph based on your observation.

13. Draw a group bar diagram of heart rate, blood pressure and total number of patients, based on the age groups defined in question 9. Explain your observation from the graph.
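The 10-year age grouping asked for in question 9 (and reused in questions 10 and 13) can be sketched with pandas.cut. This is only an illustration under stated assumptions: the rows are the ten observations shown by data.head(10) in the accepted answer, the names sample, age_group and age_summary are hypothetical, and "levels" is read as the mean of each feature within a group, which the question does not specify.

```python
import numpy as np
import pandas as pd

# First ten rows shown by data.head(10) in the accepted answer
# (only the columns needed for this sketch).
sample = pd.DataFrame({
    "age":      [63, 67, 67, 37, 41, 56, 62, 57, 63, 53],
    "trestbps": [145, 160, 120, 130, 130, 120, 140, 120, 130, 140],
    "chol":     [233, 286, 229, 250, 204, 236, 268, 354, 254, 203],
    "thalach":  [150, 108, 129, 187, 172, 178, 160, 163, 147, 155],
})

# Derive the decade boundaries from the data instead of hard coding them.
lower = int(sample["age"].min() // 10 * 10)        # start of the youngest decade
upper = int(sample["age"].max() // 10 * 10 + 10)   # end of the oldest decade
bins = np.arange(lower, upper + 1, 10)

# right=False gives left-closed groups such as [30, 40), [40, 50), ...
sample["age_group"] = pd.cut(sample["age"], bins=bins, right=False)

# Assumption: "levels" means the average of each feature within an age group.
age_summary = (
    sample.groupby("age_group", observed=True)[["thalach", "trestbps", "chol"]]
    .mean()
)
print(age_summary)
```

Computing the bin edges from the minimum and maximum age keeps the sketch in line with submission instruction f) (no hard coding); on the full dataset the same code would simply produce more groups.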

Atal Behari · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md
"
    }
   },
   "source": [
    "## *Importing Modules*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "pycharm": {
     "name": "#%%
"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd
",
    "import numpy as np
",
    "import seaborn as sns
",
    "import matplotlib.pyplot as plt
",
    "import openpyxl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md
"
    }
   },
   "source": [
    "## *Question no.: 1. Load the data from supplied data file. Remove the observations/samples where the heart diseases are not diagnosed by the Cardiologists. Print the data dimension before and after removing the observations/samples.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "name": "#%%
"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "    age sex   cp  trestbps   chol  fbs  restecg  thalach  exang  oldpeak  \
0  63.0   M  tap     145.0  233.0  yes      2.0    150.0    0.0      2.3   
1  67.0   M  asp     160.0  286.0   no      2.0    108.0    1.0      1.5   
2  67.0   M  asp     120.0  229.0   no      2.0    129.0    1.0      2.6   
3  37.0   M  nap     130.0  250.0   no      0.0    187.0    0.0      3.5   
4  41.0   F  aap     130.0  204.0   no      2.0    172.0    0.0      1.4   
5  56.0   M  aap     120.0  236.0   no      0.0    178.0    0.0      0.8   
6  62.0   F  NaN     140.0  268.0   no      2.0    160.0    0.0      3.6   
7  57.0   F  asp     120.0  354.0   no      0.0    163.0    1.0      0.6   
8  63.0   M  asp     130.0  254.0   no      2.0    147.0    0.0      1.4   
9  53.0   M  asp     140.0  203.0  yes      2.0    155.0    1.0      3.1

slope   ca  thal  state  
0    3.0  0.0   6.0    0.0  
1    2.0  3.0   3.0    2.0  
2    2.0  2.0   7.0    1.0  
3    3.0  0.0   3.0    0.0  
4    1.0  0.0   3.0    0.0  
5    1.0  0.0   3.0    0.0  
6    3.0  2.0   3.0    3.0  
7    1.0  0.0   3.0    0.0  
8    2.0  1.0   7.0    2.0  
9    3.0  0.0   7.0    1.0  "
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('C:/Users/Atal/PycharmProjects/GreyNodes/Dataset/HeartDiseaseDataset.csv')
",
    "data.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md
"
    }
   },
   "source": [
    "* ### *Data dimension before removing the observations/samples.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "pycharm": {
     "name": "#%%
"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "(303, 14)"
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "pycharm": {
     "name": "#%%
"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "
",
      "RangeIndex: 303 entries, 0 to 302
",
      "Data columns (total 14 columns):
",
      " #   Column    Non-Null Count  Dtype  
",
      "---  ------    --------------  -----  
",
      " 0   age       295 non-null    float64
",
      " 1   sex       295 non-null    object 
",
      " 2   cp        292 non-null    object 
",
      " 3   trestbps  295 non-null    float64
",
      " 4   chol      293 non-null    float64
",
      " 5   fbs       295 non-null    object 
",
      " 6   restecg   295 non-null    float64
",
      " 7   thalach   291 non-null    float64
",
      " 8   exang     295 non-null    float64
",
      " 9   oldpeak   295 non-null    float64
",
      " 10  slope     295 non-null    float64
",
      " 11  ca        291 non-null    float64
",
      " 12  thal      295 non-null    float64
",
      " 13  state     293 non-null    float64
",
      "dtypes: float64(11), object(3)
",
      "memory usage: 33.3+ KB
"
     ]
    }
   ],
   "source": [
    "data.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md
"
    }
   },
   "source": [
    "* ### *Removing the observations/samples where the heart diseases are not diagnosed by the Cardiologists.*
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "pycharm": {
     "name": "#%%
"
    }
   },
   "outputs": [],
   "source": [
    "data_copy = data.copy()   # making a copy of dataset
",
    "data = data.dropna(subset = ['state'])
"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md
"
    }
   },
   "source": [
    "* ### *Data dimension after removing the observations/samples.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [
    {
     "data": {
      "text/plain": "(293, 14)"
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.shape"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%
"
    }
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md
"
    }
   },
   "source": [
    "## *Question no.: 2. Continue from question 1. Display the number of rows and their indices that have missing data in one or more cells. Now, replace the missing data by the lowest value of the corresponding feature if it is a continuous variable. In case of categorical variable, remove the sample. Print the median values of all features before and after replacing missing data.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of observations containing NULL values:  9
",
      "Indexes of observations containing NULL values:  [6, 115, 116, 132, 218, 223, 228, 290, 302]
"
     ]
    }
   ],
   "source": [
    "# Display the number of rows and their indices that have missing data
",
    "
",
    "count = 0
",
    "list_index = []
",
    "a = data.isnull().sum(axis = 1)  # Series holding the number of NaN values in each row
",
    "for i in a.index:
",
    "    if a[i] != 0:
",
    "        count = count + 1
",
    "        list_index.append(i)
",
    "
",
    "print('Number of observations containing NULL values: ',count)
",
    "print('Indexes of observations containing NULL values: ',list_index)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%
"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "age          55.0
",
      "trestbps    130.0
",
      "chol        243.0
",
      "restecg       1.0
",
      "thalach     153.0
",
      "exang         0.0
",
      "oldpeak       0.7
",
      "slope         2.0
",
      "ca            0.0
",
      "thal          3.0
",
      "state         0.0
",
      "dtype: float64
"
     ]
    }
   ],
   "source": [
    "# Print the median values of all features before replacing missing data.
",
    "
",
    "print(data.median(numeric_only=True))  # numeric_only skips the string columns sex, cp and fbs"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%
"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "pycharm": {
     "name": "#%%
"
    }
   },
   "outputs": [],
   "source": [
    "# Replace the missing data by the lowest value of the corresponding feature if it is a continuous variable.
",
    "# In case of categorical variable, remove the sample.
",
    "
",
    "data = data.copy()  # independent copy, so the assignments below do not write to a view of the original frame
",
    "data['chol'] = data['chol'].fillna(data['chol'].min())
",
    "data['thalach'] = data['thalach'].fillna(data['thalach'].min())
",
    "data['ca'] = data['ca'].fillna(data['ca'].min())
",
    "data = data.dropna(subset = ['cp'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "pycharm": {
     "name": "#%%
"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "age          55.50
",
      "trestbps    130.00
",
      "chol        242.50
",
      "restecg       1.00
",
      "thalach     152.00
",
      "exang         0.00
",
      "oldpeak       0.75
",
      "slope         2.00
",
      "ca            0.00
",
      "thal          3.00
",
      "state         0.00
",
      "dtype: float64
"
     ]
    }
   ],
   "source": [
    "# Print the median values of all features after replacing missing data.
",
    "
",
    "print(data.median(numeric_only=True))  # numeric_only skips the string columns sex, cp and fbs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md
"
    }
   },
   "source": [
    "## *Question no.: 3. Continue from question 2. Is there any change in data type? If yes, convert them back to appropriate data types. Print all variables with corresponding data type.*"
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "data_type=data.dtypes
",
    "print(data_type)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%
"
    }
   },
   "execution_count": 11,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "age         float64
",
      "sex          object
",
      "cp           object
",
      "trestbps    float64
",
      "chol        float64
",
      "fbs          object
",
      "restecg     float64
",
      "thalach     float64
",
      "exang       float64
",
      "oldpeak     float64
",
      "slope       float64
",
      "ca          float64
",
      "thal        float64
",
      "state       float64
",
      "dtype: object
"
     ]
    }
   ]
  },
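  {
   "cell_type": "markdown",
   "source": [
    "### *There is no change in the data types: every numeric column is still float64, as it was when the file was loaded. The features described as int or bool in the assignment (age, restecg, exang, ca, thal, state) are therefore stored as float64 as well.*"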
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md
"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## *Question no.: 4. Continue from question 3. Print the total numbers and ratio of male and female patients who are at highest risk of heart disease.*"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md
"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "outputs": [
    {
     "data": {
      "text/plain": "  sex  state   0
0   M    0.0  87
1   F    0.0  71
2   M    1.0  44
3   M    2.0  28
4   M    3.0  27
5   M    4.0  11
6   F    1.0   9
7   F    2.0   6
8   F    3.0   5
9   F    4.0   2"
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Gender-wise number of patients for all heart conditions
",
    "
",
    "sex_count=pd.DataFrame(data[['sex','state']].value_counts())
",
    "sex_count.reset_index(inplace=True)
",
    "sex_count"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%
"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "outputs": [
    {
     "data": {
      "text/plain": "sex     F   M
state        
0.0    71  87
1.0     9  44
2.0     6  28
3.0     5  27
4.0     2  11"
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# state and gender wise no_of_patients
",
    "
",
    "temp=sex_count.pivot(index='state',columns='sex')[0]
",
    "temp"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%
"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The total numbers of male and female patients who are at highest risk of heart disease are  11  and  2  respectively.
",
      "Ratio of male and female patients who are at highest risk of heart disease is:-
",
      "Male : Female =  11 : 2
"
     ]
    }
   ],
   "source": [
    "print('The total numbers of male and female patients who are at highest risk of heart disease are ',temp.loc[4]['M'],' and ',temp.loc[4]['F'],' respectively.')
",
    "print('Ratio of male and female patients who are at highest risk of heart disease is:-')
",
    "print('Male : Female = ',temp.loc[4]['M'],':',temp.loc[4]['F'])"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%
"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## *Question no.: 5. Continue from question 3. Is there any association between heart rate and severity of heart disease? Explain your results from given dataset.*"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md
"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "outputs": [
    {
     "data": {
      "text/plain": "[]"
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": "",
      "image/png":

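The notebook output for question 5 is cut off at the plot, so the author's own analysis is not visible here. As a rough sketch of one way to look at the association between heart rate (thalach) and severity (state), and not the author's method, the snippet below compares the mean heart rate per risk state and computes a Pearson correlation. It uses only the ten rows shown by data.head(10) above, with state written as integers; the names sample, mean_rate_by_state and pearson_r are hypothetical. Ten rows are far too few to draw a conclusion, so the same steps would need to be run on the cleaned dataset.

```python
import pandas as pd

# thalach and state for the ten rows shown by data.head(10) in the notebook.
sample = pd.DataFrame({
    "thalach": [150, 108, 129, 187, 172, 178, 160, 163, 147, 155],
    "state":   [0, 2, 1, 0, 0, 0, 3, 0, 2, 1],
})

# Average maximum heart rate within each risk state.
mean_rate_by_state = sample.groupby("state")["thalach"].mean()

# Pearson correlation between heart rate and risk state.
# Assumption: state is treated as an ordered numeric score for this purpose.
pearson_r = sample["thalach"].corr(sample["state"])

print(mean_rate_by_state)
print("Pearson r between thalach and state:", round(pearson_r, 3))
```

A negative coefficient would indicate that patients with a higher risk state tend to reach a lower maximum heart rate; because state is ordinal, a rank-based measure may be a better fit on the full data.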