
Machine learning using unsupervised algorithms


SIT720 Machine Learning Assessment Task 2: Problem solving task. ©Deakin University

This document supplies detailed information on Assessment Task 2 for this unit.

Key information
• Due: Week 7, Monday 30 August 2021 by 8.00 pm (AEST)
• Weighting: 15%

Learning Outcomes
This assessment assesses the following Unit Learning Outcome (ULO) and related Graduate Learning Outcomes (GLO):

Unit Learning Outcome (ULO)
ULO2 - Perform unsupervised learning of data such as clustering and dimensionality reduction.

Graduate Learning Outcomes (GLO)
GLO1 - through the assessment of student ability to use data acquisition techniques to obtain, manipulate and represent data.
GLO3 - through student ability to use specific programming language and modules to obtain, pre-process, transform and analyse data.
GLO4 - through assessment of student ability to make decisions to obtain data, use appropriate techniques to represent and visualise complex relationships in the data.
GLO5 - through assessment of student ability to solve problems related to ill-defined data.

Purpose
This assessment task requires students to apply skills in data clustering and dimensionality reduction. Students must demonstrate ability in data representation and competency in applying suitable clustering/dimensionality reduction techniques in a real-world scenario.

Assessment 2
Total marks = 40

Submission Instructions
a) Submit your solution code in a notebook file with the “.ipynb” extension. Write discussions and explanations, including outputs and figures, in a separate file and submit it as a PDF file.
b) Submissions in any other file format will not be assessed and will receive zero for the entire submission.
c) Insert your Python code into the cells of your submitted “.ipynb” file, each preceded by its question, i.e., copy the question into a cell added before the solution cell. If you need multiple cells for better presentation of the code, add the question only before the first solution cell.
d) Your submitted code should be executable. If your code does not generate the submitted solution, you will get zero for that part of the marks.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific values that can be calculated from the data provided.
g) Use topics covered up to week 6 for answering this assignment.
h) Submit your assignment after running each cell individually.
i) The submitted notebook file name should be of the form “SIT720_A2_studentID.ipynb”. For example, if your student ID is 1234, the submitted file name should be “SIT720_A2_1234.ipynb”.

Questions
Datafile: Download the dataset (.csv) from SCADI (https://archive.ics.uci.edu/ml/datasets/SCADI).
Data Description: This dataset contains 206 attributes of 70 children with physical and motor disability based on ICF-CY. For more information, see this link (https://www.mdpi.com/2073-8994/11/1/89/htm).

1. Determine the number of subgroups in the dataset using attributes 3 to 205, i.e., excluding attributes 1, 2 and 206. Is this number the same as the number of classes given by attribute 206? Explain and justify your findings. (4 marks)
2. Is this data affected by the curse of dimensionality? If so, how can this problem be solved? Explain with a two-dimensional plot and report the relevant loss of information. (4 marks)
3. After applying principal component analysis (PCA) to a given dataset, it was found that the percentage of variance for the first N components is X%. How is this percentage of variance computed? (2 marks)

Background
Obesity has become a global epidemic that has doubled since 1980, with serious consequences for health in children, teenagers, and adults. Obesity levels in individuals may relate to their eating habits and physical condition. In this assessment, you will analyse and create ML models based on a given dataset that contains attributes of individuals in relation to obesity levels.

Dataset filename: obesity_levels.csv
Dataset description: This dataset includes data for estimating obesity levels in individuals based on their eating habits and physical condition. The data contains 17 attributes and 2111 records.
Features and labels: The attribute names are listed below. The description of the attributes can be found in this article (https://doi.org/10.1016/j.dib.2019.104344).
I. Gender
II. Age
III. Height
IV. Weight
V. family_history_with_overweight (family history of overweight)
VI. FAVC (frequent consumption of high caloric food)
VII. FCVC (frequency of consumption of vegetables)
VIII. NCP (number of main meals per day)
IX. CAEC (consumption of food between meals)
X. SMOKE (smoking)
XI. CH2O (daily water intake)
XII. SCC (calorie consumption monitoring)
XIII. FAF (frequency of physical activity)
XIV. TUE (time using technology devices)
XV. CALC (consumption of alcohol)
XVI. MTRANS (means of transport)
XVII. NObeyesdad (obesity level, i.e. Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III)

Questions
4. Create a machine learning (ML) model for predicting “weight” using all features except “NObeyesdad” and report the observed performance. Explain your results based on the following criteria: (10 marks)
a. What model have you selected for solving this problem and why?
b. Have you made any assumption about the target variable? If so, why?
c. What have you done with text variables? Explain.
d. Have you optimised any model parameters? What is the benefit of this action?
e. Have you applied any step for handling overfitting or underfitting? What is it?
5. Create an ML model for classifying subjects into two classes, applying the following constraints on the above dataset. (12 marks)
• Use “NObeyesdad” as the target variable and the rest as predictor variables.
• Drop samples with the value “Insufficient Weight” for “NObeyesdad”.
• Group Normal Weight, Overweight Level I, …
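Question 1 can be approached by clustering the unlabeled attributes and letting an internal validity index choose the number of groups. The sketch below is one such approach, not the assessment's prescribed method: it fits k-means for k = 2 to 10 and keeps the k with the highest silhouette score. Because the SCADI file is not reproduced here, it uses `make_blobs` to build a synthetic stand-in with 70 samples and 203 features; with the real data you would pass the columns for attributes 3 to 205 instead (for example `df.iloc[:, 2:205]`, assuming the columns are stored in the documented order).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_range=range(2, 11), random_state=0):
    """Fit k-means for each k and return (best_k, {k: mean silhouette score})."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)  # highest silhouette wins
    return best_k, scores


# Synthetic stand-in with SCADI's shape (70 samples x 203 features), drawn
# from 4 well-separated groups. Replace X with the real attributes 3-205.
X, _ = make_blobs(n_samples=70, n_features=203, centers=4, random_state=42)
best_k, scores = best_k_by_silhouette(X)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("selected number of subgroups:", best_k)
```

The selected k can then be compared with the number of distinct values in attribute 206 (e.g. `df.iloc[:, 205].nunique()` under the same column-order assumption). Silhouette is only one heuristic, so an elbow plot of k-means inertia is a reasonable cross-check.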
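For Question 2, SCADI has 70 samples but 203 usable attributes, so features outnumber samples, a typical setting for the curse of dimensionality. A common remedy is dimensionality reduction. The sketch below, again on a synthetic `make_blobs` stand-in rather than the real file, standardises the features, projects them onto the first two principal components for a 2-D scatter plot, and reports the share of total variance that the projection discards as the loss of information.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook, omit this and call plt.show()
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_2d_with_loss(X):
    """Project standardised X onto its first two principal components.

    Returns the 2-D projection and the fraction of total variance discarded.
    """
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2).fit(X_std)
    retained = pca.explained_variance_ratio_.sum()
    return pca.transform(X_std), 1.0 - retained


# Synthetic stand-in with SCADI's shape: 70 samples x 203 features.
X, y = make_blobs(n_samples=70, n_features=203, centers=4, random_state=42)
Z, loss = pca_2d_with_loss(X)
print(f"variance retained: {1 - loss:.1%}, information lost: {loss:.1%}")

fig, ax = plt.subplots()
ax.scatter(Z[:, 0], Z[:, 1], c=y)  # colour points by the synthetic group label
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("First two principal components (synthetic stand-in)")
```

Standardising first keeps attributes with large numeric ranges from dominating the components; for the all-binary SCADI attributes this matters less, but it does no harm.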
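For Question 3, the percentage of variance comes from the eigenvalues of the data's covariance matrix. If $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ are the eigenvalues for $d$ features, each $\lambda_i$ is the variance along the $i$-th principal component, and their sum equals the total variance (the trace of the covariance matrix), so

$$X\% = 100 \times \frac{\sum_{i=1}^{N} \lambda_i}{\sum_{j=1}^{d} \lambda_j}.$$

The short check below, on random correlated data (an illustrative assumption, not the assessment dataset), computes this ratio by hand with NumPy and compares it with scikit-learn's `PCA.explained_variance_ratio_`.

```python
import numpy as np
from sklearn.decomposition import PCA


def variance_percentage(X, n_components):
    """Percentage of total variance captured by the first n_components PCs,
    computed from the eigenvalues of the sample covariance matrix."""
    cov = np.cov(X, rowvar=False)            # d x d covariance (np.cov centres X itself)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending; reverse to largest first
    return 100.0 * eigvals[:n_components].sum() / eigvals.sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # 100 samples, 5 correlated features

manual = variance_percentage(X, 2)
via_sklearn = 100.0 * PCA(n_components=2).fit(X).explained_variance_ratio_.sum()
print(f"by hand: {manual:.4f}%   sklearn: {via_sklearn:.4f}%")
```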
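For Question 4, one reasonable design (an assumption here, not a prescribed answer) is a random forest regressor inside a scikit-learn pipeline. Text columns are one-hot encoded, numeric columns pass through unscaled because tree splits do not depend on feature scale, `max_depth` and `min_samples_leaf` are tuned with 5-fold cross-validation to control overfitting, and performance is reported on a held-out 20% test split. The helper name `fit_weight_model` is hypothetical, and although the demo uses some column names from `obesity_levels.csv`, its rows are randomly generated, so the printed scores say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def fit_weight_model(df, target="Weight", exclude=("NObeyesdad",), random_state=0):
    """Tune and evaluate a random forest that predicts `target` from all other
    columns except those in `exclude`. Returns (fitted search, test MAE, test R^2)."""
    X = df.drop(columns=[target, *exclude])
    y = df[target]
    num_cols = X.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in X.columns if c not in num_cols]  # text columns

    pre = ColumnTransformer([
        ("text", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ("num", "passthrough", num_cols),  # trees do not need feature scaling
    ])
    pipe = Pipeline([
        ("pre", pre),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=random_state)),
    ])
    # Shallower trees and larger leaves reduce variance (overfitting).
    search = GridSearchCV(
        pipe,
        {"rf__max_depth": [None, 8], "rf__min_samples_leaf": [1, 5]},
        cv=5,
        scoring="neg_mean_absolute_error",
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=random_state)
    search.fit(X_tr, y_tr)
    pred = search.predict(X_te)
    return search, mean_absolute_error(y_te, pred), r2_score(y_te, pred)


# Synthetic rows using a few of the dataset's column names (illustration only).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "Gender": rng.choice(["Female", "Male"], n),
    "Age": rng.uniform(14, 61, n),
    "Height": rng.uniform(1.45, 1.98, n),
    "FAVC": rng.choice(["yes", "no"], n),
    "MTRANS": rng.choice(["Walking", "Public_Transportation", "Automobile"], n),
})
demo["Weight"] = 25 * demo["Height"] ** 2 + 10 * (demo["FAVC"] == "yes") + rng.normal(0, 3, n)
demo["NObeyesdad"] = "Normal_Weight"  # placeholder label; excluded from the features

search, mae, r2 = fit_weight_model(demo)
print(search.best_params_, f"test MAE = {mae:.2f} kg", f"test R^2 = {r2:.3f}")
```

With the real file, `fit_weight_model(pd.read_csv('obesity_levels.csv'))` would use the 15 attributes left after removing Weight and NObeyesdad.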
Answered 2 days after Aug 21, 2021


Karthi answered on Aug 23 2021
89946/machine_learning.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import collections\n",
"from collections import Counter\n",
"\n",
"import sklearn\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"from sklearn.preprocessing import OrdinalEncoder\n",
"from sklearn.preprocessing import OneHotEncoder\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.pipeline import Pipeline\n",
"\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.svm import SVC\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.ensemble import AdaBoostClassifier\n",
"from sklearn.linear_model import SGDClassifier\n",
"\n",
"from sklearn.metrics import accuracy_score\n",
"from sklearn.metrics import classification_report"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 3,
"source": [
"df = pd.read_csv('obesity_levels.csv')"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"df"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Gender Age Height Weight family_history_with_overweight \\\n",
"0 Female 21.000000 1.620000 64.000000 yes \n",
"1 Female 21.000000 1.520000 56.000000 yes \n",
"2 Male 23.000000 1.800000 77.000000 yes \n",
"3 Male 27.000000 1.800000 87.000000 no \n",
"4 Male 22.000000 1.780000 89.800000 no \n",
"... ... ... ... ... ... \n",
"2106 Female 20.976842 1.710730 131.408528 yes \n",
"2107 Female 21.982942 1.748584 133.742943 yes \n",
"2108 Female 22.524036 1.752206 133.689352 yes \n",
"2109 Female 24.361936 1.739450 133.346641 yes \n",
"2110 Female 23.664709 1.738836 133.472641 yes \n",
"\n",
" FAVC FCVC NCP CAEC SMOKE CH2O SCC FAF TUE \\\n",
"0 no 2.0 3.0 Sometimes no 2.000000 no 0.000000 1.000000 \n",
"1 no 3.0 3.0 Sometimes yes 3.000000 yes 3.000000 0.000000 \n",
"2 no 2.0 3.0 Sometimes no 2.000000 no 2.000000 1.000000 \n",
"3 no 3.0 3.0 Sometimes no 2.000000 no 2.000000 0.000000 \n",
"4 no 2.0 1.0 Sometimes no 2.000000 no 0.000000 0.000000 \n",
"... ... ... ... ... ... ... ... ... ... \n",
"2106 yes 3.0 3.0 Sometimes no 1.728139 no 1.676269 0.906247 \n",
"2107 yes 3.0 3.0 Sometimes no 2.005130 no 1.341390 0.599270 \n",
"2108 yes 3.0 3.0 Sometimes no 2.054193 no 1.414209 0.646288 \n",
"2109 yes 3.0 3.0 Sometimes no 2.852339 no 1.139107 0.586035 \n",
"2110 yes 3.0 3.0 Sometimes no 2.863513 no 1.026452 0.714137 \n",
"\n",
" CALC MTRANS NObeyesdad \n",
"0 no Public_Transportation Normal_Weight \n",
"1 Sometimes Public_Transportation Normal_Weight \n",
"2 Frequently Public_Transportation Normal_Weight \n",
"3 Frequently Walking Overweight_Level_I \n",
"4 Sometimes Public_Transportation Overweight_Level_II \n",
"... ... ... ... \n",
"2106 Sometimes Public_Transportation Obesity_Type_III \n",
"2107 Sometimes Public_Transportation Obesity_Type_III \n",
"2108 Sometimes Public_Transportation Obesity_Type_III \n",
"2109 Sometimes Public_Transportation Obesity_Type_III \n",
"2110 Sometimes Public_Transportation Obesity_Type_III \n",
"\n",
"[2111 rows x 17 columns]"
]
},
"metadata": {},
"execution_count": 4
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 5,
"source": [
"df.shape"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(2111, 17)"
]
},
"metadata": {},
"execution_count": 5
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 6,
"source": [
"df.info()"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 2111 entries, 0 to 2110\n",
"Data columns (total 17 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Gender 2111 non-null object \n",
" 1 Age 2111 non-null float64\n",
" 2 Height 2111 non-null float64\n",
" 3 Weight 2111 non-null float64\n",
" 4 family_history_with_overweight 2111 non-null object \n",
" 5 FAVC 2111 non-null object \n",
" 6 FCVC 2111 non-null float64\n",
" 7 NCP 2111 non-null float64\n",
" 8 CAEC 2111 non-null object \n",
" 9 SMOKE 2111 non-null object \n",
" 10 CH2O 2111 non-null float64\n",
" 11 SCC 2111 non-null object \n",
" 12 FAF 2111 non-null float64\n",
" 13 TUE 2111 non-null float64\n",
" 14 CALC 2111 non-null object \n",
" 15 MTRANS 2111 non-null object \n",
" 16 NObeyesdad 2111 non-null object \n",
"dtypes: float64(8), object(9)\n",
"memory usage: 280.5+ KB\n"
]
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 7,
"source": [
"df.describe()"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Age Height Weight FCVC NCP \\\n",
"count 2111.000000 2111.000000 2111.000000 2111.000000 2111.000000 \n",
"mean 24.312600 1.701677 86.586058 2.419043 2.685628 \n",
"std 6.345968 0.093305 26.191172 0.533927 0.778039 \n",
"min 14.000000 1.450000 39.000000 1.000000 1.000000 \n",
"25% 19.947192 1.630000 65.473343 2.000000 2.658738 \n",
"50% 22.777890 1.700499 83.000000 2.385502 3.000000 \n",
"75% 26.000000 1.768464 107.430682 3.000000 3.000000 \n",
"max 61.000000 1.980000 173.000000 3.000000 4.000000 \n",
"\n",
" CH2O FAF TUE \n",
"count 2111.000000 2111.000000 2111.000000 \n",
"mean 2.008011 1.010298 0.657866 \n",
"std 0.612953 0.850592 0.608927 \n",
"min 1.000000 0.000000 0.000000 \n",
"25% 1.584812 0.124505 0.000000 \n",
"50% 2.000000 1.000000 0.625350 \n",
"75% 2.477420 1.666678 1.000000 \n",
"max 3.000000 3.000000 2.000000 "
]
},
"metadata": {},
"execution_count": 7
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 8,
"source": [
"df.columns"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['Gender', 'Age', 'Height', 'Weight', 'family_history_with_overweight',\n",
" 'FAVC', 'FCVC', 'NCP', 'CAEC', 'SMOKE', 'CH2O', 'SCC', 'FAF', 'TUE',\n",
" 'CALC', 'MTRANS', 'NObeyesdad'],\n",
" dtype='object')"
]
},
"metadata": {},
"execution_count": 8
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 9,
"source": [
"df.columns = ['Gender', 'Age', 'Height', 'Weight', 'Family History with Overweight',\n",
" 'Frequent consumption of high caloric food', 'Frequency of consumption of vegetables', 'Number of main meals', 'Consumption of food between meals', 'Smoke', 'Consumption of water daily', 'Calories consumption monitoring', 'Physical activity frequency', 'Time using technology devices',\n",
" 'Consumption of alcohol', 'Transportation used', 'Obesity']\n",
"\n",
"df\n"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Gender Age Height Weight Family History with Overweight \\\n",
"0 Female 21.000000 1.620000 64.000000 yes \n",
"1 Female 21.000000 1.520000 56.000000 yes \n",
"2 Male 23.000000 1.800000 77.000000 yes \n",
"3 Male 27.000000 1.800000 87.000000 no \n",
"4 Male 22.000000 1.780000 89.800000 no \n",
"... ... ... ... ... ... \n",
"2106 Female 20.976842 1.710730 131.408528 yes \n",
"2107 Female 21.982942 1.748584 133.742943 yes \n",
"2108 Female 22.524036 1.752206 133.689352 yes \n",
"2109 Female 24.361936 1.739450 133.346641 yes \n",
"2110 Female 23.664709 1.738836 133.472641 yes \n",
"\n",
" Frequent consumption of high caloric food \\\n",
"0 no \n",
"1 no \n",
"2 no \n",
"3 no \n",
"4 no \n",
"... ... \n",
"2106 yes \n",
"2107 yes \n",
"2108 yes \n",
"2109 yes \n",
"2110 yes \n",
"\n",
" Frequency of consumption of vegetables Number of main meals \\\n",
"0 2.0 3.0 \n",
"1 3.0 3.0 \n",
"2 2.0 3.0 \n",
"3 3.0 3.0 \n",
"4 2.0 1.0 \n",
"... ... ... \n",
"2106 3.0 3.0 \n",
"2107 3.0 3.0 \n",
"2108 3.0 3.0 \n",
"2109 3.0 3.0 \n",
"2110 3.0 3.0 \n",
"\n",
" Consumption of food between meals Smoke Consumption of water daily \\\n",
"0 Sometimes no 2.000000 \n",
"1 Sometimes yes 3.000000 \n",
"2 Sometimes no 2.000000 \n",
"3 Sometimes no 2.000000 \n",
"4 Sometimes no 2.000000 \n",
"... ... ... ... \n",
"2106 Sometimes no 1.728139 \n",
"2107 Sometimes no 2.005130 \n",
"2108 Sometimes no 2.054193 \n",
"2109 Sometimes no 2.852339 \n",
"2110 Sometimes no 2.863513 \n",
"\n",
" Calories consumption monitoring Physical activity frequency \\\n",
"0 no 0.000000 \n",
"1 yes 3.000000 \n",
"2 no 2.000000 \n",
"3 no 2.000000 \n",
"4 no 0.000000 \n",
"... ... ... \n",
"2106 no 1.676269 \n",
"2107 no 1.341390 \n",
"2108 no 1.414209 \n",
"2109 no 1.139107 \n",
"2110 no 1.026452 \n",
"\n",
" Time using technology devices Consumption of alcohol \\\n",
"0 1.000000 no \n",
"1 0.000000 Sometimes \n",
"2 1.000000 Frequently \n",
"3 0.000000 Frequently \n",
"4 0.000000 Sometimes \n",
"... ... ... \n",
"2106 0.906247 Sometimes \n",
"2107 0.599270 Sometimes \n",
"2108 0.646288 Sometimes \n",
"2109 0.586035 Sometimes \n",
"2110 0.714137 Sometimes \n",
"\n",
" Transportation used Obesity \n",
"0 Public_Transportation Normal_Weight \n",
"1 Public_Transportation Normal_Weight \n",
"2 Public_Transportation Normal_Weight \n",
"3 Walking Overweight_Level_I \n",
"4 Public_Transportation Overweight_Level_II \n",
"... ... ... \n",
"2106 Public_Transportation Obesity_Type_III \n",
"2107 Public_Transportation Obesity_Type_III \n",
"2108 Public_Transportation Obesity_Type_III \n",
"2109 Public_Transportation Obesity_Type_III \n",
"2110 Public_Transportation Obesity_Type_III \n",
"\n",
"[2111 rows x 17 columns]"
]
},
"metadata": {},
"execution_count": 9
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 10,
"source": [
"df['Obesity'] = df['Obesity'].apply(lambda x: x.replace('_', ' '))\n",
"df['Transportation used'] = df['Transportation used'].apply(lambda x: x.replace('_', ' '))\n",
"df['Height'] = df['Height']*100\n",
"df['Height'] = df['Height'].round(1)\n",
"df['Weight'] = df['Weight'].round(1)\n",
"df['Age'] = df['Age'].round(1)\n",
"df"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Gender Age Height Weight Family History with Overweight \\\n",
"0 Female 21.0 162.0 64.0 yes \n",
"1 Female 21.0 152.0 56.0 yes \n",
"2 Male 23.0 180.0 77.0 yes \n",
"3 Male 27.0 180.0 87.0 no \n",
"4 Male 22.0 178.0 89.8 no \n",
"... ... ... ... ... ... \n",
"2106 Female 21.0 171.1 131.4 yes \n",
"2107 Female 22.0 174.9 133.7 yes \n",
"2108 Female 22.5 175.2 133.7 yes \n",
"2109 Female 24.4 173.9 133.3 yes \n",
"2110 Female 23.7 173.9 133.5 yes \n",
"\n",
" Frequent consumption of high caloric food \\\n",
"0 no \n",
"1 no \n",
"2 no \n",
"3 no \n",
"4 no \n",
"... ... \n",
"2106 yes \n",
"2107 yes \n",
"2108 yes \n",
"2109 yes \n",
"2110 yes \n",
"\n",
" Frequency of consumption of vegetables Number of main meals \\\n",
"0 2.0 3.0 \n",
"1 3.0 3.0 \n",
"2 2.0 3.0 \n",
"3 3.0 3.0 \n",
"4 2.0 1.0 \n",
"... ... ... \n",
"2106 3.0 3.0 \n",
"2107 3.0 3.0 \n",
"2108 3.0 3.0 \n",
"2109 3.0 3.0 \n",
"2110 3.0 3.0 \n",
"\n",
" Consumption of food between meals Smoke Consumption of water daily \\\n",
"0 Sometimes no 2.000000 \n",
"1 Sometimes yes 3.000000 \n",
"2 Sometimes no 2.000000 \n",
"3 Sometimes no 2.000000 \n",
"4 Sometimes no 2.000000 \n",
"... ... ... ... \n",
"2106 Sometimes no 1.728139 \n",
"2107 Sometimes no 2.005130 \n",
"2108 Sometimes no 2.054193 \n",
"2109 Sometimes no 2.852339 \n",
"2110 Sometimes no 2.863513 \n",
"\n",
" Calories consumption monitoring Physical activity frequency \\\n",
"0 no 0.000000 \n",
"1 yes 3.000000 \n",
"2 no 2.000000 \n",
"3 no 2.000000 \n",
"4 no 0.000000 \n",
"... ... ... \n",
"2106 no 1.676269 \n",
"2107 no 1.341390 \n",
"2108 no 1.414209 \n",
"2109 no 1.139107 \n",
"2110 no 1.026452 \n",
"\n",
" Time using technology devices Consumption of alcohol \\\n",
"0 1.000000 no \n",
"1 0.000000 Sometimes \n",
"2 1.000000 Frequently \n",
"3 0.000000 Frequently \n",
"4 0.000000 Sometimes \n",
"... ... ... \n",
"2106 0.906247 Sometimes \n",
"2107 0.599270 Sometimes \n",
"2108 0.646288 Sometimes \n",
"2109 0.586035 Sometimes \n",
"2110 0.714137 Sometimes \n",
"\n",
" Transportation used Obesity \n",
"0 Public Transportation Normal Weight \n",
"1 Public Transportation Normal Weight \n",
"2 Public Transportation Normal Weight \n",
"3 Walking Overweight Level I \n",
"4 Public Transportation Overweight Level II \n",
"... ... ... \n",
"2106 Public Transportation Obesity Type III \n",
"2107 Public Transportation Obesity Type III \n",
"2108 Public Transportation Obesity Type III \n",
"2109 Public Transportation Obesity Type III \n",
"2110 Public Transportation Obesity Type III \n",
"\n",
"[2111 rows x 17 columns]"
]
},
"metadata": {},
"execution_count": 10
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 11,
"source": [
"for x in ['Frequency of consumption of vegetables', 'Number of main meals', 'Consumption of water daily', 'Physical activity frequency', 'Time using technology devices']:\n",
" value = np.array(df[x])\n",
" print(x,':', 'min:', np.min(value), 'max:', np.max(value))\n"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Frequency of consumption of vegetables : min: 1.0 max: 3.0\n",
"Number of main meals : min: 1.0 max: 4.0\n",
"Consumption of water daily : min: 1.0 max: 3.0\n",
"Physical activity frequency : min: 0.0 max: 3.0\n",
"Time using technology devices : min: 0.0 max: 2.0\n"
]
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Exploratory Data Analysis"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 12,
"source": [
"for x in ['Frequency of consumption of vegetables', 'Number of main meals', 'Consumption of water daily', 'Physical activity frequency', 'Time using technology devices']:\n",
" df[x] = df[x].apply(round)\n",
" value = np.array(df[x])\n",
" print(x,':', 'min:', np.min(value), 'max:', np.max(value), df[x].dtype)\n",
" print(df[x].unique())\n",
" "
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Frequency of consumption of vegetables : min: 1 max: 3 int64\n",
"[2 3 1]\n",
"Number of main meals : min: 1 max: 4 int64\n",
"[3 1 4 2]\n",
"Consumption of water daily : min: 1 max: 3 int64\n",
"[2 3 1]\n",
"Physical activity frequency : min: 0 max: 3 int64\n",
"[0 3 2 1]\n",
"Time using technology devices : min: 0 max: 2 int64\n",
"[1 0 2]\n"
]
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 13,
"source": [
"df1 = df.copy()"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 14,
"source": [
"mapping0 = {1:'Never', 2:'Sometimes', 3:'Always'}\n",
"mapping1 = {1: '1', 2:'2' , 3: '3', 4: '3+'}\n",
"mapping2 = {1: 'Less than a liter', 2:'Between 1 and 2 L', 3:'More than 2 L'}\n",
"mapping3 = {0: 'I do not have', 1: '1 or 2 days', 2: '2 or 4 days', 3: '4 or 5 days'}\n",
"mapping4 = {0: '0–2 hours', 1: '3–5 hours', 2: 'More than 5 hours'}"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 15,
"source": [
"df['Frequency of consumption of vegetables'] = df['Frequency of consumption of vegetables'].replace(mapping0)\n",
"df['Number of main meals'] = df['Number of main meals'].replace(mapping1)\n",
"df['Consumption of water daily'] = df['Consumption of water daily'].replace(mapping2)\n",
"df['Physical activity frequency'] = df['Physical activity frequency'].replace(mapping3)\n",
"df['Time using technology devices'] = df['Time using technology devices'].replace(mapping4)"
],
"outputs": [],
"metadata": {}
},
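The `replace` calls above leave any value that is not a key of its mapping unchanged, so an unexpected code would silently stay numeric inside an otherwise categorical column. In this notebook every rounded value printed earlier is a key of the corresponding mapping, so nothing is lost here. A minimal sketch of a stricter alternative, assuming `Series.map` is acceptable: it turns unmapped codes into NaN, which can then be checked explicitly. The code 5 below is made up purely to show the difference:

```python
import pandas as pd

# Same coding as mapping0 in the notebook
mapping0 = {1: 'Never', 2: 'Sometimes', 3: 'Always'}

# Illustrative codes only; 5 is deliberately outside the mapping
codes = pd.Series([1, 2, 3, 5])

replaced = codes.replace(mapping0)  # unmapped 5 is kept unchanged
mapped = codes.map(mapping0)        # unmapped 5 becomes NaN

print(replaced.tolist())  # ['Never', 'Sometimes', 'Always', 5]
print(mapped.tolist())    # ['Never', 'Sometimes', 'Always', nan]

# List any codes the mapping did not cover
unmapped = codes[mapped.isna()].unique()
print(unmapped)           # [5]
```

With `map`, a non-empty `unmapped` array signals that the rounding step produced a code the questionnaire labels do not cover.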
{
"cell_type": "code",
"execution_count": 16,
"source": [
"df"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Gender Age Height Weight Family History with Overweight \\\n",
"0 Female 21.0 162.0 64.0 yes \n",
"1 Female 21.0 152.0 56.0 yes \n",
"2 Male 23.0 180.0 77.0 yes \n",
"3 Male 27.0 180.0 87.0 no \n",
"4 Male 22.0 178.0 89.8 no \n",
"... ... ... ... ... ... \n",
"2106 Female 21.0 171.1 131.4 yes \n",
"2107 Female 22.0 174.9 133.7 yes \n",
"2108 Female 22.5 175.2 133.7 yes \n",
"2109 Female 24.4 173.9 133.3 yes \n",
"2110 Female 23.7 173.9 133.5 yes \n",
"\n",
" Frequent consumption of high caloric food \\\n",
"0 no \n",
"1 no \n",
"2 no \n",
"3 no \n",
"4 no \n",
"... ... \n",
"2106 yes \n",
"2107 yes \n",
"2108 yes \n",
"2109 yes \n",
"2110 yes \n",
"\n",
" Frequency of consumption of vegetables Number of main meals \\\n",
"0 Sometimes 3 \n",
"1 Always 3 \n",
"2 Sometimes 3 \n",
"3 Always 3 \n",
"4 Sometimes 1 \n",
"... ... ... \n",
"2106 Always 3 \n",
"2107 Always 3 \n",
"2108 Always 3 \n",
"2109 Always 3 \n",
"2110 Always 3 \n",
"\n",
" Consumption of food between meals Smoke Consumption of water daily \\\n",
"0 Sometimes no Between 1 and 2 L \n",
"1 Sometimes yes More than 2 L \n",
"2 Sometimes no Between 1 and 2 L \n",
"3 Sometimes no Between 1 and 2 L \n",
"4 Sometimes no Between 1 and 2 L \n",
"... ... ... ... \n",
"2106 Sometimes no Between 1 and 2 L \n",
"2107 Sometimes no Between 1 and 2 L \n",
"2108 Sometimes no Between 1 and 2 L \n",
"2109 Sometimes no More than 2 L \n",
"2110 Sometimes no More than 2 L \n",
"\n",
" Calories consumption monitoring Physical activity frequency \\\n",
"0 no I do not have \n",
"1 yes 4 or 5 days \n",
"2 no 2 or 4 days \n",
"3 no 2 or 4 days \n",
"4 no I do not have \n",
"... ... ... \n",
"2106 no 2 or 4 days \n",
"2107 no 1 or 2 days \n",
"2108 no 1 or 2 days \n",
"2109 no 1 or 2 days \n",
"2110 no 1 or 2 days \n",
"\n",
" Time using technology devices Consumption of alcohol \\\n",
"0 3–5 hours no \n",
"1 0–2 hours Sometimes \n",
"2 3–5 hours Frequently \n",
"3 0–2 hours Frequently \n",
"4 0–2 hours Sometimes \n",
"... ... ... \n",
"2106 3–5 hours Sometimes \n",
"2107 3–5 hours Sometimes \n",
"2108 3–5 hours Sometimes \n",
"2109 3–5 hours Sometimes \n",
"2110 3–5 hours Sometimes \n",
"\n",
" Transportation used Obesity \n",
"0 Public Transportation Normal Weight \n",
"1 Public Transportation Normal Weight \n",
"2 Public Transportation Normal Weight \n",
"3 Walking Overweight Level I \n",
"4 Public Transportation Overweight Level II \n",
"... ... ... \n",
"2106 Public Transportation Obesity Type III \n",
"2107 Public Transportation Obesity Type III \n",
"2108 Public Transportation Obesity Type III \n",
"2109 Public Transportation Obesity Type III \n",
"2110 Public Transportation Obesity Type III \n",
"\n",
"[2111 rows x 17 columns]"
]
},
"metadata": {},
"execution_count": 16
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"### Age, Height and Weight"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"In terms of height, the male and female distributions have a similar shape in the box plots below, although males are generally taller than females. Both genders have a similar median weight, but female weight (and therefore BMI) spans a much wider range than male weight. This is also reflected in the weight-against-height regression lines plotted by gender further below, where the line for females is steeper than the line for males."
],
"metadata": {}
},
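The observations about height and weight above are read off box plots and regression plots; they can also be checked numerically. A minimal sketch, assuming the notebook's `df` with `Gender`, `Height` (cm) and `Weight` (kg) columns; the helper `weight_height_summary` is hypothetical and the toy frame is made up for illustration. The slope is an ordinary least-squares fit from `np.polyfit`, which corresponds to the straight line `sns.lmplot` fits for each gender by default:

```python
import numpy as np
import pandas as pd


def weight_height_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Per-gender medians, weight IQR, and OLS slope of weight (kg) on height (cm)."""
    rows = {}
    for gender, group in data.groupby('Gender'):
        # Degree-1 least-squares fit: Weight ~ slope * Height + intercept
        slope, _intercept = np.polyfit(group['Height'], group['Weight'], deg=1)
        rows[gender] = {
            'height_median': group['Height'].median(),
            'weight_median': group['Weight'].median(),
            'weight_iqr': group['Weight'].quantile(0.75) - group['Weight'].quantile(0.25),
            'slope_kg_per_cm': slope,
        }
    return pd.DataFrame.from_dict(rows, orient='index')


# Toy values for illustration only (not taken from the obesity dataset)
toy = pd.DataFrame({
    'Gender': ['Female'] * 3 + ['Male'] * 3,
    'Height': [160.0, 165.0, 170.0, 175.0, 180.0, 185.0],
    'Weight': [50.0, 65.0, 80.0, 70.0, 75.0, 80.0],
})
summary = weight_height_summary(toy)
print(summary)
```

On the real data this would be `weight_height_summary(df)`; comparing the `weight_iqr` and `slope_kg_per_cm` rows for the two genders puts numbers on the "wider range" and "steeper line" observations.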
{
"cell_type": "code",
"execution_count": 18,
"source": [
"sns.set()\n",
"fig = plt.figure(figsize=(20,10))\n",
"plt.subplot(1, 2, 1)\n",
"sns.boxplot(x='Gender', y='Height', data=df)\n",
"plt.subplot(1, 2, 2)\n",
"sns.boxplot(x='Gender', y='Weight', data=df)"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 760,
"source": [
"sns.set()\n",
"g = sns.jointplot(x=\"Height\", y=\"Weight\", data=df,\n",
" kind=\"reg\", truncate=False,\n",
" xlim=(125, 200), ylim=(35, 180),\n",
" color=\"m\", height=10)\n",
"g.set_axis_labels(\"Height (cm)\", \"Weight (kg)\")"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 761,
"source": [
"g = sns.lmplot(x=\"Height\", y=\"Weight\", hue=\"Gender\",\n",
" height=10, data=df)\n",
"g.set_axis_labels(\"Height (cm)\", \"Weight (kg)\")"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {},
"execution_count": 761
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"
"
],
"image/png":...