Answer To: Structured Data Processing: For the purposes of this write-up, we will use examples from Donors data...
Sudipta answered on Oct 07 2021
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Defining a function for reading the data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n",
"0 1461 20 RH 80.0 11622 Pave NaN Reg \n",
"1 1462 20 RL 81.0 14267 Pave NaN IR1 \n",
"2 1463 60 RL 74.0 13830 Pave NaN IR1 \n",
"3 1464 60 RL 78.0 9978 Pave NaN IR1 \n",
"4 1465 120 RL 43.0 5005 Pave NaN IR1 \n",
"5 1466 60 RL 75.0 10000 Pave NaN IR1 \n",
"6 1467 20 RL NaN 7980 Pave NaN IR1 \n",
"7 1468 60 RL 63.0 8402 Pave NaN IR1 \n",
"8 1469 20 RL 85.0 10176 Pave NaN Reg \n",
"9 1470 20 RL 70.0 8400 Pave NaN Reg \n",
"\n",
" LandContour Utilities ... ScreenPorch PoolArea PoolQC Fence \\\n",
"0 Lvl AllPub ... 120 0 NaN MnPrv \n",
"1 Lvl AllPub ... 0 0 NaN NaN \n",
"2 Lvl AllPub ... 0 0 NaN MnPrv \n",
"3 Lvl AllPub ... 0 0 NaN NaN \n",
"4 HLS AllPub ... 144 0 NaN NaN \n",
"5 Lvl AllPub ... 0 0 NaN NaN \n",
"6 Lvl AllPub ... 0 0 NaN GdPrv \n",
"7 Lvl AllPub ... 0 0 NaN NaN \n",
"8 Lvl AllPub ... 0 0 NaN NaN \n",
"9 Lvl AllPub ... 0 0 NaN MnPrv \n",
"\n",
" MiscFeature MiscVal MoSold YrSold SaleType SaleCondition \n",
"0 NaN 0 6 2010 WD Normal \n",
"1 Gar2 12500 6 2010 WD Normal \n",
"2 NaN 0 3 2010 WD Normal \n",
"3 NaN 0 6 2010 WD Normal \n",
"4 NaN 0 1 2010 WD Normal \n",
"5 NaN 0 4 2010 WD Normal \n",
"6 Shed 500 3 2010 WD Normal \n",
"7 NaN 0 5 2010 WD Normal \n",
"8 NaN 0 2 2010 WD Normal \n",
"9 NaN 0 4 2010 WD Normal \n",
"\n",
"[10 rows x 80 columns]\n",
"1459\n",
"80\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"# Function for reading the data from an Excel file.\n",
"def readFile():\n",
"    # pd.read_excel already returns a DataFrame, so no further conversion is needed.\n",
"    df = pd.read_excel(r'path to data here\\house.xlsx')\n",
"    return df\n",
"\n",
"df = readFile()\n",
"index = df.index\n",
"number_of_rows = len(index)\n",
"# Note: this holds the column labels; their count is taken with len() below.\n",
"number_of_columns = df.columns\n",
"# Print the first 10 records\n",
"print(df.head(10))\n",
"# Print the number of rows\n",
"print(number_of_rows)\n",
"# Print the number of columns\n",
"print(len(number_of_columns))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"source": [
"# Cleaning the data: the columns 'PoolQC', '3SsnPorch' and 'Alley' are removed because they are either not required or hold dump values."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Id MSSubClass MSZoning LotFrontage LotArea Street LotShape \\\n",
"0 1461 20 RH 80.0 11622 Pave Reg \n",
"1 1462 20 RL 81.0 14267 Pave IR1 \n",
"2 1463 60 RL 74.0 13830 Pave IR1 \n",
"3 1464 60 RL 78.0 9978 Pave IR1 \n",
"4 1465 120 RL 43.0 5005 Pave IR1 \n",
"5 1466 60 RL 75.0 10000 Pave IR1 \n",
"6 1467 20 RL NaN 7980 Pave IR1 \n",
"7 1468 60 RL 63.0 8402 Pave IR1 \n",
"8 1469 20 RL 85.0 10176 Pave Reg \n",
"9 1470 20 RL 70.0 8400 Pave Reg \n",
"\n",
" LandContour Utilities LotConfig ... EnclosedPorch ScreenPorch \\\n",
"0 Lvl AllPub Inside ... 0 120 \n",
"1 Lvl AllPub Corner ... 0 0 \n",
"2 Lvl AllPub Inside ... 0 0 \n",
"3 Lvl AllPub Inside ... 0 0 \n",
"4 HLS AllPub Inside ... 0 144 \n",
"5 Lvl AllPub Corner ... ...