Answer To: Brooklyn Housing Analysis Dataset:CSV here Provide a short narrative describing on the Brooklyn...
Ximi answered on Oct 07 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import string\n",
"import re\n",
"import matplotlib.pyplot as plt\n",
"from collections import Counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graph Analysis:\n",
"### Write the step-by-step instructions for completing the Graph Analysis and provide findings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 1: Load the data into a DataFrame"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"housing_data = pd.read_csv('brooklynhomes2003to2017/brooklyn_sales_map.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2: Check the dimensions of the table and preview the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Dimensions of the table (rows, columns):\", housing_data.shape)\n",
"housing_data.head(5)"
]
},
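{
"cell_type": "markdown",
"metadata": {},
"source": [
"An optional addition to Step 2 (not part of the original steps): because this table is wide, pandas may hide the middle columns of `head()` behind `...`. The following is a minimal sketch, assuming `housing_data` was loaded in Step 1. It lifts the column limit only for this one display via `pd.option_context`, so later outputs keep the default settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display\n",
"\n",
"# pandas hides the middle columns of wide tables behind '...';\n",
"# option_context lifts that limit only inside this block\n",
"with pd.option_context('display.max_columns', None):\n",
"    display(housing_data.head(5))"
]
},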
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3: Identify the types of variables in the table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
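"# Summary statistics (count, mean, std, min/max, quartiles) for the numeric columns\n",
"print(\"Numeric columns:\")\n",
"print(housing_data.describe())\n",
"# Summary (count, unique values, most frequent value and its frequency) for object/text columns\n",
"print(\"Categorical (object) columns:\")\n",
"print(housing_data.describe(include=['O']))\n",
"\n",
"# Return the data type of each column\n",
"housing_data.dtypes"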
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(len(housing_data))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"housing_data.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 4: Create a scatter plot, a histogram and a bar chart to visualize the data and identify outliers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
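"# One point per sale: year of sale vs. sale price (small, translucent markers reduce overplotting)\n",
"plt.scatter(x=housing_data['year_of_sale'], y=housing_data['sale_price'], s=5, alpha=0.3)\n",
"ax = plt.gca()\n",
"# Show plain dollar amounts on the y-axis instead of scientific notation\n",
"ax.ticklabel_format(style='plain', axis='y')\n",
"ax.set_xlabel('Year of sale')\n",
"ax.set_ylabel('Sale price (USD)')\n",
"plt.show()"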
]
},
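{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 4 also calls for a histogram. The following is a minimal sketch, assuming sale prices are strongly right-skewed (typical for housing data, but worth checking on this table). It plots only positive prices on log-spaced bins, because a log axis cannot show prices of zero. `positive_prices` and `log_bins` are illustrative names, and the choice of 50 bin edges is arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Keep only positive prices: a log-scaled axis cannot show zero or negative values\n",
"positive_prices = housing_data.loc[housing_data['sale_price'] > 0, 'sale_price']\n",
"\n",
"# 50 log-spaced bin edges from the smallest to the largest positive price\n",
"log_bins = np.logspace(np.log10(positive_prices.min()), np.log10(positive_prices.max()), 50)\n",
"\n",
"fig, ax = plt.subplots()\n",
"ax.hist(positive_prices, bins=log_bins)\n",
"ax.set_xscale('log')\n",
"ax.set_xlabel('Sale price (USD, log scale)')\n",
"ax.set_ylabel('Number of sales')\n",
"ax.set_title('Distribution of Brooklyn sale prices')\n",
"plt.show()"
]
},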
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
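"# Bin edges in dollars, matching the labels (200000 = $200k). pd.cut intervals are right-closed,\n",
"# so the large negative first edge puts $0 sales in the first bin; prices above $500M become NaN.\n",
"bins = [-100000000, 200000, 400000, 600000, 800000, 1000000, 10000000, 100000000, 500000000]\n",
"choices = ['$0-$200k', '$200k-$400k', '$400k-$600k', '$600k-$800k', '$800k-$1mlln', '$1mlln-$10mlln', '$10mlln-$100mlln', '$100mlln-$500mlln']\n",
"housing_data['price_range'] = pd.cut(housing_data['sale_price'], bins=bins, labels=choices)"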
]
},
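{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 4's bar chart can be drawn from the `price_range` column created above. The following is a minimal sketch: `value_counts()` counts the sales in each bucket, and `sort_index()` keeps the buckets in price order because the categorical produced by `pd.cut` is ordered. Sales outside the bin edges are NaN and are left out of the counts. Matplotlib treats text between two dollar signs as math, so the sketch escapes them in the tick labels. `range_counts` is an illustrative name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Number of sales in each price bucket; sort_index keeps the buckets in price order\n",
"range_counts = housing_data['price_range'].value_counts().sort_index()\n",
"# Escape '$' so matplotlib shows the labels literally instead of parsing them as math\n",
"range_counts.index = [str(label).replace('$', r'\\$') for label in range_counts.index]\n",
"\n",
"ax = range_counts.plot(kind='bar')\n",
"ax.set_xlabel('Sale price range')\n",
"ax.set_ylabel('Number of sales')\n",
"ax.set_title('Brooklyn sales by price range')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},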
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Are there outliers present? If so, should they be removed from the dataset, and why?**"
]
},
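{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common way to make the outlier question concrete is Tukey's 1.5 × IQR rule. It is sketched here as an assumption and is not necessarily the criterion used in the answer that follows. It flags sale prices below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. Flagged rows are candidates for review rather than automatic removal. Very low prices (for example, transfers recorded at or near zero) and very high ones (for example, whole-building sales) may be legitimate records that belong to a different market segment. `q1`, `q3`, `lower_fence`, `upper_fence` and `is_outlier` are illustrative names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Tukey's rule: flag prices outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]\n",
"q1, q3 = housing_data['sale_price'].quantile([0.25, 0.75])\n",
"iqr = q3 - q1\n",
"lower_fence = q1 - 1.5 * iqr\n",
"upper_fence = q3 + 1.5 * iqr\n",
"\n",
"is_outlier = (housing_data['sale_price'] < lower_fence) | (housing_data['sale_price'] > upper_fence)\n",
"print(f\"Fences: {lower_fence:,.0f} to {upper_fence:,.0f}\")\n",
"print(f\"{is_outlier.sum()} of {len(housing_data)} sales ({is_outlier.mean():.1%}) are flagged\")\n",
"\n",
"# Inspect the flagged rows before deciding whether to drop them\n",
"housing_data.loc[is_outlier, ['year_of_sale', 'sale_price']].describe()"
]
},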
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
...