


Brooklyn Housing Analysis



Dataset: CSV here





Provide a short narrative describing the Brooklyn Housing Analysis problem. Find or create appropriate data that can be analyzed; the dataset is at the link above.



Write step-by-step instructions for completing the Graph Analysis and provide detailed findings along the way.



1. Load the data from the “train.csv” file into a DataFrame.

2. Display the dimensions of the file (so you’ll have a good idea of the amount of data you are working with).

3. Display the first 5 rows of data so you can see the column headings and the type of data for each column.

a. Notice that Survived is represented as a 1 or 0.

b. Notice that missing data is represented as “NaN”.

c. The Survived variable will be the “target” and the other variables will be the “features”.

4. Think about some questions that might help you predict who will survive:

a. What do the variables look like? For example, are they numerical or categorical? If they are numerical, what are their distributions? If they are categorical, how many observations fall into each category?

b. Are the numerical variables correlated?

c. Are the distributions of numerical variables the same or different among grouped neighborhoods? Was there a specific year, or a pattern across years, in which the percentage of sales increased, and for what price range?

5. Look at summary information about your data (total, mean, min, max, freq., unique, etc.). Does this present any more questions for you? Does it lead you to a conclusion yet?

6. Make some histograms of your data (“A picture is worth a thousand words!”).

7. Make some bar charts for variables with only a few options.

a. Stacked bar visualization of sale prices in pricing ranges by year.

8. To see if the data is correlated, make some Pearson Ranking charts.

a. Notice that in the sample code, I have saved this png file.

b. The correlation between the variables is low (1 or -1 is high positive or high negative, 0 is low or no correlation). These results show there is “some” positive correlation but it’s not a high correlation.


9. Use a Parallel Coordinates visualization to compare the distributions of numerical variables between sales and increasing cost over the years.
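Steps 8 and 9 can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a small synthetic table instead of the real CSV, the column `gross_sqft` is assumed for the example (only `year_of_sale` and `sale_price` appear in the answer below), and plain `DataFrame.corr` stands in for whatever "Pearson Ranking" helper the sample code used.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Synthetic stand-in for the real data (assumption: values are made up).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "year_of_sale": rng.integers(2003, 2018, size=n),
    "gross_sqft": rng.normal(2000, 500, size=n).clip(min=300),
})
# Price loosely tied to size and year so the correlations are non-trivial.
demo["sale_price"] = (
    250 * demo["gross_sqft"]
    + 20_000 * (demo["year_of_sale"] - 2003)
    + rng.normal(0, 100_000, size=n)
).clip(lower=10_000)

numeric_cols = ["year_of_sale", "gross_sqft", "sale_price"]

# Step 8: Pearson correlation matrix, drawn as a simple heatmap.
corr = demo[numeric_cols].corr(method="pearson")
fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(numeric_cols)))
ax.set_xticklabels(numeric_cols, rotation=45, ha="right")
ax.set_yticks(range(len(numeric_cols)))
ax.set_yticklabels(numeric_cols)
fig.colorbar(im, ax=ax)
fig.tight_layout()
# fig.savefig("pearson.png")  # optional: save the chart as a png

# Step 9: parallel coordinates. Columns are min-max scaled to [0, 1]
# so that no single axis dominates; lines are coloured by sale period.
scaled = (demo[numeric_cols] - demo[numeric_cols].min()) / (
    demo[numeric_cols].max() - demo[numeric_cols].min()
)
scaled["period"] = np.where(demo["year_of_sale"] < 2010, "2003-2009", "2010-2017")

fig2, ax2 = plt.subplots(figsize=(6, 4))
parallel_coordinates(
    scaled,
    class_column="period",
    cols=numeric_cols,
    color=["tab:blue", "tab:orange"],
    alpha=0.3,
    ax=ax2,
)
```

The split into two periods is an arbitrary choice for the sketch; grouping by neighborhood or by price range would work the same way, as long as the grouping column is passed as `class_column`.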



Format: The completed task must be in a Jupyter Notebook with displayed results.
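For the stacked bar chart in step 7a, one possible approach is to bin prices with `pd.cut`, count sales per year and price range with `pd.crosstab`, and plot each year's percentage shares. The sketch below uses a tiny made-up table; the bin edges and labels are assumptions chosen for illustration, not values taken from the dataset.

```python
import pandas as pd

# Made-up sample rows (assumption: not real sales).
demo = pd.DataFrame({
    "year_of_sale": [2003, 2003, 2003, 2010, 2010, 2017, 2017, 2017],
    "sale_price": [150_000, 350_000, 450_000, 250_000,
                   900_000, 650_000, 1_200_000, 1_500_000],
})

# Bin edges in dollars; each label names the interval between two edges.
edges = [0, 200_000, 400_000, 600_000, 800_000, 1_000_000, float("inf")]
labels = ["$0-$200k", "$200k-$400k", "$400k-$600k",
          "$600k-$800k", "$800k-$1M", "$1M+"]
demo["price_range"] = pd.cut(demo["sale_price"], bins=edges, labels=labels)

# Rows are years, columns are price ranges, cells are sale counts.
counts = pd.crosstab(demo["year_of_sale"], demo["price_range"])

# Convert counts to the percentage of that year's sales in each range.
shares = counts.div(counts.sum(axis=1), axis=0) * 100

ax = shares.plot(kind="bar", stacked=True, figsize=(8, 4))
ax.set_xlabel("Year of sale")
ax.set_ylabel("Share of sales (%)")
ax.legend(title="Price range", bbox_to_anchor=(1.02, 1), loc="upper left")
```

Plotting shares rather than raw counts makes years with very different sales volumes comparable, which is what the question about the "percentage of sales increasing" seems to call for.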

Answered Same Day Oct 05, 2021


Ximi answered on Oct 07 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import string\n",
"import re\n",
"import matplotlib.pyplot as plt\n",
"from collections import Counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graph Analysis:\n",
"### Write the step-by-step instructions for completing the Graph Analysis and provide findings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 1: Load data into a dataframe"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"housing_data = pd.read_csv('brooklynhomes2003to2017/brooklyn_sales_map.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2: Check the dimension of the table and view the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"The dimension of the table is: \", housing_data.shape)\n",
"housing_data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3: What type of variables are in the table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Describe Data\")\n",
"print(housing_data.describe())\n",
"print(\"Summarized Data\")\n",
"print(housing_data.describe(include=['O']))\n",
"\n",
"# this will return the datatypes of the columns\n",
"housing_data.dtypes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(len(housing_data))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"housing_data.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 4: Create scatter plot, histogram and bar chart to display and identify outliers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.scatter(x=housing_data['year_of_sale'], y=housing_data['sale_price'])\n",
"ax = plt.gca()\n",
"# show full sale prices on the y-axis instead of scientific notation\n",
"ax.get_yaxis().get_major_formatter().set_scientific(False)\n",
"plt.draw()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Bin edges in dollars; each edge matches a boundary named in the labels below.\n",
"bins = [-100000000, 200000, 400000, 600000, 800000, 1000000, 10000000, 100000000, 500000000]\n",
"choices = ['$0-$200k', '$200k-$400k', '$400k-$600k', '$600k-$800k', '$800k-$1mlln', '$1mlln-$10mlln', '$10mlln-$100mlln', '$100mlln-$500mlln']\n",
"housing_data['price_range'] = pd.cut(housing_data['sale_price'], bins=bins, labels=choices)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Are there outliers present? If so, should they be removed from the dataset, and why?**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
...
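The notebook excerpt is cut off before it addresses the outlier question. One common heuristic, not necessarily the one this answer used, is the interquartile-range (IQR) rule: flag values more than 1.5 × IQR below the first quartile or above the third. The sketch below applies it to a small made-up sample; the helper name `iqr_outlier_mask` and the sample prices are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(prices: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = prices.quantile(0.25)
    q3 = prices.quantile(0.75)
    iqr = q3 - q1
    return (prices < q1 - k * iqr) | (prices > q3 + k * iqr)


# Made-up sample: a $0 sale and one very large sale among ordinary prices.
demo = pd.DataFrame({
    "year_of_sale": [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010],
    "sale_price": [350_000, 420_000, 390_000, 510_000,
                   0, 475_000, 450_000, 250_000_000],
})

mask = iqr_outlier_mask(demo["sale_price"])
print(demo[mask])        # rows flagged as outliers
trimmed = demo[~mask]    # rows kept for further analysis
```

Whether flagged rows should actually be dropped is a judgment call. A $0 sale price may record a transfer rather than a market sale, while a very large price may be a genuine commercial transaction, so it is worth inspecting the flagged rows before removing them.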