Brooklyn Housing Analysis
Dataset:
CSV here
Provide a short narrative describing the Brooklyn Housing Analysis problem. You can use any methods or tools you think are most appropriate. Write step-by-step instructions for completing the Dimensionality and Feature Reduction and the Model Evaluation and Selection parts of your case study.
Add the last remaining steps (10-15) to the current Jupyter Notebook file.
Provide a short narrative describing the Brooklyn Housing Analysis problem.
1. I want to see if I can create a choropleth map: a map whose geographical areas or regions are colored, shaded, or patterned according to a data variable.
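The data side of such a map can be sketched as follows: aggregate one value per region, then bin those values into shade classes. This is only a minimal sketch. The column names `neighborhood` and `sale_price`, the toy rows, and the helper `choropleth_classes` are assumptions made for illustration, and drawing the map itself would also need region boundary geometry, which is not included here.

```python
import pandas as pd

# Toy sales records; the column names and values are assumptions, not the real dataset.
sales = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"],
    "sale_price": [300_000, 500_000, 900_000, 1_100_000, 450_000, 550_000,
                   1_500_000, 2_500_000, 200_000, 400_000, 700_000, 800_000],
})


def choropleth_classes(df, region_col, value_col, n_classes=3):
    """Return one row per region with its median value and a quantile shade class."""
    # One summary value per region: the median is less sensitive to extreme sales.
    medians = df.groupby(region_col)[value_col].median()
    # Quantile classification: each shade class holds roughly the same number of regions.
    shades = pd.qcut(medians, q=n_classes, labels=False)
    return pd.DataFrame({"median_price": medians, "shade": shades})


table = choropleth_classes(sales, "neighborhood", "sale_price")
print(table)
```

Each `shade` value (0 for the cheapest group of regions, rising with price) would then be mapped to a color when the regions are drawn.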
Dimensionality and Feature Reduction
2. Some of my questions have been answered by the charts, but in some ways, looking at this much data has raised even more questions.
a. Now it’s time to reduce some of the features so we can concentrate on the things that matter! The features we will get rid of are: "Unnamed", "apartment_number", "Ext ", "Landmark", etc.
b. Fill in missing values. (apartment_number has some missing values, but we are dropping that feature.) If a value is missing in a column that records the year in which alterations were carried out on a property, it may make more sense to assume that no alteration was carried out.
3. If you go back and look at the histograms of sales, you’ll see that the distribution is very skewed: many low-priced real estate sales and very few high-priced ones. A log transformation is a good method to use on highly skewed data.
4. Convert your categorical data into numbers. For the other categorical columns, I filled the missing data with the modal value of the respective column, and for the rest of the numerical variables I used a mixture of soft-impute imputation and filling with the median value.
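Steps 2 to 4 can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the toy data, the column names `YearAlter1`, `building_class`, `gross_sqft`, and `sale_price`, and the helper `preprocess` are hypothetical. Plain median filling stands in for the soft-impute step, and one-hot encoding is just one possible way to turn categories into numbers.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real CSV; all column names and values are assumptions.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "apartment_number": [None, "2B", None, None, "4A"],
    "Landmark": [None, None, None, None, None],
    "YearAlter1": [np.nan, 1998.0, np.nan, 2005.0, np.nan],
    "building_class": ["A1", None, "B2", "A1", "A1"],
    "gross_sqft": [1200.0, np.nan, 2400.0, 1800.0, 1500.0],
    "sale_price": [350_000, 420_000, 2_900_000, 610_000, 480_000],
})


def preprocess(df):
    df = df.copy()
    # Step 2a: drop features that carry little or no signal.
    unnamed = [c for c in df.columns if c.startswith("Unnamed")]
    df = df.drop(columns=unnamed + ["apartment_number", "Landmark"], errors="ignore")
    # Step 2b: a missing alteration year is read as "never altered" and encoded as 0.
    df["YearAlter1"] = df["YearAlter1"].fillna(0)
    # Remaining gaps: median for numeric columns, modal value for categorical ones.
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    # Step 3: log-transform the skewed target; log1p also handles a price of 0.
    df["log_sale_price"] = np.log1p(df["sale_price"])
    df = df.drop(columns="sale_price")
    # Step 4: one-hot encode the categorical column into 0/1 indicator columns.
    return pd.get_dummies(df, columns=["building_class"], dtype=int)


clean = preprocess(raw)
print(clean)
```

Predictions made on the log scale can be converted back to prices with `np.expm1`.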
Model Evaluation and Selection
5. Training: split the data into two sets, Training and Testing.
6. Evaluation: remember that we are trying to predict the selling prices of houses.
Format: The completed task must be in a Jupyter Notebook, with all cells run and results displayed.
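A minimal sketch of the split and evaluation steps with scikit-learn might look like this. The synthetic features (`gross_sqft`, `year_built`), the coefficients used to generate them, the 80/20 split, and the choice of linear regression are all assumptions for illustration; the real notebook would use the cleaned Brooklyn data and whichever model is being evaluated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned data (assumed features and relationship).
rng = np.random.default_rng(0)
n = 500
gross_sqft = rng.uniform(500, 4000, n)
year_built = rng.integers(1900, 2015, n)
log_price = (11.0 + 0.0005 * gross_sqft + 0.002 * (year_built - 1900)
             + rng.normal(0, 0.2, n))

X = np.column_stack([gross_sqft, year_built])
y = log_price  # the target is the log-transformed sale price

# Step 5: hold out 20% of the rows for testing; fix the seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only, then predict the unseen test rows.
model = LinearRegression().fit(X_train, y_train)
pred_log = model.predict(X_test)

# Step 6: evaluate on the log scale, then convert predictions back to prices.
rmse_log = np.sqrt(mean_squared_error(y_test, pred_log))
r2 = r2_score(y_test, pred_log)
pred_price = np.expm1(pred_log)

print(f"RMSE (log scale): {rmse_log:.3f}")
print(f"R^2 (log scale): {r2:.3f}")
```

Reporting the error on the held-out test set, rather than the training set, gives a fairer picture of how the model would do on houses it has not seen.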
Resources:
https://www.kaggle.com/tianhwu/brooklynhomes2003to2017
https://hackernoon.com/predicting-the-price-of-houses-in-brooklyn-using-python-1abd7997083b
https://towardsdatascience.com/closing-the-sale-predicting-home-prices-via-linear-regression-2eac62c72818
https://medium.com/geoai/house-hunting-the-data-scientist-way-b32d93f5a42f