The assignment contains Python and R (RStudio).
---
title: "Assignment Six"
author: "Professor Lepore"
date: "11/06/2022"
output: html_document
---

# *Task(s):*

- ***[1]*** Clean the data frames [2 Points]
- ***[2]*** Merge the data frames [3 Points]
- ***[3]*** Run univariate and bivariate statistics [5 Points]
- ***[4]*** When turning in your assignment, please attach both this RMD file and the knitted HTML file. I've already set the specific chunk settings for this assignment. Also be sure to rename both as assignment_six_first_lastname.

### Extra points (3)

- (2): Run bivariate statistics on both residential_commercial_ind and eviction_possession using the merged data set
- (1): Describe the logic behind your mutates and merging

Partial credit points are possible with tasks two and three.

# R [PART ONE]

```{r setup, include = TRUE}
# Chunk Options ----------------------------------------------------------------
knitr::opts_chunk$set(
  echo = TRUE
)

# R PACKAGES -------------------------------------------------------------------
if (!require('tidyverse')) install.packages('tidyverse', repos = "http://cran.us.r-project.org"); library('tidyverse')
if (!require('reticulate')) install.packages('reticulate', repos = "http://cran.us.r-project.org"); library('reticulate')
```

```{r data_import}
homebase_data <- jsonlite::fromJSON("https://data.cityofnewyork.us/resource/ntcm-2w4k.json?$limit=200")
eviction_data <- jsonlite::fromJSON("https://data.cityofnewyork.us/resource/6z8x-wfk4.json?$limit=100000")
```

```{r data_inspection}
head(homebase_data)
tail(homebase_data)
summary(homebase_data)
colnames(homebase_data)

head(eviction_data)
tail(eviction_data)
summary(eviction_data)
colnames(eviction_data)
```

```{r data_cleaning}
# Create the object homebase_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (homebase_office, service_area_zip_code, postcode, borough)
# 2. Rename: service_area_zip_code to servicing_zipcodes and
#    postcode to homebase_location_zipcode
# 3. Capitalize homebase_office
# 4. Remove the I, II, and III from the Homebase office names
# 5. Replace the servicing_zipcodes with any corrections
# 6. Find the number of servicing_zipcodes

# Create the object eviction_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (residential_commercial_ind, eviction_possession, borough, eviction_zip)
# 2. Rename: residential_commercial_ind to building_type,
#    eviction_possession to warrant_execution_type, and eviction_zip to eviction_zipcode
# 3. Replace Unspecified with NA in warrant_execution_type
# 4. Remove all NA from the data set

homebase_data_clean <- homebase_data

eviction_data_clean <- eviction_data
```

```{r merging}
# Merge the two data sets based on the same logic of zipcode matching that we did in class
combo_homebase_evicition_dataset <-
```

```{r stats}
# Run statistics to get the mean, sd, median, IQR, min and max for evictions at each Homebase location. Interpret these numbers (checking for skew as well)

# Run statistics to get the mean, sd, median, IQR, min and max for evictions in each borough using the homebase and borough variables. Interpret these numbers (checking for skew as well)
```

# Analysis of evictions for each homebase:

# Analysis of evictions for each borough:

# Python [PART TWO]

```{python setup_python}
# Python PACKAGES --------------------------------------------------------------
import json
import requests
import pandas as pd

pd.options.mode.chained_assignment = None  # default='warn'
```

```{python data_import_python}
# Download the Homebase locations and load them into a data frame
json_link = requests.get('https://data.cityofnewyork.us/resource/ntcm-2w4k.json?$limit=200')
json_loaded = json.loads(json_link.text)
homebase_data_python = pd.DataFrame(json_loaded)
del json_link, json_loaded

# Download the eviction records and load them into a data frame
json_link = requests.get('https://data.cityofnewyork.us/resource/6z8x-wfk4.json?$limit=100000')
json_loaded = json.loads(json_link.text)
eviction_data_python = pd.DataFrame(json_loaded)
del json_link, json_loaded
```

```{python data_inspection_python}
homebase_data_python.head(5)
homebase_data_python.tail(5)
homebase_data_python.dtypes
list(homebase_data_python.columns)

eviction_data_python.head(5)
eviction_data_python.tail(5)
eviction_data_python.dtypes
list(eviction_data_python.columns)
```

```{python data_cleaning_python}
# Create the object homebase_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (homebase_office, service_area_zip_code, postcode, borough)
# 2. Rename: service_area_zip_code to servicing_zipcodes and
#    postcode to homebase_location_zipcode
# 3. Capitalize homebase_office
# 4. Remove the I, II, and III from the Homebase office names
# 5. Replace the servicing_zipcodes with any corrections
# 6. Find the number of servicing_zipcodes

# Create the object eviction_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (residential_commercial_ind, eviction_possession, borough, eviction_zip)
# 2. Rename: residential_commercial_ind to building_type,
#    eviction_possession to warrant_execution_type, and eviction_zip to eviction_zipcode
# 3. Replace Unspecified with NA in warrant_execution_type
# 4. Remove all NA from the data set
```

```{python stats_python}
# Run statistics to get the mean, sd, median, IQR, min and max for evictions at each Homebase location.

# Run statistics to get the mean, sd, median, IQR, min and max for evictions in each borough using the homebase and borough variables.
```
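
The summary statistics requested for the stats step (mean, sd, median, IQR, min and max per Homebase location) might be computed along the following lines. This is only a minimal sketch on made-up data, not the expected solution: the `merged` data frame, its office labels "A" and "B", and the choice to count evictions per zip code before summarizing are all assumptions for illustration, and the zip-matching logic from class is not reproduced here.

```python
import pandas as pd

# Hypothetical merged data (assumption): one row per eviction,
# already matched to the Homebase office that services its zip code.
merged = pd.DataFrame({
    "homebase_office": ["A", "A", "A", "A", "A", "A", "B", "B", "B"],
    "eviction_zipcode": ["10001", "10001", "10001", "10002", "10003",
                         "10003", "11201", "11201", "11205"],
})

# Step 1 (assumption): count evictions per zip code within each office.
counts = (
    merged.groupby(["homebase_office", "eviction_zipcode"])
    .size()
    .reset_index(name="evictions")
)


def iqr(s):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return s.quantile(0.75) - s.quantile(0.25)


# Step 2: summarize the per-zip counts for each office.
summary = counts.groupby("homebase_office")["evictions"].agg(
    mean="mean", sd="std", median="median", iqr=iqr, min="min", max="max"
)

# A rough skew check: a mean well above the median suggests right skew.
summary["mean_minus_median"] = summary["mean"] - summary["median"]

print(summary)
```

The same pattern would apply per borough by grouping on a borough column instead of (or in addition to) `homebase_office`. Comparing the mean with the median is only a quick heuristic for skew; a histogram of the counts gives a fuller picture.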