Problem 1. Linear Regression: Please answer the following questions using the West Roxbury Neighborhood data (west_roxbury_Modified_A6.csv). This data set contains 15 variables as listed below: TOTAL...

I need help with the R script for this question set. I am having trouble understanding it.


Problem 1. Linear Regression: Please answer the following questions using the West Roxbury Neighborhood data (west_roxbury_Modified_A6.csv). This data set contains 15 variables as listed below: TOTAL VALUE Total assessed value for property, in thousands of USD TAX Tax bill amount based on total assessed value multiplied by the tax rate LOT SQFT Total lot size of parcel in square feet YR BUILT Year property was built GROSS AREA Gross floor area LIVING AREA Total living area for residential properties (ft2) FLOORS Number of floors ROOMS Total number of rooms BEDROOMS Total number of bedrooms FULL BATH Total number of full baths HALF BATH Total number of half baths KITCHEN Total number of kitchens FIREPLACE Total number of fireplaces REMODEL When house was remodeled (Recent/Old/None) Rating The home’s rating which is an integer between 1 and 5 1. [1 point] Please import the data into R, and name the dataframe “home_data”. Convert the columns FULL.BATH and REMODEL in the “home_data” to categorical variables using as.factor(). Please make sure to update the columns in the “home_data”. 2. [0.5 point] Create a new column in the home_data, named “AGE”, to represent the age of the property, computed as the current year, minus the Year the property was built. 3. [1 point] Create a new column in the home_data, named High.Tax. For each row, High.Tax is 1 if the “TAX” is greater than the “TAX” column’s median, and 0 otherwise. 4. [1 point] Randomly partition the data into a training set (60%) and a validation set (40%). Use 100 as the argument of the set.seed() function. 5. [1 point] Build a linear regression model to predict the TAX using LIVING.AREA, AGE, and FULL.BATH as predictor variables. 6. [2 points] Interpret the coefficient estimates for each predictor variable (i.e. how does change in the value of each predictor variable impact the value of the target variable)? Which predictors are significant? 7. [4 points] Obtain the accuracy performance measures for your model applied to both training and validation data. 8. [1 point] Is your model over-fitted? Discuss your answer. Problem 2. Logistic Regression 1. [1.5 point] Use boxplots to investigate the possible relationship between each of the following variable pairs: · High.Tax and the GROSS.AREA · High.Tax and AGE · High.Tax and FLOORS Discuss if you should use the GROSS.AREA, AGE, and FLOORS as predictors for the High.Tax. 2. [1 point] Investigate the possible relationship between High.Tax and REMODEL using the table() function. Discuss if you should include the REMODEL as a predictor for the High.Tax. 3. [1 point] Build a logistic regression model using High.Tax as the target variable and three of the GROSS.AREA, AGE, FLOORS, and REMODEL as predictors, based on your data understanding in parts 1 and 2. 4. [1 point] Use the model you created to predict the probability of High.Tax =1. Then use 0.5 as the cutoff probability to convert the probabilities to binary values. 5. [1 point] Interpret coefficients of the predictor variables in your model. 6. [2 point] Find the overall accuracy of the model. 7. [1 point] Assume that it is more important to detect the homes with a high tax. Therefore, the important class is High.Tax = 1. What is the sensitivity and specificity of your model?
Apr 18, 2021
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here