STAT 4385 Applied Regression Analysis
Computer Project III (Model Diagnostics)
(Due on 05/07/2021, Friday, before 11:59pm. Upload your report in OneNote.)

The crime data contain the crime rate and other information for each of fifty (n = 50) cities (Reference: Life In America’s Small Cities, by G.S. Thomas). One objective of the study is to model the crime rate (Y) based on other predictor variables (X1–X5). A short variable description is given below.

Variable  Name            Meaning
Y         crimerate       total overall reported crime rate per 1 million residents
X1        police.funding  annual police funding in $/resident
X2        high            % of people 25 years+ with 4 yrs. of high school
X3        middle          % of 16 to 19 year-olds not in highschool and not highschool graduates
X4        in.college      % of 18 to 24 year-olds in college
X5        graduate        % of people 25 years+ with at least 4 years of college

First download the dataset and read the data into R. The objective of this project is to find the best linear model and conduct model diagnostics. Make sure that you interpret the results appropriately.

1. Use the all subset (or best subset) selection procedure with AIC to find your best linear regression model.
2. Output the Table of Parameter Estimates and the ANOVA table for your best model. Interpret the R^2.
3. Take the following specific steps to check model assumptions.
   (a) (Check Normality) Obtain the studentized jackknife residuals and check if they follow the standard normal distribution using histogram and Q-Q plot graphically and referring to the Shapiro-Wilk test for normality.
   (b) (Check Homoscedasticity) Plot Absolute Jackknife Residuals vs. Fitted values using the R function spreadLevelPlot() in Package {car}. Apply the Breusch-Pagan test or bptest() function in the R Package {lmtest} to test for non-constant error variance.
   (c) (Check Independence) Apply the Durbin-Watson test to check for auto-correlated errors.
   (d) (Check Linearity) Use partial residual plots (Function crPlots in the car package) to check on linearity or functional form for each continuous predictor that you have included in your best linear model.
4. (Outlier Detection) Identify outliers that are outlying in terms of predictors, response, and being influential by using the leverage h_ii, the studentized jackknife residuals r_i, and Cook’s distance d_i. Make a bubble plot of these three measures using Function influencePlot in the car package.
5. (Multicollinearity) Assess multicollinearity by obtaining the variance inflation factor (VIF) measures. You don’t have to consider the intercept term. Use 10 for VIF to determine presence of severe multicollinearity.

Some helpful tips for computer projects are listed below:
• Start early and don’t wait till the last day/minute;
• Use copy-and-paste appropriately to include necessary R output into your final report;
• Remember to interpret every result and graph that you present;
• Place your R codes in an appendix.
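The numerical parts of steps 3 to 5 can be sketched with base R alone. This is only an illustration under stated assumptions: the data frame toy holds simulated placeholder values (not the crime data), the two-predictor model is chosen arbitrarily, the Breusch-Pagan and Durbin-Watson statistics are computed by hand rather than through the package functions, and the cutoffs 2p/n, 2, and 4/n are common rules of thumb rather than anything the assignment prescribes (only the VIF cutoff of 10 comes from the assignment).

```r
# Synthetic stand-in data (NOT the real crime data); names follow the assignment.
set.seed(1)
n <- 50
toy <- data.frame(police.funding = runif(n, 15, 70),
                  high           = runif(n, 40, 85))
toy$crimerate <- 600 + 10 * toy$police.funding - 3 * toy$high + rnorm(n, sd = 200)

fit <- lm(crimerate ~ police.funding + high, data = toy)
p   <- length(coef(fit))                 # number of coefficients, incl. intercept

# 3(a) Normality: studentized jackknife residuals and Shapiro-Wilk test
r_jack <- rstudent(fit)
sw     <- shapiro.test(r_jack)           # H0: residuals are normally distributed
# hist(r_jack); qqnorm(r_jack); qqline(r_jack)   # graphical checks

# 3(b) Homoscedasticity: studentized Breusch-Pagan statistic, n * R^2 of the
# auxiliary regression of squared residuals on the predictors
e       <- residuals(fit)
aux     <- lm(I(e^2) ~ police.funding + high, data = toy)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = p - 1, lower.tail = FALSE)

# 3(c) Independence: Durbin-Watson statistic (values near 2 suggest no
# first-order autocorrelation)
dw <- sum(diff(e)^2) / sum(e^2)

# 4 Outliers: leverage, jackknife residuals, Cook's distance
h <- hatvalues(fit)
d <- cooks.distance(fit)
flagged <- which(h > 2 * p / n | abs(r_jack) > 2 | d > 4 / n)  # rule-of-thumb cutoffs

# 5 Multicollinearity: VIFs are the diagonal of the inverse correlation matrix
X          <- model.matrix(fit)[, -1, drop = FALSE]   # drop the intercept column
vif_manual <- diag(solve(cor(X)))
severe     <- vif_manual > 10            # cutoff stated in the assignment
```

With the car and lmtest packages installed, spreadLevelPlot(fit), crPlots(fit), influencePlot(fit), vif(fit), durbinWatsonTest(fit), and bptest(fit) are the package counterparts the assignment names.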
Answered 2 days after May 03, 2021

Mohd answered on May 06 2021
5/6/2021
Loading packages and data
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(MASS)
library(GGally)
library(car)
library(lmtest)
crime <- read_csv("crime.csv")
Use the all subset(or best subset) selection procedure with AIC to find your best linear regression model.
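# Full model with all five predictors
lm_mod <- lm(crimerate ~ police.funding + high + middle + in.college + graduate, data = crime)
# Backward elimination by AIC (a stepwise search, not an exhaustive all-subsets search)
stepAIC(lm_mod, direction = "backward")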
## Start: AIC=559.03
## crimerate ~ police.funding + high + middle + in.college + graduate
##
## Df Sum of Sq RSS AIC
## - in.college 1 467 2821727 557.04
## - graduate 1 10255 2831515 557.21
## - middle 1 19053 2840313 557.37
## - high 1 55510 2876769 558.01
## <none> 2821259 559.03
## - police.funding 1 816153 3637413 569.74
##
## Step: AIC=557.04
## crimerate ~ police.funding + high + middle + graduate
##
## Df Sum of Sq RSS AIC
## - middle 1 23017 2844744 555.45
## - graduate 1 25192 2846918 555.49
## - high 1 93491 2915217 556.67
## <none> 2821727 557.04
## - police.funding 1 846407 3668134 568.16
##
## Step: AIC=555.45
## crimerate ~ police.funding + high + graduate
##
## Df Sum of Sq RSS AIC
## - graduate 1 14129 2858872 553.70
## <none> 2844744 555.45
## - high 1 148317 2993061 555.99
## - police.funding 1 1276802 4121546 571.99
##
## Step: AIC=553.7
## crimerate ~ police.funding + high
##
## Df Sum of Sq RSS AIC
## <none> 2858872 553.7
## - high 1 171116 3029988 554.6
## - police.funding 1 1297044 4155917 570.4
##
## Call:
## lm(formula = crimerate ~ police.funding + high, data = crime)
##
## Coefficients:
## (Intercept) police.funding high
## 621.426 11.858 ...
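Backward elimination with stepAIC() follows a single path of deletions, whereas the assignment asks for an all-subsets search. A minimal sketch of the exhaustive version in base R follows. It is an illustration under stated assumptions, not the answer's method: the data frame toy is simulated placeholder data (not the crime data), only non-empty predictor subsets are considered, and AIC is taken from extractAIC() so that it is on the same scale as the stepAIC() output.

```r
# Synthetic stand-in data (NOT the real crime data); names follow the assignment.
set.seed(42)
n <- 50
toy <- data.frame(police.funding = runif(n, 15, 70),
                  high           = runif(n, 40, 85),
                  middle         = runif(n, 5, 30),
                  in.college     = runif(n, 10, 60),
                  graduate       = runif(n, 5, 35))
toy$crimerate <- 600 + 10 * toy$police.funding - 3 * toy$high + rnorm(n, sd = 200)

predictors <- setdiff(names(toy), "crimerate")

# Enumerate every non-empty subset of the five predictors (2^5 - 1 = 31 models)
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each candidate model and record its AIC
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "crimerate"), data = toy)
  data.frame(model = paste(vars, collapse = " + "),
             size  = length(vars),
             aic   = extractAIC(fit)[2])   # second element is the AIC value
}))

# Rank models from lowest (best) to highest AIC
results <- results[order(results$aic), ]
best_formula <- reformulate(strsplit(results$model[1], " + ", fixed = TRUE)[[1]],
                            response = "crimerate")
best_fit <- lm(best_formula, data = toy)
head(results, 5)
```

On the real data, replacing toy with crime would rank all 31 candidate models; summary(best_fit) and anova(best_fit) then give the parameter estimates and ANOVA table requested in step 2.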