On fileSTAT54385 - Homework 8 Due: May 4th before 11:59pm • 100 points • Submit your solution in...

Question

On fileSTAT54385 - Homework 8 Due: May 4th before 11:59pm • 100 points • Submit your solution in OneNote • You MUST attach all your R codes used to produce outputs. The instructor will test your R code. • Using the R Markdown for this homework is highly encouraged. Answer the following questions: 1. Consider the logistic model for the south African heart disease data. 1. Based on the full model, we want to use the LRT to see whether or not two coefficients for tobacco and alcohol are zero. What is the test statistic, its degree of freedom and p-value? 2. At α = 0.05, what is your conclusion? 2. (Continued) The south African heart disease data. 1. Let p = Pr(chd = 1|X). We consider two models: M1 : logit(p) = β0 + β1ldl + +β2famhist+ β3age+ β4tobacco; M2 : logit(p) = β0 + β1ldl + +β2famhist+ β3age+ β4alcohol. Are they nested models? 2. We want to choose one of these two models. What procedure can we use for the purpose? 3. Perfrom the procedure you described in the last question, and select a model. 1 	STAT54385 - Homework 8 	Due: May 4th before 11:59pm 	Answer the following questions:   Logistic Regression  In this session, we will see how to implement logistic regression using R.  South African Heart Disease  Descriptions of variables in this data set can be found here.  install: knitr and rmarkdown  Load a data set (csv format)  The data set follows the comma separated value file format. We loaded the data, and drop a  few variables that you will not use.  saheart = read.table("SAheart.data",sep=",",head=T,row.names=1)  names(saheart) # variable names in the advertising data  ##  [1] "sbp"       "tobacco"   "ldl"       "adiposity" "famhist"   "typea"      ##  [7] "obesity"   "alcohol"   "age"       "chd"  saheart$chd = as.factor(saheart$chd)  drops = c("typea","adiposity")  saheart = saheart[,!(names(saheart) %in% drops)]  attach(saheart)  We use ggpairs to do some exploratory data analysis (EDA)!  plotsaheart = ggpairs(saheart, mapping = aes(color = chd, alpha = 0.4),  axisLabels="none",                lower = list(                continuous = "smooth",                combo = "facetdensity")) +      theme_bw() +      theme(axis.line = element_line(colour = "black"),      #panel.grid.major = element_blank(),      panel.grid.minor = element_blank(),      #panel.border = element_blank()      panel.background = element_blank()      )   Save a plot to a pdf file.  plotsaheart  http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.info.txt   Alternatively, you can save the plot directly to a pdf file. This is very usefully technique  when generating plots takes quite long or when the machine does not have the GUI.  pdf("plotsaheart.pdf", width=12, height=9)  print(plotsaheart)  dev.off()  Fit Logistic regression  glm() is the one that people use most frequently to fit the logistic regression with binary  outcomes. One needs to specify family = "binomial" in glm() as the observation follows  the binomial distribution. Once we fit the model.  mylogit = glm(chd~.,data = saheart, family = "binomial")  summary(mylogit)  ##   ## Call:  ## glm(formula = chd ~ ., family = "binomial", data = saheart)  ##   ## Deviance Residuals:   ##     Min       1Q   Median       3Q      Max    ## -1.7517  -0.8378  -0.4552   0.9292   2.4434    ##   ## Coefficients:  ##                  Estimate Std. Error z value Pr(>|z|)      ## (Intercept)    -4.1295997  0.9641558  -4.283 1.84e-05 ***  ## sbp             0.0057607  0.0056326   1.023  0.30643      ## tobacco         0.0795256  0.0262150   3.034  0.00242 **   ## ldl             0.1847793  0.0574115   3.219  0.00129 **   ## famhistPresent  0.9391855  0.2248691   4.177 2.96e-05 ***  ## obesity        -0.0345434  0.0291053  -1.187  0.23529      ## alcohol         0.0006065  0.0044550   0.136  0.89171      ## age             0.0425412  0.0101749   4.181 2.90e-05 ***  ## ---  ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  ##   ## (Dispersion parameter for binomial family taken to be 1)  ##   ##     Null deviance: 596.11  on 461  degrees of freedom  ## Residual deviance: 483.17  on 454  degrees of freedom  ## AIC: 499.17  ##   ## Number of Fisher Scoring iterations: 4  likelihood ratio test (LRT)  One can perform the significant test for the coefficients using the LRT.  mylogit1 = update(mylogit,~.-sbp-obesity-alcohol)  anova(mylogit1,mylogit,test="Chisq")  ## Analysis of Deviance Table  ##   ## Model 1: chd ~ tobacco + ldl + famhist + age  ## Model 2: chd ~ sbp + tobacco + ldl + famhist + obesity + alcohol + age  ##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  ## 1       457     485.44                       ## 2       454     483.17  3   2.2698   0.5183  Model selection  The AIC has ? = 2, and the BIC has ? = log(?). We apply the forward stepwise selection  using BIC and the backward stepwise selection using AIC. Final selected models do not  agree, but their AICs and BICs are quite close.  N = log(length(chd)) # number of observations  fmod = glm(chd~.,data = saheart, family = "binomial") # full model  nmod = glm(chd~1,data = saheart, family = "binomial") # null model    # forward stepwise (BIC)  fstep = step(nmod, scope=list(lower=nmod, upper=fmod), direction="forward",  k=log(N), trace=FALSE)  fstep$anova # enumerated varialbes that are added and droped  ##        Step Df  Deviance Resid. Df Resid. Dev      AIC  ## 1           NA        NA       461   596.1084 597.9225  ## 2     + age -1 70.546083       460   525.5623 529.1905  ## 3 + famhist -1 18.904183       459   506.6582 512.1005  ## 4 + tobacco -1 11.272755       458   495.3854 502.6418  ## 5     + ldl -1  9.941538       457   485.4439 494.5144  BIC(fstep)  ## [1] 516.1217  AIC(fstep)  ## [1] 495.4439  # backard stepwise (AIC)  bstep = step(fmod, nmod, direction="backward", k=2, trace=FALSE)  bstep$anova  ##        Step Df   Deviance Resid. Df Resid. Dev      AIC  ## 1           NA         NA       454   483.1740 499.1740  ## 2 - alcohol  1 0.01850382       455   483.1925 497.1925  ## 3     - sbp  1 1.10421166       456   484.2967 496.2967  ## 4 - obesity  1 1.14711316       457   485.4439 495.4439  BIC(bstep)  ## [1] 516.1217  AIC(bstep)  ## [1] 495.4439  We can also consider two-way interaction terms with the full model, and perform the  variable selection. We improve the model fit slightly by including interaction terms.  # backward stepwise + 2 way interactions  f2istep = step(fmod, ~.^2, k=log(N), trace=FALSE)  f2istep$anova  ##                Step Df Deviance Resid. Df Resid. Dev      AIC  ## 1                   NA       NA       454   483.1740 497.6868  ## 2     + ldl:famhist -1 9.324987       453   473.8490 490.1760  ## 3 + obesity:alcohol -1 5.612713       452   468.2363 486.3774  ## 4             - sbp  1 1.300862       453   469.5372 485.8641  BIC(f2istep)  ## [1] 524.7573  AIC(f2istep)  ## [1] 487.5372  	Logistic Regression 	South African Heart Disease 	Load a data set (csv format) 	Save a plot to a pdf file. 	Fit Logistic regression 	likelihood ratio test (LRT) 	Model selection

Subhanbasha · Accepted Answer

---
title: "STAT54385 - Homework 8"
output: word_document
---
## Logistic Regression
# Calling required packages
```{r chunksetup, include=FALSE} 
library(GGally) # to use 'ggpairs'
library(ggplot2) # pretty looking plots
```
#### Loading a data set (csv format)
```{r}
saheart = read.table("SAheart.data",sep=",",head=T,row.names=1)
names(saheart) # variable names in the advertising data
saheart$chd = as.factor(saheart$chd)
drops = c("typea","adiposity")
saheart = saheart[,!(names(saheart) %in% drops)]
attach(saheart)
```
# Making plots by using ggplot
```{r,fig.width = 12, fig.height = 9}
plotsaheart = ggpairs(saheart, mapping = aes(color = chd, alpha = 0.4), axisLabels="none",
              lower = list(
              continuous = "smooth",
              combo = "facetdensity")) +
    theme_bw() +
    theme(axis.line = element_line(colour = "black"),
    #panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    #panel.border = element_blank()
    panel.background = element_blank()
    ) 
```
#### Printing the plot
```{r,message=FALSE}
plotsaheart
```
```{r, eval=FALSE}
pdf("plotsaheart.pdf",

STAT XXXXXXXXXXHomework 8 Due: May 4th before 11:59pm • 100 points • Submit your solution in OneNote • You MUST attach all your R codes used to produce outputs. The instructor will test your R code. •...

Answer To: STAT XXXXXXXXXXHomework 8 Due: May 4th before 11:59pm • 100 points • Submit your solution in OneNote...

Answer To This Question Is Available To Download

Related Questions & Answers

Submit New Assignment