The file contains questions, and you need to answer them using only RStudio.
STA 542 HW5 (Ch5&6)

Question:   1    2   Total
Points:    30   15    45
Score:

1. Data file "data/drinking.txt" has the result of cross-classifying a sample of people from the MBTI Step II National Sample (collected and compiled by CPP, Inc.) on whether they report drinking alcohol frequently (1 = yes, 0 = no) and on the four binary scales of the Myers–Briggs personality test: Extroversion/Introversion (E/I), Sensing/iNtuitive (S/N), Thinking/Feeling (T/F) and Judging/Perceiving (J/P). You can load the data with the R code below:

    load(file = "data/drinking.Rdata")

The 16 predictor combinations correspond to the 16 personality types: ESTJ, ESTP, ESFJ, ESFP, ENTJ, ENTP, ENFJ, ENFP, ISTJ, ISTP, ISFJ, ISFP, INTJ, INTP, INFJ, INFP.

(a) (5 points) Fit a model using the four scales as predictors of π = the probability of drinking alcohol frequently. Report the prediction equation, specifying how you set up the indicator variables.
(b) (5 points) Find π̂ for someone of personality type ESTJ.
(c) (5 points) Based on the model parameter estimates, explain why the personality type with the highest π̂ is ENTP.
(d) (5 points) Conduct a model goodness-of-fit test, and interpret.
(e) (5 points) If you were to simplify the model by removing a predictor, which would you remove? Why?
(f) (5 points) When six interaction terms are added, the deviance decreases to 3.74. Show how to test the hypothesis that none of the interaction terms are needed, and interpret.

2. A model fit predicting preference for President (Democrat, Republican, Independent) using x = annual income (in units of $10,000) is

    log(π̂D/π̂I) = 3.3 − 0.2x   and   log(π̂R/π̂I) = 1.0 + 0.3x

(a) (5 points) State the prediction equation for log(π̂R/π̂D). Interpret its slope.
(b) (5 points) Find the range of x for which π̂R > π̂D.
(c) (5 points) State the prediction equation for π̂I.
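For Question 2, any pair of categories follows from the two baseline logits by subtraction: log(π̂R/π̂D) = log(π̂R/π̂I) − log(π̂D/π̂I). The base-R sketch below assumes only the two fitted equations stated in the question (Independent as the baseline); the variable and function names are illustrative, not part of the assignment.

```r
# Fitted baseline-category logits from Question 2 (baseline = Independent);
# x is annual income in units of $10,000.
aD <- 3.3;  bD <- -0.2   # log(piD/piI) = 3.3 - 0.2x
aR <- 1.0;  bR <-  0.3   # log(piR/piI) = 1.0 + 0.3x

# (a) log(piR/piD) = log(piR/piI) - log(piD/piI)
int_RD   <- aR - aD      # intercept of log(piR/piD)
slope_RD <- bR - bD      # slope of log(piR/piD)
cat(sprintf("log(piR/piD) = %.1f + %.1fx\n", int_RD, slope_RD))
cat(sprintf("odds multiplier per $10,000: %.3f\n", exp(slope_RD)))

# (b) piR > piD exactly when log(piR/piD) > 0; slope_RD > 0, so x must exceed the root
x_cut <- -int_RD / slope_RD
cat(sprintf("piR > piD for x > %.1f\n", x_cut))

# (c) piI = 1 / (1 + exp(aD + bD x) + exp(aR + bR x)); piD and piR share the denominator
pred_probs <- function(x) {
  eD <- exp(aD + bD * x)
  eR <- exp(aR + bR * x)
  denom <- 1 + eD + eR
  cbind(x = x, D = eD / denom, R = eR / denom, I = 1 / denom)
}
print(round(pred_probs(c(0, 4.6, 10)), 3))
```

With these inputs the slope of log(π̂R/π̂D) is 0.5, so each additional $10,000 of income multiplies the estimated odds of Republican versus Democrat by e^0.5 ≈ 1.65, and π̂R > π̂D once x exceeds 4.6.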
Moradi ALi Hassan AL-Jadaani

      F1  F2  F3  F4   Yes    No
 1     J   T   S   E    10    67
 2     P   T   S   E     8    34
 3     J   F   S   E     5   101
 4     P   F   S   E     7    72
 5     J   T   N   E     3    20
 6     P   T   N   E     2    16
 7     J   F   N   E     4    27
 8     P   F   N   E    15    65
 9     J   T   S   I    17   123
10     P   T   S   I     3    49
11     J   F   S   I     6   132
12     P   F   S   I     4   102
13     J   T   N   I     1    12
14     P   T   N   I     5    30
15     J   F   N   I     1    30
16     P   F   N   I     6    73

(F1 = J/P, F2 = T/F, F3 = S/N, F4 = E/I; Yes and No are the numbers of people who do and do not report drinking alcohol frequently.)
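A minimal base-R sketch for Question 1, assuming the table above is the complete data set. The counts are typed in directly because the name of the object stored in data/drinking.Rdata is not given here. The 0/1 coding of e, s, t, j is one possible choice of indicator variables (an assumption; the opposite reference levels give an equivalent fit with sign-flipped coefficients).

```r
# Counts typed in from the table above
# (F1 = J/P, F2 = T/F, F3 = S/N, F4 = E/I; yes/no = drinks frequently or not)
drink <- data.frame(
  JP  = rep(c("J", "P"), times = 8),
  TF  = rep(rep(c("T", "F"), each = 2), times = 4),
  SN  = rep(rep(c("S", "N"), each = 4), times = 2),
  EI  = rep(c("E", "I"), each = 8),
  yes = c(10, 8, 5, 7, 3, 2, 4, 15, 17, 3, 6, 4, 1, 5, 1, 6),
  no  = c(67, 34, 101, 72, 20, 16, 27, 65, 123, 49, 132, 102, 12, 30, 30, 73)
)

# Indicator coding (one possible choice): 1 for E, S, T, J and 0 for I, N, F, P
drink$e <- as.integer(drink$EI == "E")
drink$s <- as.integer(drink$SN == "S")
drink$t <- as.integer(drink$TF == "T")
drink$j <- as.integer(drink$JP == "J")

# (a) main-effects logistic regression on grouped binomial counts
fit <- glm(cbind(yes, no) ~ e + s + t + j, family = binomial, data = drink)
print(round(coef(summary(fit)), 4))  # logit(pi-hat) = b0 + b1 e + b2 s + b3 t + b4 j

# (b) pi-hat for ESTJ (all four indicators equal 1)
estj <- predict(fit, newdata = data.frame(e = 1, s = 1, t = 1, j = 1), type = "response")
print(estj)

# (c) fitted pi-hat for all 16 types and the type with the largest one
drink$type  <- paste0(drink$EI, drink$SN, drink$TF, drink$JP)
drink$pihat <- fitted(fit)
print(drink$type[which.max(drink$pihat)])

# (d) deviance goodness-of-fit: 16 binomial groups - 5 parameters = 11 df
gof_p <- pchisq(deviance(fit), df = df.residual(fit), lower.tail = FALSE)
cat(sprintf("deviance = %.3f on %d df, p = %.4f\n",
            deviance(fit), df.residual(fit), gof_p))

# (e) likelihood-ratio test for dropping each predictor in turn
print(drop1(fit, test = "Chisq"))

# (f) LR test of the six two-way interactions, using the deviance 3.74 given in (f)
lr_stat <- deviance(fit) - 3.74
lr_p    <- pchisq(lr_stat, df = 6, lower.tail = FALSE)
cat(sprintf("LR = %.3f on 6 df, p = %.4f\n", lr_stat, lr_p))
```

Under this coding, the type with the highest π̂ is the one that sets each indicator to 1 when its coefficient is positive and to 0 when it is negative, which is how part (c) can be argued from the signs of the estimates.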
STA 542: Categorical Data Analysis
Chapter 6: Multicategory Logit Models
Guangyu Zhu, University of Rhode Island

Outline
1. Logit Models for Nominal Responses
   Introduction
   Example (Income and Job Satisfaction from 1991 GSS)
2. Cumulative Logit Models for Ordinal Responses

Logit Models for Nominal Responses

Introduction
When Y is binary, the logistic model fits π = Pr(Y = 1):

$$\operatorname{logit}(\pi) = \log\frac{\pi}{1-\pi} = \log\frac{\Pr(Y=1)}{\Pr(Y=0)}$$

Multicategory Logit Models
• Y has J categories, J > 2.
• Assume a multinomial distribution for Y.
• The model specifies the odds of an outcome in one category instead of another.
• In R, we will fit these models using the VGAM package.

Baseline-Category Logits
If Y is nominal (unordered categories), the baseline-category logit model is

$$\log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \beta_j x, \qquad j = 1, 2, \ldots, J-1.$$

• There are J − 1 equations, each with its own parameters (α_j, β_j).

The J − 1 equations for these pairs of categories determine the equations for all other pairs of categories:

$$\log\left(\frac{\pi_a}{\pi_b}\right) = \log\left(\frac{\pi_a/\pi_J}{\pi_b/\pi_J}\right) = \log\left(\frac{\pi_a}{\pi_J}\right) - \log\left(\frac{\pi_b}{\pi_J}\right) = (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b)\,x$$

e^{β_j} is the multiplicative effect of a 1-unit increase in x on the odds of category j versus the baseline J:

$$\text{odds}(x) = \frac{\pi_j}{\pi_J} = \exp(\alpha_j + \beta_j x) \quad\Rightarrow\quad \frac{\text{odds}(x+1)}{\text{odds}(x)} = e^{\beta_j}$$

In R, use the vglm function with the multinomial family from the VGAM package.

Example (Income and Job Satisfaction from 1991 GSS)

                      Job Satisfaction
Income     Dissat  Little  Moderate  Very  Total
<5K             2       4        13     3     22
5K-15K          2       6        22     4     34
15K-25K         0       1        15     8     24
>25K            0       3        13     8     24
Total           4      14        63    23    104

Using x = income scores (3, 10, 20, 35), we fit the model

$$\log\left(\frac{\pi_j}{\pi_4}\right) = \alpha_j + \beta_j x, \qquad j = 1, 2, 3,$$

for the J = 4 job satisfaction categories.
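The pairwise identity and the e^{β_j} interpretation above can be checked numerically. This base-R sketch converts the J − 1 baseline logits into J probabilities and plugs in the estimates that the slides report for jobsat.fit1 (rounded to four decimals); the function names baseline_probs, pair_logit and odds_vs_base are illustrative, not from VGAM.

```r
# Baseline-category logit estimates reported for jobsat.fit1
# (j = 1, 2, 3 are Diss, Little, Mod; the baseline J = 4 is Very)
alpha <- c(0.4298, 0.4563, 1.7039)
beta  <- c(-0.1854, -0.0544, -0.0374)
cat_names <- c("Diss", "Little", "Mod", "Very")

# pi_j = exp(eta_j) / sum_k exp(eta_k), where eta_j = alpha_j + beta_j x
# for j < J and eta_J = 0 for the baseline category
baseline_probs <- function(x, alpha, beta) {
  eta <- c(alpha + beta * x, 0)
  p <- exp(eta) / sum(exp(eta))
  setNames(p, cat_names)
}

# log(pi_a / pi_b) = (alpha_a - alpha_b) + (beta_a - beta_b) x
pair_logit <- function(a, b, x, alpha, beta) {
  (alpha[a] - alpha[b]) + (beta[a] - beta[b]) * x
}

p10 <- baseline_probs(10, alpha, beta)
print(round(p10, 3))

# Diss vs. Mod at x = 10: both routes give the same log-odds
print(c(from_probs = unname(log(p10["Diss"] / p10["Mod"])),
        from_coefs = pair_logit(1, 3, 10, alpha, beta)))

# e^beta_j: multiplicative change in the odds of category j vs. Very per unit of x
odds_vs_base <- function(x) {
  p <- baseline_probs(x, alpha, beta)
  p[1:3] / p[4]
}
print(round(odds_vs_base(11) / odds_vs_base(10), 4))
print(round(exp(beta), 4))
```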
Data Exploration
Import data:
load("data/jobsatisfaction.RData")
jobsatisfaction
##    Gender Income JobSat Freq
## 1       F      3   Diss    1
## 2       F     10   Diss    2
## 3       F     20   Diss    0
## 4       F     35   Diss    0
## 5       M      3   Diss    1
## 6       M     10   Diss    0
## 7       M     20   Diss    0
## 8       M     35   Diss    0
## 9       F      3 Little    3
## 10      F     10 Little    3
## 11      F     20 Little    1
## 12      F     35 Little    2
## 13      M      3 Little    1
## 14      M     10 Little    3
## 15      M     20 Little    0
## 16      M     35 Little    1
## 17      F      3    Mod   11
## 18      F     10    Mod   17
## 19      F     20    Mod    8
## 20      F     35    Mod    4
## 21      M      3    Mod    2
## 22      M     10    Mod    5
## 23      M     20    Mod    7
## 24      M     35    Mod    9
## 25      F      3   Very    2
## 26      F     10   Very    3
## 27      F     20   Very    5
## 28      F     35   Very    2
## 29      M      3   Very    1
## 30      M     10   Very    1
## 31      M     20   Very    3
## 32      M     35   Very    6

Contingency table:
tab1 = xtabs(Freq ~ Income + JobSat, jobsatisfaction)
tab1
##       JobSat
## Income Diss Little Mod Very
##     3     2      4  13    3
##     10    2      6  22    4
##     20    0      1  15    8
##     35    0      3  13    8

Conditional proportion table (row percentages):
tab2 = prop.table(tab1, 1) * 100
tab2
##       JobSat
## Income Diss Little   Mod  Very
##     3  9.09  18.18 59.09 13.64
##     10 5.88  17.65 64.71 11.76
##     20 0.00   4.17 62.50 33.33
##     35 0.00  12.50 54.17 33.33

Model Fitting
Cast to wide form:
library(reshape2)
jobsatw <- dcast(jobsatisfaction, Income ~ JobSat, sum, value.var = "Freq")
jobsatw
##   Income Diss Little Mod Very
## 1      3    2      4  13    3
## 2     10    2      6  22    4
## 3     20    0      1  15    8
## 4     35    0      3  13    8

Fit the multinomial logistic model:
library(VGAM)
jobsat.fit1 <- vglm(cbind(Diss, Little, Mod, Very) ~ Income, family = multinomial, data = jobsatw)

Estimated coefficients:
coef(summary(jobsat.fit1))
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept):1   0.4298     0.9448   0.455 0.649176
## (Intercept):2   0.4563     0.6209   0.735 0.462423
## (Intercept):3   1.7039     0.4811   3.542 0.000397
## Income:1       -0.1854     0.1025  -1.808 0.070568
## Income:2       -0.0544     0.0311  -1.748 0.080380
## Income:3       -0.0374     0.0209  -1.790 0.073401
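The Income estimates above can be turned into odds ratios with 95% Wald intervals. This base-R sketch copies the Income:1 to Income:3 rows of the table (Diss, Little and Mod, each versus the baseline Very); because the inputs are rounded to four decimals, the recomputed z values may differ from the printed ones in the third decimal.

```r
# Income coefficients and standard errors copied from the table above
# (Income:1, Income:2, Income:3 = Diss, Little, Mod, each vs. Very)
est <- c(Diss = -0.1854, Little = -0.0544, Mod = -0.0374)
se  <- c(Diss =  0.1025, Little =  0.0311, Mod =  0.0209)

# Wald z statistics and two-sided p-values
z <- est / se
p <- 2 * pnorm(-abs(z))

# 95% Wald intervals on the log-odds scale, exponentiated to the odds-ratio scale
zc    <- qnorm(0.975)
lower <- est - zc * se
upper <- est + zc * se

or_table <- data.frame(OR = exp(est), OR_lower = exp(lower), OR_upper = exp(upper),
                       z = z, p = p)
print(round(or_table, 4))
```

Each estimated odds ratio is below 1 (for example, e^{−0.1854} ≈ 0.83 for Diss versus Very), so higher income is associated with lower odds of each category relative to Very. Every interval still contains 1, which matches the p-values above 0.05.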