The University of New South Wales
Department of Statistics
School of Mathematics
MATH5945 - Categorical Data Analysis
Assignment 2
Due 21st April 2011
Preparation: Please review the SAS Help and Documentation for PROC GENMOD. In particular,
under SAS/STAT GENMOD Procedure, Getting Started, Poisson Regression, you can learn about
the Type I and Type III tests needed for the questions below.
Further explanation is available under the Details section of the help entry.
1. a) Assume that we have independent (but not necessarily identically distributed)
observations $Y_i \sim N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, n$, and that $\mu_i = x_i'\beta$, $i = 1, 2, \ldots, n$,
is our model for the unknown means, with deterministic vectors $x_i \in \mathbb{R}^p$ and a parameter
vector $\beta \in \mathbb{R}^p$. If $X$ is the design matrix, as in Lecture 4, then, as is well known and easily
seen, the MLE of $\beta$ is $\hat\beta = (X'X)^{-1}X'Y$. Hence $\hat\mu_i = x_i'\hat\beta$ are the MLEs of the unknown means.
i) Show that for this simple model, the $G^2$ statistic is just
$$G^2 = \sum_{i=1}^{n} (y_i - \hat\mu_i)^2 / \sigma^2.$$
ii) What is the distribution of $G^2$? Give reasons for your answer.
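Part i) can be checked numerically before it is proved. Taking $G^2$ as twice the gap between the log-likelihood of the saturated model ($\hat\mu_i = y_i$) and that of the fitted model, the Python sketch below compares it with $\sum_i (y_i - \hat\mu_i)^2/\sigma^2$. It assumes a straight-line model ($p = 2$), a known $\sigma^2$ and made-up data; `loglik_normal`, `fit_line` and `g2_statistic` are hypothetical helper names, not from any library.

```python
import math

# Illustrative helpers (hypothetical names), not from any library

def loglik_normal(y, mu, sigma2):
    """Log-likelihood of independent N(mu_i, sigma2) observations, sigma2 known."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (yi - mi) ** 2 / (2 * sigma2)
               for yi, mi in zip(y, mu))

def fit_line(x, y):
    """ML (= least-squares) fit of mu_i = b0 + b1 * x_i; returns the fitted means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [b0 + b1 * xi for xi in x]

def g2_statistic(y, mu_hat, sigma2):
    """G^2 = 2 * (log-lik of saturated model, mu_i = y_i, minus log-lik of the fit)."""
    return 2 * (loglik_normal(y, y, sigma2) - loglik_normal(y, mu_hat, sigma2))

# Hypothetical data and variance, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3]
sigma2 = 0.25

mu_hat = fit_line(x, y)
g2 = g2_statistic(y, mu_hat, sigma2)
scaled_rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat)) / sigma2
print(f"G2 = {g2:.6f}, sum of squared residuals / sigma^2 = {scaled_rss:.6f}")
```

The normalising constants cancel between the two log-likelihoods, so only the squared residuals survive; the same holds for any design matrix, not just the straight line used here.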
b) For any $2 \times 2$ table, show that the Pearson chi-square statistic $Q_P$ satisfies
$$Q_P = \frac{n\,(x_{11}x_{22} - x_{12}x_{21})^2}{x_{1+}\,x_{+1}\,x_{2+}\,x_{+2}},$$
where $n$ is the sum of all frequencies. (You may use
$x_{1+}x_{+1} + x_{1+}x_{+2} + x_{2+}x_{+1} + x_{2+}x_{+2} = n^2$.)
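A numerical spot check of the identity in b) can be reassuring before proving it. The Python sketch below computes $Q_P$ from its definition $\sum_{i,j}(x_{ij} - \hat e_{ij})^2/\hat e_{ij}$ with $\hat e_{ij} = x_{i+}x_{+j}/n$, and again from the closed form; the counts are hypothetical and the function names are illustrative only.

```python
def pearson_q(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    q = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = rows[i] * cols[j] / n   # expected count under independence
            q += (obs - expected) ** 2 / expected
    return q

def pearson_q_2x2_shortcut(table):
    """Closed form n (x11 x22 - x12 x21)^2 / (x1+ x+1 x2+ x+2) for a 2 x 2 table."""
    (x11, x12), (x21, x22) = table
    n = x11 + x12 + x21 + x22
    r1, r2 = x11 + x12, x21 + x22
    c1, c2 = x11 + x21, x12 + x22
    return n * (x11 * x22 - x12 * x21) ** 2 / (r1 * c1 * r2 * c2)

t = [[12, 5], [7, 16]]  # hypothetical counts
print(pearson_q(t), pearson_q_2x2_shortcut(t))
```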
2. A set of counts (variable C) is considered to depend on a continuous variable X. The
counts can be assumed to have a Poisson distribution, and one suspects that the "true"
model is a Poisson regression in which the logarithms of the count means are a linear
function of some powers of X (Poisson polynomial regression). Values such as 3 or 4 are
entertained for the maximal degree of the polynomial. The data are as follows:
Observation     1     2     3     4     5     6     7     8     9    10    11
X            -0.5  -0.4  -0.3  -0.2  -0.1   0.0   0.1   0.2   0.3   0.4   0.5
C               0    10    50    80   110   116    82    78    70   207   900
Since the model is a polynomial regression, a Type 1 analysis of the model fit is more
appropriate than a Type 3 analysis for examining the degree of the polynomial. Please
explain why Type 1 analysis is more appropriate.
The basic goal in model choice is to find a model that is as simple as possible but still
delivers an acceptable fit. Try to fit a 4th-degree and a 3rd-degree Poisson polynomial
regression to the given data set using GENMOD. For the two models, explain how the degrees
of freedom for the corresponding model fit statistics are obtained. Does either of the two
models deliver an acceptable fit? Which of the models would you prefer, and why? Try a model
with a second-degree polynomial. What can you say about the resulting fit?
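The fits and the sequential comparison can also be reproduced outside SAS. The Python below is a minimal sketch, not GENMOD: it fits $\log\mu = \beta_0 + \beta_1 x + \dots + \beta_d x^d$ by iteratively reweighted least squares (IRLS) for $d = 2, 3, 4$ on the data in the table, reports each deviance with $n - (d + 1)$ residual degrees of freedom, and prints the drop in deviance when each extra power is added, which corresponds to the sequential likelihood-ratio statistic a Type 1 table shows for that term. The helpers `solve`, `poisson_deviance` and `fit_poisson_poly` are hypothetical names written for this illustration; GENMOD's own algorithm and convergence criteria may differ slightly.

```python
import math

# Data from the table above
X_VALS = [-0.5, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
COUNTS = [0, 10, 50, 80, 110, 116, 82, 78, 70, 207, 900]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def poisson_deviance(y, mu):
    """D = 2 * sum[y log(y / mu) - (y - mu)], with y log(y / mu) = 0 when y = 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

def fit_poisson_poly(x, y, degree, tol=1e-10, max_iter=100):
    """Fit log(mu) = b0 + b1 x + ... + b_degree x^degree by IRLS; return (beta, deviance)."""
    design = [[xi ** k for k in range(degree + 1)] for xi in x]
    p = degree + 1
    mu = [yi + 0.5 for yi in y]          # common starting values; avoids log(0)
    eta = [math.log(m) for m in mu]
    dev_old = poisson_deviance(y, mu)
    for _ in range(max_iter):
        # Working response and weights; for the canonical log link the weights are mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * row[j] * row[k] for m, row in zip(mu, design)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(m * row[j] * zi for m, row, zi in zip(mu, design, z)) for j in range(p)]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(beta, row)) for row in design]
        mu = [math.exp(e) for e in eta]
        dev = poisson_deviance(y, mu)
        if abs(dev - dev_old) < tol * (abs(dev) + 0.1):
            break
        dev_old = dev
    return beta, dev

n_obs = len(COUNTS)
fits = {d: fit_poisson_poly(X_VALS, COUNTS, d) for d in (2, 3, 4)}
for d, (beta, dev) in fits.items():
    # residual df = number of observations minus number of estimated coefficients
    print(f"degree {d}: deviance = {dev:.3f} on {n_obs - (d + 1)} df")
for d in (3, 4):
    # sequential (Type 1 style) LR statistic for adding the x^d term, 1 df
    print(f"adding x^{d}: LR = {fits[d - 1][1] - fits[d][1]:.3f}")
```

Because the models are nested in the order $x, x^2, x^3, x^4$, each deviance difference tests one added power given all lower ones, which is exactly the ordering a polynomial calls for.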
3. In a customer satisfaction survey for an insurance company, customers of three types
(A: Pay at Branch, B: Pay by Direct Debit, C: Payroll Deductions) were asked about
their direct contact with the organisation through branch visits. The results are:
Visit within                          A     B     C   Total
1-6    One to six months ago         55    51    35    141
7-12   Seven to twelve months ago     5    11     7     23
> 12   More than twelve months ago    5     8    10     23
Never                                 3     2     3      8
Total                               126   116    98    340
a) Is there any evidence to suggest that customers using different methods of payment
have different branch visit tendencies?
b) Compare how close the Pearson statistic $Q$ and the $G^2$ statistic turn out to be here
(use the SAS procedure FREQ). Also, for $G^2$ only, do the calculations manually and compare
them with the SAS value.
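For the manual calculation in b), the expected counts under independence are $\hat e_{ij} = x_{i+}x_{+j}/n$, giving $Q = \sum (x_{ij} - \hat e_{ij})^2/\hat e_{ij}$ and $G^2 = 2\sum x_{ij}\log(x_{ij}/\hat e_{ij})$, both on $(r-1)(c-1)$ degrees of freedom. The Python sketch below computes both; `independence_stats` is a hypothetical helper name, the example table is made up, and the survey counts from the table above would be passed in its place.

```python
import math

def independence_stats(table):
    """Return (Q, G2, df) for testing independence in an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    q = 0.0
    g2 = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = rows[i] * cols[j] / n     # e_ij = x_i+ * x_+j / n
            q += (obs - expected) ** 2 / expected
            if obs > 0:                          # cells with x_ij = 0 add nothing to G2
                g2 += 2 * obs * math.log(obs / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return q, g2, df

# Hypothetical 3 x 2 table, for illustration only
example = [[20, 30], [25, 25], [15, 35]]
q, g2, df = independence_stats(example)
print(f"Q = {q:.4f}, G2 = {g2:.4f}, df = {df}")
```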
4. Investigators applied treatments A, B, and C to patients who had either a complicated
or an uncomplicated diagnosis of urinary tract infection. They were interested in whether
the pattern of treatment differences was the same across diagnoses. This would mean that a
simple additive effect of treatment and of diagnosis only (without a Treatment × Diagnosis
interaction) would fit well. The table below shows the data:
Diagnosis Treatment Cured Not Cured
Complicated A 78 28
Complicated B 101 11
Complicated C 68 46
Uncomplicated A 40 5
Uncomplicated B 54 5
Uncomplicated C 34 6
a) The response with 2 categories (Cured or Not Cured) is to be modelled using
logistic regression with Diagnosis and Treatment as input factors. Use GENMOD. If
necessary, use data transformations within SAS. Make sure you model
$\operatorname{logit} P(\text{Cured} \mid x_i)$.
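Outside SAS, the additive model in a) can be sketched as a grouped-binomial logistic regression fitted by IRLS. This Python sketch is an illustration, not GENMOD output: it uses reference coding with Uncomplicated and treatment C as the baseline levels (we assume this mirrors GENMOD's default CLASS coding, in which the last level in sort order is the reference), and `solve`, `binomial_deviance` and `fit_logistic` are hypothetical helper names. Besides the estimates and the deviance on $6 - 4 = 2$ df, it refits the model without each factor, so the increases in deviance correspond to the Type 3 likelihood-ratio statistics asked about in c).

```python
import math

# (complicated diagnosis?, treatment, cured, not cured), from the table above
DATA = [
    (1, "A", 78, 28), (1, "B", 101, 11), (1, "C", 68, 46),
    (0, "A", 40, 5), (0, "B", 54, 5), (0, "C", 34, 6),
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def binomial_deviance(y, n, pi):
    """D = 2 * sum[y log(y / mu) + (n - y) log((n - y) / (n - mu))], mu = n * pi."""
    d = 0.0
    for yi, ni, q in zip(y, n, pi):
        mu = ni * q
        if yi > 0:
            d += yi * math.log(yi / mu)
        if ni - yi > 0:
            d += (ni - yi) * math.log((ni - yi) / (ni - mu))
    return 2 * d

def fit_logistic(design, y, n, tol=1e-10, max_iter=100):
    """Fit logit(pi) = design @ beta for grouped binomial data by IRLS; return (beta, deviance)."""
    p = len(design[0])
    pi = [(yi + 0.5) / (ni + 1.0) for yi, ni in zip(y, n)]   # starting values
    eta = [math.log(q / (1 - q)) for q in pi]
    dev_old = binomial_deviance(y, n, pi)
    for _ in range(max_iter):
        w = [ni * q * (1 - q) for ni, q in zip(n, pi)]         # IRLS weights
        z = [e + (yi - ni * q) / wi for e, yi, ni, q, wi in zip(eta, y, n, pi, w)]
        xtwx = [[sum(wi * row[j] * row[k] for wi, row in zip(w, design)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(wi * row[j] * zi for wi, row, zi in zip(w, design, z)) for j in range(p)]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(beta, row)) for row in design]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        dev = binomial_deviance(y, n, pi)
        if abs(dev - dev_old) < tol * (abs(dev) + 0.1):
            break
        dev_old = dev
    return beta, dev

y = [cured for _, _, cured, _ in DATA]
n = [cured + not_cured for _, _, cured, not_cured in DATA]
# Reference coding (assumed): Uncomplicated and treatment C are the baseline levels
full = [[1.0, float(comp), float(t == "A"), float(t == "B")] for comp, t, _, _ in DATA]
no_diagnosis = [[row[0], row[2], row[3]] for row in full]
no_treatment = [[row[0], row[1]] for row in full]

beta, dev_full = fit_logistic(full, y, n)
_, dev_no_diag = fit_logistic(no_diagnosis, y, n)
_, dev_no_trt = fit_logistic(no_treatment, y, n)

print("intercept, complicated, trt A, trt B:", [round(b, 4) for b in beta])
print(f"deviance = {dev_full:.4f} on {len(DATA) - len(beta)} df")
# Type 3 style LR statistics: increase in deviance when one factor is dropped
print(f"LR for Diagnosis: {dev_no_diag - dev_full:.4f} on 1 df")
print(f"LR for Treatment: {dev_no_trt - dev_full:.4f} on 2 df")
```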
b) Comment on the parameter estimates and on the Goodness of Fit table. In particular,
report a 95% likelihood-ratio-based confidence interval for the intercept.
c) Request a Type 3 analysis. Looking at the LR statistics for the Type 3 analysis,
do you think it is possible to simplify the model obtained in a) further? Justify your
answer. If it seems reasonable, offer an alternative model.
d) Assume you decide to use the model from a). Answer the questions:
i) State the exact analytic form of the fitted model.
ii) What are the odds that a person with a complicated diagnosis who is treated by
method A is cured? What are the odds that a person with an uncomplicated diagnosis
who is treated by method A is cured? What is the odds ratio for these two persons?
iii) Compute and interpret the odds ratio for the effect of Diagnosis on the treatment
outcome when controlling for treatment.
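The quantities in d) follow from the additive model once the coding is fixed. A sketch, assuming reference coding with Uncomplicated and treatment C as baseline levels ($\delta$ for the diagnosis effect and $\tau$ for the treatment effect are notation introduced here); the numerical answers come from substituting the fitted estimates:

```latex
\operatorname{logit}\pi_{ij} = \log\frac{\pi_{ij}}{1-\pi_{ij}} = \beta_0 + \delta_i + \tau_j,
\qquad \delta_{\mathrm{Unc}} = 0,\quad \tau_C = 0,
\\[4pt]
\mathrm{odds}(\mathrm{Comp}, A) = e^{\beta_0 + \delta_{\mathrm{Comp}} + \tau_A},
\qquad
\mathrm{odds}(\mathrm{Unc}, A) = e^{\beta_0 + \tau_A},
\\[4pt]
\mathrm{OR} = \frac{\mathrm{odds}(\mathrm{Comp}, j)}{\mathrm{odds}(\mathrm{Unc}, j)} = e^{\delta_{\mathrm{Comp}}}
\quad \text{for every treatment } j \in \{A, B, C\}.
```

Because the model has no interaction term, $\beta_0$ and $\tau_j$ cancel in the ratio, so the diagnosis odds ratio is the same for every treatment. If the other diagnosis level is used as the baseline, $\delta$ changes sign and the odds ratio is inverted.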
Attach the SAS commands and the SAS output.