The University of New South Wales, Department of Statistics, School of Mathematics: MATH5945 - Categorical...

Question

The University of New South Wales
Department of Statistics, School of Mathematics
MATH5945 - Categorical Data Analysis
Assignment 2
Due 21st April 2011

Preparation: Please review the SAS Help and Documentation for PROC GENMOD. In particular, under SAS/STAT, GENMOD Procedure, Getting Started, Poisson Regression, you can learn about the Type I and Type III tests needed for the questions below. Further explanation is available under the Details section of the help entry.

1. a) Assume that we have independent (but not necessarily identically distributed) observations $Y_i \sim N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, n$, and that $\mu_i = x_i'\beta$, $i = 1, 2, \ldots, n$, is our model for the unknown means, with deterministic vectors $x_i \in \mathbb{R}^p$ and a parameter vector $\beta \in \mathbb{R}^p$. If $X$ is the design matrix, as in Lecture 4, then, as is well known and easily seen, the MLE of $\beta$ is $\hat{\beta} = (X'X)^{-1}X'Y$. Hence $\hat{\mu}_i = x_i'\hat{\beta}$ are the MLEs of the unknown means.
i) Show that for this simple model, the $G^2$ statistic is just $G^2 = \sum_{i=1}^{n} (y_i - \hat{\mu}_i)^2 / \sigma^2$.
ii) What is the distribution of $G^2$? Give reasons for your answer.
b) For any $2 \times 2$ table, show that the Pearson chi-square statistic $Q_P$ satisfies
$$Q_P = \frac{n\,(x_{11}x_{22} - x_{12}x_{21})^2}{x_{1+}\,x_{+1}\,x_{2+}\,x_{+2}},$$
where $n$ is the sum of all frequencies. (You may use: $x_{1+}x_{+1} + x_{1+}x_{+2} + x_{2+}x_{+1} + x_{2+}x_{+2} = n^2$.)

2. Some set of counts (variable C) is considered to depend on a continuous variable X. The counts can be assumed to have a Poisson distribution, and one suspects that the "true" model is a Poisson regression, with the logarithms of the count means being a linear function of some powers of X (Poisson polynomial regression). For the maximal degree of the polynomial, values such as 3 or 4 are entertained. The data are as follows:

Observation     1     2     3     4     5     6     7     8     9    10    11
X            -0.5  -0.4  -0.3  -0.2  -0.1     0   0.1   0.2   0.3   0.4   0.5
C               0    10    50    80   110   116    82    78    70   207   900

Since the model is a polynomial regression, the Type 1 analysis of the model fit is more appropriate for determining the degree of the polynomial.
Please explain why Type 1 analysis is more appropriate.
The basic goal in model choice is to find a model that is as simple as possible but at the same time still delivers an acceptable fit. Try to fit a 4th-degree and a 3rd-degree Poisson polynomial regression to the given data set using GENMOD. Is either of these models acceptable? For the two models, explain how the degrees of freedom for the corresponding model fit statistics are obtained. Does either of the two models deliver an acceptable fit? Which one of the models would you prefer, and why? Try a model with a second-degree polynomial. What can you say about the resulting fit?

3. In a customer satisfaction survey for an insurance company, customers of 3 types (A: Pay at Branch, B: Pay by Direct Debit, C: Payroll Deductions) were asked about their direct contact with the organisation through branch visits. The results are:

Visit within                           A     B     C   Total
1-6   One to six months ago           55    51    35     141
7-12  Seven to twelve months ago       5    11     7      23
>12   More than twelve months ago      5     8    10      23
Never                                  3     2     3       8
Total                                126   116    98     340

a) Is there any evidence to suggest that customers using different methods of payment have different branch visit tendencies?
b) Compare how close the Q and the $G^2$ statistics turn out to be here (use the SAS procedure FREQ). Also, for $G^2$ only, do the calculations manually and compare them with the SAS value.

4. Investigators applied treatments A, B, and C to patients who had either a complicated or an uncomplicated diagnosis of urinary tract infection. They were interested in whether the pattern of treatment differences was the same across diagnoses. This would mean that a simple additive effect of treatment and of diagnosis only (without a Treatment × Diagnosis interaction) would fit well.
The table below shows the data:

Diagnosis       Treatment   Cured   Not Cured
Complicated     A              78          28
Complicated     B             101          11
Complicated     C              68          46
Uncomplicated   A              40           5
Uncomplicated   B              54           5
Uncomplicated   C              34           6

a) The response with 2 categories (Cured or Not Cured) is to be modelled using logistic regression with Diagnosis and Treatment as input factors. Use GENMOD. If necessary, use data transformations within SAS. Make sure you model $\operatorname{logit}\,P(\text{Cured} \mid x_i)$.
b) Comment on the parameter estimates and on the Goodness of Fit table. In particular, report a 95% likelihood-ratio-based confidence interval for the intercept.
c) Request a Type 3 analysis. Looking at the LR statistics for the Type 3 analysis, do you think it is possible to simplify the model obtained in a) further? Justify your answer. If it seems reasonable, offer an alternative model.
d) Assume you decide to use the model from a). Answer the following questions:
i) State the exact analytic form of the fitted model.
ii) What are the odds of being cured for a person with a complicated diagnosis who is treated by method A? What are the odds of being cured for a person with an uncomplicated diagnosis who is treated by method A? What is the odds ratio for those two persons?
iii) Compute and interpret the odds ratio for the effect of Diagnosis on the treatment outcome when controlling for treatment.
Attach the SAS commands and the SAS output.
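For question 4, the additive logit model from part a) can be cross-checked outside SAS. The snippet below is a minimal pure-Python Newton-Raphson fit of logit P(Cured) = intercept + Diagnosis + Treatment to the grouped counts in the table above. It is a sketch, not GENMOD, and it does not reproduce GENMOD's output tables. The function names and the dummy coding are hypothetical choices made for illustration; the coding uses Uncomplicated and C as baseline levels, which may differ from GENMOD's own CLASS coding. The odds and odds ratios asked for in d) ii) and iii) do not depend on which baseline is chosen.

```python
import math

# Grouped data from question 4: (diagnosis, treatment, cured, not cured).
ROWS = [
    ("Complicated", "A", 78, 28),
    ("Complicated", "B", 101, 11),
    ("Complicated", "C", 68, 46),
    ("Uncomplicated", "A", 40, 5),
    ("Uncomplicated", "B", 54, 5),
    ("Uncomplicated", "C", 34, 6),
]


def design(diagnosis, treatment):
    """Hypothetical dummy coding: intercept, Complicated, A, B (baselines: Uncomplicated, C)."""
    return [
        1.0,
        1.0 if diagnosis == "Complicated" else 0.0,
        1.0 if treatment == "A" else 0.0,
        1.0 if treatment == "B" else 0.0,
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logit(rows, tol=1e-10, max_iter=50):
    """Maximum likelihood for a grouped binomial logit model via Newton-Raphson."""
    beta = [0.0] * 4
    for _ in range(max_iter):
        score = [0.0] * 4
        info = [[0.0] * 4 for _ in range(4)]
        for diagnosis, treatment, cured, not_cured in rows:
            x = design(diagnosis, treatment)
            n = cured + not_cured
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for j in range(4):
                score[j] += x[j] * (cured - n * p)  # gradient of the log-likelihood
                for k in range(4):
                    info[j][k] += n * p * (1.0 - p) * x[j] * x[k]  # Fisher information
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


def odds_cured(beta, diagnosis, treatment):
    """Fitted odds of being cured, exp(x' beta)."""
    return math.exp(sum(b * v for b, v in zip(beta, design(diagnosis, treatment))))


beta = fit_logit(ROWS)
odds_comp_a = odds_cured(beta, "Complicated", "A")
odds_uncomp_a = odds_cured(beta, "Uncomplicated", "A")
print("estimates (intercept, Complicated, A, B):", [round(b, 4) for b in beta])
print("odds cured, Complicated + A:  ", round(odds_comp_a, 4))
print("odds cured, Uncomplicated + A:", round(odds_uncomp_a, 4))
print("odds ratio (exp of Complicated coefficient):", round(odds_comp_a / odds_uncomp_a, 4))
```

Under the additive model, the Complicated-versus-Uncomplicated odds ratio equals exp of the Diagnosis coefficient. It is therefore the same for every treatment, so the method-A comparison in d) ii) and the treatment-adjusted odds ratio in d) iii) coincide.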
