To do this assignment, you must first create your own dataset, drawn from the file Fall 2020 Assignment 2.xls (available on Canvas at the end of "Module 2"). The observations that make up your dataset depend on the last digit of your student ID (mine ends in 7), as shown in this table:
Student ID's Last Digit    Patients
1                          1-200
2                          201-400
3                          401-600
4                          601-800
5                          801-1000
6                          1001-1200
7                          1201-1400
8                          1401-1600
9                          1601-1800
0                          1801-2000
You will use the first 160 of your 200 observations for Parts 1, 2 and 3 of this assignment. You will use the last 40 (your “hold-out” sample) for Part 4.
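The selection rule in the table and the 160/40 split can be sketched as follows. This is only an illustration of the arithmetic; the function names patient_range and split_patients are made up for this sketch and are not part of the assignment.

```python
def patient_range(last_digit: int) -> tuple[int, int]:
    """Return the (first, last) patient numbers for a student-ID last digit."""
    if not 0 <= last_digit <= 9:
        raise ValueError("last_digit must be between 0 and 9")
    # Digits 1-9 map to blocks 1-9; digit 0 takes the tenth block (1801-2000).
    block = 10 if last_digit == 0 else last_digit
    first = (block - 1) * 200 + 1
    return first, first + 199


def split_patients(last_digit: int) -> tuple[range, range]:
    """Split the 200 patient numbers into the first 160 and the last 40."""
    first, last = patient_range(last_digit)
    fitting = range(first, first + 160)      # used for Parts 1, 2 and 3
    holdout = range(first + 160, last + 1)   # used for Part 4
    return fitting, holdout


# Example for a student ID ending in 7.
fitting, holdout = split_patients(7)
print(fitting[0], fitting[-1], holdout[0], holdout[-1])
```

For an ID ending in 7, this gives patients 1201-1360 for Parts 1 to 3 and patients 1361-1400 for Part 4.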
These data come from a study of patient satisfaction at a network of dermatology practices. The variables are: “Patient” (once you've created your dataset you can ignore this one); “Sex”; “Satisfaction” (the patient's self-reported overall satisfaction, 1 to 100); “Effectiveness” (the patient's self-reported view of the effectiveness of the procedure, 1 to 80); and “Pain” (the patient's self-reported post-procedure pain, 1 to 80).
Part 1 (25 points). What is a 90% confidence interval for the correlation between Satisfaction and Effectiveness?
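One common way to build such an interval is the Fisher z-transformation; the assignment does not say which method is expected, so treat the choice as an assumption. The sketch below uses only the Python standard library, and the value r = 0.5 in the usage line is a hypothetical placeholder, not a result from the data.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_ci(r, n, level=0.90):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                  # Fisher transformation of r
    se = 1 / math.sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical r; n = 160 matches the fitting sample size.
low, high = correlation_ci(0.5, 160, level=0.90)
print(low, high)
```

Because the interval is built on the z scale and transformed back, it is generally not symmetric around r.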
Part 2 (25 points). Conduct a simple regression analysis with Satisfaction as your dependent variable and Pain as your independent variable. Discuss what you see/observe.
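A minimal sketch of the quantities usually discussed for a simple regression (intercept, slope, R-squared, and the slope's standard error and t statistic) is shown below. The five data points are invented for illustration and do not come from the assignment file; in practice a spreadsheet or statistics package would report the same numbers.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with basic diagnostics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    se_slope = math.sqrt(sse / (n - 2) / sxx)   # residual variance uses n - 2 df
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - sse / sst,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
    }


# Invented toy values: Pain as predictor, Satisfaction as response.
pain = [10, 20, 30, 40, 50]
satisfaction = [90, 82, 69, 61, 48]
fit = simple_regression(pain, satisfaction)
print(fit)
```

The sign and size of the slope, its t statistic, and R-squared are the natural starting points for the discussion.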
Part 3 (25 points). Conduct a multiple regression analysis with both Effectiveness and Pain as your predictors. Discuss what you see/observe. Having done this, evaluate (and discuss) whether the incorporation of Sex into your model would be useful.
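One way to judge whether Sex adds anything is a partial F test that compares the two-predictor model with a three-predictor model; this is one reasonable approach, not necessarily the one the instructor has in mind. The sketch below solves the normal equations directly, assumes Sex is coded 0/1, and uses invented toy data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(predictors, y):
    """Fit y on a list of predictor columns; returns (coefficients, SSE).

    The first coefficient is the intercept.
    """
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, sse


def partial_f(sse_reduced, sse_full, n, k_full, added=1):
    """F statistic for the `added` extra predictors in the full model.

    k_full is the number of predictors in the full model, excluding the intercept.
    """
    return ((sse_reduced - sse_full) / added) / (sse_full / (n - k_full - 1))


# Invented toy values, not taken from the assignment file.
effectiveness = [60, 45, 70, 30, 55, 65, 40, 50]
pain = [20, 40, 15, 50, 30, 10, 35, 25]
sex = [0, 1, 0, 1, 1, 0, 0, 1]           # assumed 0/1 coding
satisfaction = [78, 60, 88, 41, 70, 85, 55, 66]

beta_reduced, sse_reduced = fit_ols([effectiveness, pain], satisfaction)
beta_full, sse_full = fit_ols([effectiveness, pain, sex], satisfaction)
f_stat = partial_f(sse_reduced, sse_full, n=len(satisfaction), k_full=3)
print(beta_reduced, beta_full, f_stat)
```

The F statistic would be compared with an F distribution on 1 and n - 4 degrees of freedom; with a single added predictor it equals the square of that predictor's t statistic in the full model.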
Part 4 (25 points). Using your “hold-out” sample, cross-validate the model you fit in Part 3. Discuss what you see/observe.
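A common way to cross-validate is to apply the coefficients estimated on the first 160 observations to the 40 hold-out observations, then compare predicted with observed Satisfaction. The sketch below assumes that approach; the coefficients, the fitting-sample R-squared, and the data rows are hypothetical placeholders.

```python
import math


def predict(coefs, rows):
    """Apply coefficients (intercept first) to rows of predictor values."""
    return [coefs[0] + sum(b * v for b, v in zip(coefs[1:], row)) for row in rows]


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_validate(coefs, holdout_rows, holdout_y, r2_fit):
    """Score a previously fitted model on hold-out data without refitting."""
    preds = predict(coefs, holdout_rows)
    r2_holdout = pearson_r(preds, holdout_y) ** 2   # cross-validated R-squared
    rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, holdout_y))
                     / len(holdout_y))
    return {
        "r2_holdout": r2_holdout,
        "shrinkage": r2_fit - r2_holdout,   # drop in R-squared on new data
        "rmse": rmse,
    }


# Hypothetical coefficients: intercept, Effectiveness slope, Pain slope.
coefs = [40.0, 0.6, -0.5]
# Hypothetical hold-out rows of (Effectiveness, Pain) and observed Satisfaction.
holdout_rows = [(60, 20), (45, 40), (70, 15), (30, 50), (55, 30)]
holdout_y = [68, 45, 77, 30, 60]
result = cross_validate(coefs, holdout_rows, holdout_y, r2_fit=0.80)
print(result)
```

A small shrinkage suggests the model generalizes to new patients; a large drop suggests the fitted model was tuned too closely to the first 160 observations.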