1. Read Shen et al. (2016), “Time varying associations of suicide …,” Lancet Psychiatry, as the background paper. The outcome in that paper is suicide completion. In this exercise, we will focus on a different outcome (substance use disorder) but follow the analytical framework of the research article.
2. The goal of this exercise is to replicate a select set of rows from Table 1 in the paper, using a different destructive-behavior outcome and a simplified survival model (the model in the article is somewhat more complex and is estimated on over 110 million observations). To keep this replication exercise manageable, the data you will use (1) are limited to the enlisted population; (2) contain only a random 5% of the original sample; and (3) have fewer variables. Use the dataset “subuse_data.” Variables are described in the Excel file. Use RStudio.
3. Tasks: The Excel file has two sheets with self-explanatory names. The “Table shell” sheet is identical to Table 1 of the Lancet Psychiatry paper but with fewer rows. Your assignment is to fill out this table shell using the data provided, the outcome given, and the simplified model below. Note:
a. Columns B-D are descriptive statistics; you don’t need to estimate regression models to produce these numbers.
b. Columns F-H are your results from estimating the Cox proportional hazards model.
c. Estimate a Cox proportional hazards model (stcox) where the outcome is current_subuse and the independent variables are the ones listed in Table 1, as well as the following[1]: race, gender, age group, marital status, dependent quantity, rank, AFQT score categories, MOS categories.
4. Additional questions to answer:
a. Using the research article as the guide, how would you describe/interpret in words the hazard ratio (HR) results of the 3 deployment variables in your analysis?
b. Suppose the HR estimate for the variable current_demote is 7.5. Upon seeing this result, a commander makes the following statement: “Wow, looks like people who are demoted resorted to drug use to cope with this stressful event.” Is this statement a correct interpretation of the result? Briefly justify your answer.
c. What if the authors had estimated an individual fixed-effects model (which would make this a linear probability model, or LPM) instead of a Cox proportional hazards model? What is lost or gained by using the fixed-effects LPM instead of the Cox proportional hazards model?
5. Submit (1) the do-file; (2) the log file; (3) the Excel file with the filled-out tables. Include your answers to “Additional questions” as a block comment at the top of the do-file.
[1] These additional variables were also included in the multivariate model in the article’s Table 1—see table footnote on pg 4.
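
When reading Cox model output for the table shell and for question 4a, it can help to recall how a log-hazard coefficient maps to a hazard ratio. The following is a minimal sketch of that arithmetic, written in Python only for illustration; it is not the estimation step of the assignment. The function name `hazard_ratio_summary` and the input numbers are hypothetical, and the interval assumes the usual normal approximation on the log-hazard scale. Check the Excel table shell for the quantities it actually asks for.

```python
import math


def hazard_ratio_summary(beta, se, z=1.96):
    """Convert a Cox log-hazard coefficient and its standard error into a
    hazard ratio, an approximate 95% confidence interval, and the implied
    percent change in the hazard relative to the reference group."""
    hr = math.exp(beta)
    return {
        "hr": hr,
        "ci_lower": math.exp(beta - z * se),  # interval built on the log scale
        "ci_upper": math.exp(beta + z * se),
        "pct_change": (hr - 1.0) * 100.0,
    }


# Hypothetical inputs for illustration only; NOT estimates from subuse_data.
example = hazard_ratio_summary(beta=math.log(1.25), se=0.10)
print(
    f"HR = {example['hr']:.2f} "
    f"(95% CI {example['ci_lower']:.2f} to {example['ci_upper']:.2f}), "
    f"{example['pct_change']:+.0f}% hazard vs. reference"
)
```

Under these made-up inputs, a hazard ratio of 1.25 would be read as a 25% higher hazard of the outcome at any given time relative to the reference category, holding the other covariates in the model fixed.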