pasted my hmw
OSU College of Public Health
PUBHBIO 7230 Spring 2023
Assignment 2

Background

Each year the U.S. Naval Postgraduate School sets aside a “Discovery Day” during which the general public is invited into their laboratories. This dataset is from October 21st, 1995, when visitors could test their reaction times and hand-eye coordination in the Human Systems Integration Laboratory. The variable of interest, “anticipatory timing”, was measured by a Bassin timer. This device measures a person’s ability to estimate the speed of a moving light and its arrival at a designated point. The timer consists of a 10 foot row of lights that are switched on sequentially from one end to the other so that the light “travels” at 5 miles per hour down the timer. Each visitor was instructed to anticipate the arrival of the light at one end of the timer and at that time hit a button. In the original version of the data, negative time values indicated that the button was pressed before the light actually arrived; in the version provided to you, all times are positive values, indicating only the magnitude of mistiming, not the direction of the error. Each of the 113 visitors completed five trials. Age and gender were also recorded. Visitors tended to come in family groups, but that information was not recorded.

These data are in the file timetrial.csv on Carmen. The data are provided in wide format, with variables id, age, male, trial1, trial2, trial3, trial4, and trial5.

1. Notation
Consider the vector notation, where outcomes are denoted by Y and covariates by X. Suppose we are interested in a regression model in which response time is the outcome and age, gender, and trial number are covariates. (An intercept is included as well.)
(a) Assuming we are content to model the trial number as a linear covariate (i.e., assuming that the effect on response time is a linear function of the trial number), write the regression formula of the form: E(Y|X) = ...
(b) Using the same model as in (a), consider the matrix notation of the linear model: Y = Xβ + ϵ. Using this notation, write the vector β.
(c) Let Yi denote the response vector for each individual. What is the length of Yi?
(d) What is the length of Y?
(e) Let Xi denote the matrix of covariates for each individual. What are the dimensions of Xi?
(f) Use the data available to write the matrix Xi when i = 22; that is, write the design matrix for ID #22.
(g) What are the dimensions of the matrix Σi = Var(Yi)?

2. Exploratory data analysis - trend across trials
Using plots and summary statistics, describe trends in the timing across the five trials. For each plot/table, comment in one or two sentences on the type of information the plot/table tells you. In particular, you should address the following at a minimum:
(a) Are visitors, in general, improving across the five trials? What pattern, if any, does this improvement follow?
(b) Do any trends in the outcomes across trials differ by gender?
(c) Do any trends in the outcome across trials differ by age?
(d) Are there any outliers (either individual observations or trajectories) that you would consider suspicious from a data entry perspective or that you would be concerned about in your future analyses?

3. Basic Models
(a) Compute the standard deviation for each trial and the correlation matrix for the repeated measures (using the data in wide format) and comment if you see any obvious pattern (you may also refer to results from question 2).
(b) Using GLS, fit the model you presented in question 1 (accounting for trial number as a numerical variable), assuming uncorrelated errors and using both the complete data and the data with subject #35 removed. Discuss the impact of this observation on the estimates and standard errors and, based on the analysis from question 1, decide if you will keep this subject in the remaining analysis.
(c) Would it be better to consider age as a binary variable indicating if the subject is an adult or child? To answer that, create a new variable that is 0 if age < 20 and 1 otherwise, fit both models and compare their information criteria. Tip: remember to use ML as the estimation method for this comparison.
(d) Which other covariance structures do you think would be appropriate for this dataset? Fit at least two other possibilities (using REML) and select the most appropriate structure.
(e) Now, suppose we are interested in testing whether there is a “learning” effect across trials. Using the GLS model you selected in the previous item, test the hypothesis of a learning effect across trials. Report the results in the form of a sentence or two suitable for publication in a scientific journal (to include a point estimate, 95% CI, and p-value, referring back to the scientific question). You should also specify the analysis done and provide a brief justification for your choice.
(f) Describe another question of scientific interest that could be answered with these data. Then, conduct the analysis and report your results in the form of a sentence or two suitable for publication in a scientific journal (to include a point estimate, 95% CI, and p-value, referring back to the scientific question). You should also specify the analysis done and provide a brief justification for your choice.
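
For the notation in question 1, it may help to see how one wide-format row turns into five long-format rows and a per-subject design matrix. The following is only a minimal sketch in plain Python. The two rows are made-up values, not taken from timetrial.csv, and the names wide_to_long and design_matrix are hypothetical helpers. It assumes the column order intercept, age, male, trial for the model in 1(a).

```python
# Hypothetical wide-format rows (NOT real values from timetrial.csv).
wide_rows = [
    {"id": 1, "age": 30, "male": 1,
     "trial1": 0.10, "trial2": 0.08, "trial3": 0.07, "trial4": 0.05, "trial5": 0.04},
    {"id": 2, "age": 9, "male": 0,
     "trial1": 0.20, "trial2": 0.15, "trial3": 0.16, "trial4": 0.12, "trial5": 0.11},
]


def wide_to_long(rows, n_trials=5):
    """Stack trial1..trialN into one row per (subject, trial)."""
    long_rows = []
    for row in rows:
        for t in range(1, n_trials + 1):
            long_rows.append({
                "id": row["id"],
                "age": row["age"],
                "male": row["male"],
                "trial": t,                 # trial number as a numeric covariate
                "time": row[f"trial{t}"],   # the outcome for that trial
            })
    return long_rows


def design_matrix(long_rows, subject_id):
    """Rows of X_i for one subject: [intercept, age, male, trial]."""
    return [
        [1, r["age"], r["male"], r["trial"]]
        for r in long_rows
        if r["id"] == subject_id
    ]


long_rows = wide_to_long(wide_rows)
X_1 = design_matrix(long_rows, subject_id=1)

for x_row in X_1:
    print(x_row)
# [1, 30, 1, 1]
# [1, 30, 1, 2]
# [1, 30, 1, 3]
# [1, 30, 1, 4]
# [1, 30, 1, 5]
```

With five trials and four columns (intercept plus three covariates), each subject contributes a block of five rows and four columns; only the trial column changes within a subject because age and gender are fixed per visitor.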