I need help with my R assignment
BME 530 Assignment 2

Complete the assignment using R.

Q1. (2 pts) Search a DNA database with a sample. Each time you attempt to match this sample to an entry in the database, there is a probability of an accidental chance match of 1e-4. Chance matches are independent. There are 20,000 people in the database. What is the probability that I get at least one match purely by chance?

Q2. (2 pts) Disease A occurs with probability 0.1, and disease B occurs with probability 0.2. It is not possible to have both diseases. Suppose that there is a single test which reports positive with probability 0.8 for a patient with disease A, with probability 0.5 for a patient with disease B, and with probability 0.01 for a patient with no disease. What is the probability that you have no disease even if the test comes back positive? Please use the following notation. Let ?: the event that you have disease A; ℬ: the event that you have disease B; ?: the event that you have no disease; and ?: the event that the test result is positive.

Q3. (2 pts) Stacked Auditory Brainstem Response. The failure of standard auditory brainstem response (ABR) measures to detect small (<1 cm) acoustic tumors has led to the use of enhanced magnetic resonance imaging (MRI) as the standard to screen for small tumors. A study investigated the suitability of the stacked ABR as a sensitive screening alternative to MRI for small acoustic tumors (SATs). The objective of the study was to determine the sensitivity and specificity of the stacked ABR technique for detecting SATs. A total of 54 patients were studied who had MRI-identified acoustic tumors that were either <1 cm in size or undetected by standard ABR methods, irrespective of size.
There were 78 nontumor normal-hearing subjects who tested as controls. The stacked ABR demonstrated 95% sensitivity and 88% specificity. Please recover the testing result table.

Q4. (4 pts) Duchenne muscular dystrophy, sometimes shortened to DMD or just Duchenne, is a rare genetic disease. It primarily affects males but, in rare cases, can also affect females. Duchenne causes the muscles in the body to become weak and damaged over time, and is eventually fatal. The genetic change that causes Duchenne, a mutation in the DMD gene, happens before birth and can be inherited, or new mutations in the gene can occur spontaneously. Researchers used measures of pyruvate kinase and lactate dehydrogenase to assess an individual's carrier status. The following table summarizes the test results.

                 Woman carrier   Woman not carrier   Total
Test positive               56                   6      62
Test negative               11                 121     132
Total                       67                 127     194

(a) Compute the sensitivity and specificity of the test.
(b) The sample used in the test study is not representative of the general population, for which the prevalence of carriers is 0.03%, or 3 in 10,000. With this information, find the PPV of the test, that is, the probability that a woman is a DMD carrier if she tested positive.
(c) What is the PPV if the table was constructed from a random sample of 194 subjects from a general population?
(d) Approximate the probability that among 15,000 women randomly selected from a general population, at least 2 are DMD carriers.

Q5. (10 pts) Data have been collected for evaluating two biomarkers for pancreatic cancer. See the attached data file. More specifically, m = 51 'control' patients with pancreatitis and n = 90 'cases' with pancreatic cancer were studied at the Mayo Clinic with a cancer antigen (CA125) and with a carbohydrate antigen (CA19-9). They are measured for each patient and collected in the first and the second columns respectively in the file. The value "0" in the third column represents control status for each patient.
1) Compute all distinct pairs of sensitivity and specificity. How many are there?
2) Plot the receiver operating characteristic (ROC) curve using the computed values from (1) for each biomarker.
3) Calculate the area under the ROC curve (AUC) for both biomarkers. Based on your calculation, which biomarker is better?
4) For each biomarker, choose the best threshold (justify your choice) and compute the sensitivity, specificity, positive predictive value, and negative predictive value for each biomarker based on that threshold.

Note: Confirm your result with a package from the link below for drawing ROC curves. No credit will be given if you use the package to complete the assignment.
https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/

Submission. Submit a zip file (hw2_name.zip) on Blackboard that includes:
1) Both R notebook (.Rmd) and rendered results (.html). Your notebook should include some annotations for the scripts so that a grader knows what the script is for.
2) Report document (.pdf)

Data file:

CA 19-9,CA 125,cancer status
28,13.3,0
15.5,11.1,0
8.2,16.7,0
3.4,12.6,0
17.3,7.4,0
15.2,5.5,0
32.9,32.1,0
11.1,27.2,0
87.5,6.6,0
16.2,9.8,0
107.9,10.5,0
5.7,7.8,0
25.6,9.1,0
31.2,12.3,0
21.6,12,0
55.6,42.1,0
8.8,5.9,0
6.5,9.2,0
22.1,7.3,0
14.4,6.8,0
44.2,10.7,0
3.7,15.7,0
7.8,8,0
8.9,6.8,0
18,47.35,0
6.5,17.9,0
4.9,96.2,0
10.4,108.9,0
5,16.6,0
5.3,9.5,0
6.5,179,0
6.9,12.1,0
8.2,35.6,0
21.8,15,0
6.6,12.6,0
7.6,5.9,0
15.4,10.1,0
59.2,8.5,0
5.1,11.4,0
10,54.65,0
5.3,9.7,0
32.6,11.2,0
4.6,35.7,0
6.9,22.5,0
4,21.2,0
3.65,5.6,0
7.8,9.4,0
32.5,12,0
11.5,9.8,0
4,17.2,0
10.2,10.6,0
2.4,79.1,1
719,31.4,1
2106.667,15,1
24000,77.8,1
1715,25.7,1
3.6,11.7,1
521.5,8.25,1
1600,14.95,1
454,8.7,1
109.7,14.1,1
23.7,123.9,1
464,12.1,1
9810,99.1,1
255,18.6,1
58.7,10.5,1
225,6.6,1
90.1,74,1
50,43.9,1
5.6,45.7,1
4070,13,1
592,7.3,1
28.6,8.6,1
6160,17.2,1
1090,15.4,1
10.4,14.3,1
27.3,93.1,1
162,66.3,1
3560,26.7,1
14.7,32.4,1
83.3,9.9,1
336,30.3,1
55.7,11.2,1
1520,202,1
3.9,35.7,1
5.8,9.2,1
8.45,103.6,1
361,21.4,1
369,8.1,1
8230,29.9,1
39.3,17.5,1
43.5,30.8,1
361,57.3,1
12.8,6.5,1
18,33.8,1
9590,53.6,1
555,17.2,1
60.2,94.2,1
21.8,33.5,1
900,3.7,1
6.6,11.7,1
239,19.9,1
3100,38.7,1
3275,27.3,1
682,20.1,1
85.4,86.1,1
10290,844,1
770,36.9,1
247.6,6.9,1
12320,27.7,1
113.1,9.9,1
1079,38.6,1
45.6,142.6,1
1630,12.5,1
79.4,11.6,1
508,21.2,1
3190,13.2,1
542,19.2,1
1021,1024,1
235,14.1,1
251,34.8,1
3160,35.3,1
479,35,1
222,15.5,1
15.7,12.1,1
2540,31.6,1
11630,184.8,1
1810,24.8,1
6.9,10.4,1
4.1,34.5,1
15.6,19.4,1
9820,22.2,1
1490,53.9,1
15.7,15.4,1
45.8,17.3,1
7.8,36.8,1
12.8,49.8,1
100.5333,26.5667,1
227,9.7,1
70.9,19.2,1
2500,14.2,1
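
For the probability questions (Q1 to Q4), one possible way to set up the arithmetic in base R is sketched here. The function names (`p_at_least_one`, `posterior_given_positive`, `ppv_from_prevalence`) and the labels `A`, `B`, `none` are my own, not part of the assignment. The Q3 counts assume that the reported percentages are rounded to whole patients, and Q4(d) assumes a Poisson approximation to the binomial, so treat this as a starting point rather than a definitive solution.

```r
# Sketch in base R; all function and variable names are illustrative.

# Probability of at least one success in n independent trials (complement rule).
p_at_least_one <- function(p, n) 1 - (1 - p)^n

# Bayes' rule over mutually exclusive, exhaustive states.
# prior: P(state); p_pos: P(test positive | state), in the same order.
posterior_given_positive <- function(prior, p_pos) {
  stopifnot(length(prior) == length(p_pos), abs(sum(prior) - 1) < 1e-9)
  joint <- prior * p_pos
  joint / sum(joint)
}

# PPV from sensitivity, specificity and an externally supplied prevalence.
ppv_from_prevalence <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}

# Q1: chance match probability 1e-4, 20,000 independent comparisons.
q1 <- p_at_least_one(1e-4, 20000)

# Q2: P(no disease | positive); P(no disease) = 1 - 0.1 - 0.2 = 0.7.
prior <- c(A = 0.1, B = 0.2, none = 0.7)
p_pos <- c(A = 0.8, B = 0.5, none = 0.01)
q2 <- posterior_given_positive(prior, p_pos)[["none"]]

# Q3: recover the 2x2 table from 95% sensitivity (54 tumor patients) and
# 88% specificity (78 controls). Rounding to whole counts is an assumption.
tp <- round(0.95 * 54)
tn <- round(0.88 * 78)
q3 <- matrix(c(tp, 54 - tp, 78 - tn, tn), nrow = 2,
             dimnames = list(test = c("positive", "negative"),
                             status = c("tumor", "control")))

# Q4: DMD carrier table from the assignment.
dmd <- matrix(c(56, 11, 6, 121), nrow = 2,
              dimnames = list(test = c("positive", "negative"),
                              status = c("carrier", "not_carrier")))
sens <- dmd["positive", "carrier"] / sum(dmd[, "carrier"])
spec <- dmd["negative", "not_carrier"] / sum(dmd[, "not_carrier"])
ppv_b <- ppv_from_prevalence(sens, spec, prev = 3e-4)          # part (b)
ppv_c <- dmd["positive", "carrier"] / sum(dmd["positive", ])   # part (c)
# Part (d): Poisson approximation with lambda = n * p; P(X >= 2) = P(X > 1).
q4d <- ppois(1, lambda = 15000 * 3e-4, lower.tail = FALSE)

print(c(q1 = q1, q2 = q2, sens = sens, spec = spec,
        ppv_b = ppv_b, ppv_c = ppv_c, q4d = q4d))
print(q3)
```

For part (d), `pbinom(1, 15000, 3e-4, lower.tail = FALSE)` gives the exact binomial value if you want to check how close the approximation is.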
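
For Q5, here is a minimal sketch of building an ROC curve by hand in base R, without any ROC package. The helper names (`roc_points`, `auc_trapezoid`, `best_youden`, `metrics_at`) are hypothetical. It assumes that a higher marker value means "more likely cancer" and that a patient is called positive when the marker is at or above the threshold. Youden's J (sensitivity + specificity - 1) is only one possible way to pick the "best" threshold, and the trapezoid rule is one common way to get the AUC. The example uses just the first five control rows and first five case rows of the CA 19-9 column so that it stays short; the full file would be read with `read.csv` instead.

```r
# Sketch in base R; helper names are illustrative, not from the assignment.

# Sensitivity and specificity at every distinct threshold.
# Rule (assumption): positive when marker >= threshold.
roc_points <- function(marker, status) {
  thresholds <- c(Inf, sort(unique(marker), decreasing = TRUE))
  sens <- sapply(thresholds, function(t) mean(marker[status == 1] >= t))
  spec <- sapply(thresholds, function(t) mean(marker[status == 0] < t))
  data.frame(threshold = thresholds, sensitivity = sens, specificity = spec)
}

# Area under the ROC curve by the trapezoid rule.
auc_trapezoid <- function(roc) {
  fpr <- 1 - roc$specificity
  tpr <- roc$sensitivity
  ord <- order(fpr, tpr)
  fpr <- fpr[ord]
  tpr <- tpr[ord]
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Threshold maximizing Youden's J = sensitivity + specificity - 1.
best_youden <- function(roc) {
  j <- roc$sensitivity + roc$specificity - 1
  roc[which.max(j), ]
}

# Sensitivity, specificity, PPV and NPV at a chosen threshold.
metrics_at <- function(marker, status, threshold) {
  pred <- marker >= threshold
  tp <- sum(pred & status == 1)
  fp <- sum(pred & status == 0)
  fn <- sum(!pred & status == 1)
  tn <- sum(!pred & status == 0)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp),
    ppv = tp / (tp + fp), npv = tn / (tn + fn))
}

# Small subset of the posted data (CA 19-9 column): first 5 controls, first 5 cases.
ca199  <- c(28, 15.5, 8.2, 3.4, 17.3, 2.4, 719, 2106.667, 24000, 1715)
status <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

roc  <- roc_points(ca199, status)
n_distinct_pairs <- nrow(unique(roc[, c("sensitivity", "specificity")]))
auc  <- auc_trapezoid(roc)
best <- best_youden(roc)
m    <- metrics_at(ca199, status, best$threshold)

# Draw the curve only in an interactive session.
if (interactive()) {
  plot(1 - roc$specificity, roc$sensitivity, type = "s",
       xlab = "1 - specificity", ylab = "sensitivity")
  abline(0, 1, lty = 2)
}
```

On the full data the same functions can be applied once per biomarker column, and the resulting AUC values compared. Note that PPV and NPV computed this way depend on the case/control mix in the sample (90 cases to 51 controls), not on a population prevalence.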