
Need the RStudio code; no need for references.



Homework 2
Biostatistics I
Due 11:59pm on 09/27/2021

Question 1 Sickle cell anemia

Sickle cell anemia is a genetic disease due to defects in the HBB gene. Each person has two copies of the HBB gene, one copy from the father and one copy from the mother. Each copy of the HBB gene can be dominant or recessive. A person can have (a) two copies of the dominant version (does not have the disease; cannot transmit the disease), (b) one copy of the dominant version and one copy of the recessive version (does not have the disease; is a carrier and can transmit the disease to progeny), or (c) two copies of the recessive version (has the disease; can transmit the disease to progeny). For any given child, each parent passes one of his/her two copies of the HBB gene to the child, and each copy has a 0.5 probability of being passed on.

Calculate the probability of a child having the disease if:

Question 1a both parents are carriers (neither has the disease)
Question 1b one parent is a carrier and the other parent does not have the disease and is not a carrier
Question 1c one parent is a carrier and the other parent has the disease

Calculate the probability that the child is a carrier if:

Question 1d both parents are carriers (neither has the disease)
Question 1e one parent is a carrier and the other parent does not have the disease and is not a carrier
Question 1f one parent is a carrier and the other parent has the disease

Question 2 SARS-CoV-2 antibody test

Serology tests detect the presence of antibodies in the blood. These tests detect the body's immune response to an infection caused by a virus rather than detecting the virus itself. Hence, they can determine whether an individual has been infected but are not as useful for detecting an active infection (when the virus is present). A COVID-19 antibody test can be used to identify individuals who have developed an immune response to SARS-CoV-2, the virus that causes COVID-19.
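For the inheritance model described in Question 1, one way to check a hand calculation is to enumerate the four equally likely pairs of copies a child can receive. The following R sketch does that; the allele labels `"A"` (dominant) and `"a"` (recessive) and the function name `child_genotype_probs` are illustrative assumptions, not part of the assignment.

```r
# Assumed labels: "A" = dominant copy, "a" = recessive copy of the HBB gene.
# Each parent is a length-2 character vector holding their two copies.
child_genotype_probs <- function(parent1, parent2) {
  # Each parent passes either copy with probability 0.5, so the four
  # (copy from parent1, copy from parent2) pairs are equally likely.
  combos <- expand.grid(p1 = parent1, p2 = parent2, stringsAsFactors = FALSE)
  n_recessive <- (combos$p1 == "a") + (combos$p2 == "a")
  c(
    noncarrier = mean(n_recessive == 0),  # two dominant copies
    carrier    = mean(n_recessive == 1),  # one dominant, one recessive
    disease    = mean(n_recessive == 2)   # two recessive copies
  )
}

carrier_parent    <- c("A", "a")
noncarrier_parent <- c("A", "A")
affected_parent   <- c("a", "a")

child_genotype_probs(carrier_parent, carrier_parent)     # setup of 1a and 1d
child_genotype_probs(carrier_parent, noncarrier_parent)  # setup of 1b and 1e
child_genotype_probs(carrier_parent, affected_parent)    # setup of 1c and 1f
```

Each call returns the probabilities of the three genotypes for one pairing of parents, so the "disease" and "carrier" entries correspond to the two groups of sub-questions.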
The performance of a test is described by its sensitivity and specificity. Sensitivity is the ability of a test to identify a person with antibodies to SARS-CoV-2; in other words, it is the probability the test will be positive for antibodies when the person really has antibodies (the probability of a true positive). Specificity is the ability of a test to identify individuals without antibodies to SARS-CoV-2; in other words, it is the probability the test will be negative for antibodies when a person really does not have antibodies (the probability of a true negative).

There is a newly developed SARS-CoV-2 antibody test. It was tested in 500 individuals who were confirmed COVID-19 cases. These individuals tested positive for the COVID-19 virus using a nucleic acid amplification test and have subsequently recovered from the infection. These individuals should have SARS-CoV-2 antibodies. It was also tested in 1000 stored samples that were collected and frozen (for other reasons) before SARS-CoV-2 is known to have circulated. Hence, these individuals could not have been infected prior to, or at the time, when the samples were collected. These individuals should not have SARS-CoV-2 antibodies. The results are below:

Figure 1: Test results as a function of antibody presence

Question 2a What is the sensitivity of the test?
Question 2b What is the specificity of the test?

Suppose a test has a sensitivity of 0.95 and a specificity of 0.95. The positive predictive value (PPV) of a test is the probability that an individual has the antibodies if the test result was positive for antibodies. The negative predictive value (NPV) is the probability that an individual does not have the antibodies if the test result was negative for antibodies. These probabilities depend on the proportion of people in the population who have antibodies (the prevalence). Suppose that the prevalence of SARS-CoV-2 antibodies in the US is 0.05.

Question 2c Fill in the missing values in the table below.
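The PPV and NPV definitions above follow from Bayes' theorem applied to sensitivity, specificity, and prevalence. A minimal R sketch of that calculation, using the values stated in the question (0.95, 0.95, 0.05), is given here; the function name `predictive_values` is a hypothetical helper, and the extra prevalence values in the last call are arbitrary illustrations.

```r
# Joint probabilities for one person drawn at random from the population,
# then conditioning on the test result (Bayes' theorem).
predictive_values <- function(sens, spec, prev) {
  true_pos  <- sens * prev              # has antibodies, tests positive
  false_neg <- (1 - sens) * prev        # has antibodies, tests negative
  true_neg  <- spec * (1 - prev)        # no antibodies, tests negative
  false_pos <- (1 - spec) * (1 - prev)  # no antibodies, tests positive
  c(
    ppv = true_pos / (true_pos + false_pos),
    npv = true_neg / (true_neg + false_neg)
  )
}

# Values given in the question
pv <- predictive_values(sens = 0.95, spec = 0.95, prev = 0.05)
pv

# PPV at a few other (arbitrarily chosen) prevalences, to see the trend
ppv_by_prev <- sapply(c(0.01, 0.05, 0.20, 0.50),
                      function(p) predictive_values(0.95, 0.95, p)[["ppv"]])
ppv_by_prev
```

The four joint probabilities are the same quantities that populate a 2-by-2 table of antibody status against test result, so the sketch can also be used to cross-check table entries expressed as proportions.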
Question 2d What is the PPV?
Question 2e What is the NPV?
Question 2f Explain what happens to the PPV as the prevalence of individuals with SARS-CoV-2 antibodies increases.

Question 3 Proportion of obese individuals

The proportion of adults in the US who are obese (BMI > 30 kg/m²) is 0.362.

Question 3a If you take a random sample of 750 US adults, what is the expected number who are obese?
Question 3b What is the probability that the number of obese people in a random sample of size 750 is within one standard deviation of the expected value?
Question 3c What is the probability that the proportion of obese people in a sample of size 750 is between 0.35 and 0.38?
Question 3d What is the smallest number such that we expect a 0.05 probability of obtaining this number of obese people, or a number that is less than this number, in a random sample of 750 US adults?

Researchers in Saudi Arabia claim that the proportion of adults in Saudi Arabia who are obese is substantially less than in the US. They take a random sample of 170 adults and find that 60 are obese.

Question 3e Is there evidence on the basis of this sample that the proportion of obese adults in Saudi Arabia is less than in the US? Note that even if the rates were exactly the same, due to the randomness in sampling, we would not expect the sample proportion to be exactly equal to 0.362. How different do these estimates have to be to provide evidence that the proportions differ? When you answer this, you need to consider all scenarios that would support the claim that the obesity rate for adults in Saudi Arabia is less than in the US based on what was observed. In other words, how likely would it have been to observe the Saudi Arabia data, or data that would have been more supportive of the claim, IF the Saudi Arabia rate were equal to the US rate?

Question 4 Sampling distribution for the sample proportion

Suppose we are sampling from a population where it is known that 18.5% have diabetes (i.e., the population proportion π = 0.185). We plan to take a sample of size 300.

Question 4a Describe the sampling distribution for the sample proportion computed from a sample of size 300 taken from a population with a population proportion of 0.185. Describe the distribution and give its mean value and its standard deviation.

Question 4b Generate a sampling distribution via simulation for the sample proportion of individuals with diabetes obtained from a sample of 300, p300, from a population with a population proportion of 0.185. Use 5000 simulated values. Plot this distribution. What are its mean and standard deviation?

The distribution appears somewhat normally distributed. Let's explore how the sampling distribution of the sample proportion compares to that of a normal distribution using simulation. The best normal approximation would be the normal distribution with the same mean and standard deviation as the sampling distribution for p, the sample proportion (the values you determined in 4a).

Question 4c Simulate the normal distribution that would best describe the sampling distribution of the sample proportion for a sample of size 300 with π = 0.185 using 5000 simulated observations. Plot this distribution.

Question 4d Make a qqplot comparing the simulated sampling distribution of the sample proportion based on a sample size of 300 patients (the one in 4b) to that of a normal distribution (the one in 4c), including a line that indicates a perfect relationship.

Question 4e How do the two distributions compare? (Answer based on your qqplot in 4d.)

Question 4f Compute the probability that the sample proportion of individuals with diabetes in a sample of 300 is between 0.17 and 0.20, inclusive. Determine this using the KNOWN (not simulated) sampling distribution for the sample proportion.
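The simulation and exact-probability steps in 4a through 4f can be sketched in R as follows. This is one possible approach under stated assumptions: the seed is arbitrary, the variable names (`p300`, `norm300`, `prob_exact_300`) are illustrative, and the count of people with diabetes is taken to be binomial with size 300 and probability 0.185, so the sample proportion is that count divided by 300.

```r
set.seed(1)  # arbitrary seed, used only so the sketch is reproducible

n       <- 300
pi_true <- 0.185
n_sim   <- 5000

# Mean and standard deviation of the sample proportion under the binomial model
mean_theory <- pi_true
sd_theory   <- sqrt(pi_true * (1 - pi_true) / n)

# Simulated sampling distribution: binomial counts divided by n
p300 <- rbinom(n_sim, size = n, prob = pi_true) / n
c(mean = mean(p300), sd = sd(p300))
hist(p300, main = "Simulated sample proportions, n = 300",
     xlab = "Sample proportion")

# Normal draws with the same mean and standard deviation
norm300 <- rnorm(n_sim, mean = mean_theory, sd = sd_theory)
hist(norm300, main = "Simulated normal values", xlab = "Value")

# Quantile-quantile plot; the line y = x marks perfect agreement
qqplot(norm300, p300, xlab = "Normal quantiles",
       ylab = "Sample proportion quantiles")
abline(a = 0, b = 1)

# Exact probability that the proportion lies in [0.17, 0.20]:
# 0.17 * 300 = 51 and 0.20 * 300 = 60, so the count must lie in 51..60.
prob_exact_300 <- sum(dbinom(51:60, size = n, prob = pi_true))
prob_exact_300
```

The count bounds are written out as integers rather than computed as `0.17 * n` to avoid floating-point rounding at the interval endpoints.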
Question 4g Approximate the probability that the sample proportion of individuals with diabetes in a sample of 300 is between 0.17 and 0.20, inclusive, using the best normal approximation to the sampling distribution. Determine this using the normal distribution (not simulated values).

Question 4h How do your answers to 4f and 4g compare?

Question 4i Compute the probability that the sample proportion of individuals with diabetes in a sample of size 30 is between 0.17 and 0.20, inclusive. Determine this using the KNOWN (not simulated) sampling distribution for the sample proportion.

Question 4j Approximate the probability that the sample proportion of individuals with diabetes in a sample of size 30 is between 0.17 and 0.20, inclusive, using the best normal approximation to the sampling distribution. Determine this using the normal distribution (not simulated values).

Question 4k How do your answers to 4i and 4j compare?

Question 4l Which do you think provides a better normal approximation of the sampling distribution for p: one for a sample size of 30 or one for a sample size of 300?

Question 4m Simulate the sampling distribution for the sample proportion when the sample size is 30, p30 (using 5000 values). Create a plot that has the two densities of the simulated distributions of p30 and p300 (from 4b) on the same plot.

Question 4n How do the two distributions in 4m compare? Describe similarities and differences.
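The exact-versus-normal comparisons in 4f through 4k and the overlay plot in 4m could be sketched in R along these lines. The helper `compare_probs`, the seed, and the plotting choices are assumptions made for illustration; the normal approximation here uses no continuity correction, which is one of several reasonable choices.

```r
pi_true <- 0.185

# Exact binomial probability that the count lies in lo_count..hi_count,
# next to the normal approximation for the proportion lying in [lo, hi].
compare_probs <- function(n, lo_count, hi_count, lo = 0.17, hi = 0.20) {
  sd_p <- sqrt(pi_true * (1 - pi_true) / n)
  c(
    exact  = sum(dbinom(lo_count:hi_count, size = n, prob = pi_true)),
    normal = pnorm(hi, mean = pi_true, sd = sd_p) -
             pnorm(lo, mean = pi_true, sd = sd_p)
  )
}

compare_probs(300, 51, 60)  # 51/300 = 0.17 and 60/300 = 0.20
compare_probs(30, 6, 6)     # with n = 30, only 6/30 = 0.20 lies in [0.17, 0.20]

# Simulated sampling distributions for n = 30 and n = 300 on one plot
set.seed(2)  # arbitrary seed for reproducibility
p30  <- rbinom(5000, size = 30,  prob = pi_true) / 30
p300 <- rbinom(5000, size = 300, prob = pi_true) / 300

plot(density(p300), xlim = range(p30), lty = 1,
     main = "Simulated sample proportions", xlab = "Sample proportion")
lines(density(p30), lty = 2)
legend("topright", legend = c("n = 300", "n = 30"), lty = c(1, 2))
```

Because a sample of 30 allows only proportions that are multiples of 1/30, the interval from 0.17 to 0.20 contains a single attainable value, which is why the second call sums over one count.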
Sep 24, 2021