5/15/2019 Lab 4: Continuous Distributions and the Central Limit Theorem file:///C:/Users/Nesi/Downloads/Lab4_CLT.html 1/10

Lab 4: Continuous Distributions and the Central Limit Theorem
Math 270

Task 1: Exponential Distribution
Task 2: Central Limit Theorem in more Detail
Task 3: Does the CLT work for sample statistics besides the sample mean?
Task 4: Show your understanding of the CLT

Task 1: Exponential Distribution

1. The lifetime of an iPhone (in months) can be modeled as exponentially distributed with an average of 25 months. [Hint: $\lambda \neq 20$]

a. What is the probability that a new iPhone will need to be replaced in the first 6 months?

b. Suppose you purchased the phone for $850. The company will provide a full rebate if the phone needs replacement in the first 6 months, a half rebate if it needs replacement in the next 6 months, and nothing if it needs replacement after 12 months. What is the expected amount the company will have to rebate per phone?

c. According to this model, 95% of all iPhones would need replacement within how many months? [Hint: quantile function]

d. According to the Central Limit Theorem, suppose a Quality Analytics consultant samples batches of 100 iPhones. What theoretical distribution (including parameters) should they expect the sample averages to follow? Save these parameters in the workspace below, replacing the current “0” placeholders:

The sample averages will follow a ___________ distribution with parameters ___________________________.

Workspace:
```r
mu_theory = 0 # replace with the true theoretical mean
sd_theory = 0 # replace with the true theoretical standard deviation
```

Now simulate this. Use replicate() and the rexp() sampling function to find 10,000 sample means for the average longevity of different batches of 100 iPhones. Save these 10,000 sample means to the variable Xbars. Then replace the “0”s to compute the (empirical) mean and standard deviation of these sample means.

Workspace:
```r
Xbars = replicate(10000, mean(rexp(100, 1/25))) # CHANGE
mu_empirical = 0 # CHANGE
sd_empirical = 0 # CHANGE
```

After the changes above, the following code should plot the empirical distribution of sample means as a histogram. It should also plot the theoretical distribution as a green curve. You should see the same shape emerge. If you don't, go back and check your work.

```r
hist(Xbars, freq = FALSE, breaks = 15:40)
# Pass the x-values explicitly so the curve lines up with the histogram's axis
lines(15:40, dnorm(15:40, mu_theory, sd_theory), col = "green")
```

e. Use your answer(s) from (d) to write up solid, probability-based reasoning for the following situation. The Quality Analytics consultant samples a batch of 100 iPhones and finds that the sample's average lifetime is less than 11 months. How could the consultant defend the claim that something unusual is happening?

Task 2: Central Limit Theorem in more Detail

In this task you will define a population of individuals that is decidedly NOT normally distributed. Then you will see whether the CLT holds, and if so, for what sample size. Below is one example of a “population” I've devised. It represents scores on a 100-question multiple-choice test given to 2000 students.

```r
# 200 students were absent from the exam.
Absent = rep(0, 200)
# 400 students just guessed.
Guessed = rbinom(400, 100, 1/4)
# 1400 students studied -- they have an average score of 80 with standard deviation of 20.
# pmin ensures the score is never over 100, pmax ensures it's never below 0,
# and round makes this a discrete score.
Studied = round(pmax(pmin(rnorm(1400, 80, 20), 100), 0))
Population = c(Absent, Guessed, Studied)
hist(Population, breaks = seq(-5.5, 105.5, 5))
```

1. Explain why the “400 students just guessed” group is created using the given binomial distribution.

2. Compute the mean and standard deviation of the full population:

Workspace (replace with code to compute):
```r
mean_pop = 0
sd_pop = 0
```

2. Read the code below. Then summarize: what is the “for loop” creating? What, for example, will the value of sd_vec[6] represent? What will mean_vec[6] represent?

```r
# Initialize: the sample sizes will range over 10, 20, 30, ..., 1000
sample_sizes = seq(10, 1000, 10)
sd_vec = rep(0, 100)
mean_vec = rep(0, 100)
for (i in 1:100) {
  Xbars = replicate(900, mean(sample(Population, 10*i)))
  mean_vec[i] = mean(Xbars)
  sd_vec[i] = sd(Xbars)
}
```

3. Build the THEORETICAL equivalents to mean_vec and sd_vec. If the CLT holds, mean_vec[i] SHOULD be close to what value, and sd_vec[i] should be close to what value?

```r
mean_vec_thry = rep(0, 100) # REPLACE WITH THE THEORETICAL VALUES
sd_vec_thry = sd_pop/sqrt(sample_sizes) # rep(0,100) # REPLACE WITH THE THEORETICAL VALUES
```

4. Now you'll compare the theoretical and empirical perspectives on the CLT. The following graph shows the average of the sample means vs. the sample size; it will update after you've run your code above. Write 2-5 sentences about this graph: what features of the Central Limit Theorem are displayed here? This block of code does not need to be edited.

```r
# First we'll plot the empirical mean_vec vs. the sample size:
plot(sample_sizes, mean_vec)
# Compared to the theoretical (in green):
lines(sample_sizes, mean_vec_thry, col = "green")
```

Now for the standard deviation of the sample means vs. the sample size. Write 2-5 sentences about what aspects of the CLT are displayed here. This block of code does not need to be edited.

```r
# First we'll plot the empirical sd_vec vs. the sample size:
plot(sample_sizes, sd_vec)
# Compared to the theoretical (in green):
lines(sample_sizes, sd_vec_thry, col = "green")
```

Finally, what does it imply that the green line in the standard deviation plot lies above the black dots? Is the theoretical sd from the CLT an overestimate or an underestimate of the actual standard deviation of the sample means?

5. Using the appropriate distribution, apply the CLT to find the theoretical probability of the following event: you take a sample of 100 students from this population and find that their average score is less than 55. [Note: this is called the “p-value”.] Then run a simulation to estimate the same probability empirically.

Task 3: Does the CLT work for sample statistics besides the sample mean?

For the following tasks you'll adapt the code below, which is written for sample means. You will apply it to statistics besides the sample mean to see when the CLT might apply and when it might not. The main goal of this section is to gain firsthand experience that the CLT should not be arbitrarily applied to statistics other than the sample mean.

```r
means = replicate(10000, mean(sample(Population, 100)))
hist(means, freq = FALSE, breaks = seq(-0.5, 100.5, 1.5))
lines(0:100, dnorm(0:100, mean(Population), sd(Population)/10), col = "green")
```

Medians: Does the CLT apply? Are sample medians distributed normally around the population median, with standard deviation equal to $\sigma/\sqrt{n}$? To find out, replace “mean” with “median” everywhere in the code below. Then run it to examine the histogram of medians of different samples of 100 each.

```r
means = replicate(10000, mean(sample(Population, 100)))
hist(means, freq = FALSE, breaks = seq(-0.5, 100.5, 1.5))
lines(0:100, dnorm(0:100, mean(Population), sd(Population)/10), col = "green")
```

Does this look like the CLT applies for medians?
Well
Decently, but there are some issues like ___________
Poorly, because __________________

90th Percentile: Does the CLT apply? Are sample 90th percentiles distributed normally around the population 90th percentile, with standard deviation equal to $\sigma/\sqrt{n}$? Again, replace “mean”; this time you'll want to use the quantile(X, 0.9) function.

```r
means = replicate(10000, mean(sample(Population, 100)))
hist(means, freq = FALSE, breaks = seq(-0.5, 100.5, 1.5))
lines(0:100, dnorm(0:100, mean(Population), sd(Population)/10), col = "green")
```

Does this look like the CLT applies for 90th percentiles?
Well
Decently, but there are some issues like ___________
Poorly, because __________________

Sample Standard Deviations: Does the CLT apply? Are sample standard deviations distributed normally around the population standard deviation, with standard deviation equal to $\sigma/\sqrt{n}$? To find out, replace “mean” with “sd” everywhere in the code below. Then run it to examine the histogram of standard deviations of different samples of 100 each.

```r
means = replicate(10000, mean(sample(Population, 100)))
hist(means, freq = FALSE, breaks = seq(-0.5, 100.5, 1.5))
lines(0:100, dnorm(0:100, mean(Population), sd(Population)/10), col = "green")
```

Does this look like the CLT applies for sample standard deviations?
Well
Decently, but there are some issues like ___________
Poorly, because __________________

Task 4: Show your understanding of the CLT

1. Write a paragraph describing your understanding of the Central Limit Theorem and its utility. You may use R to supplement this if you choose.

2. Describe a scenario where geom(1/6) is an appropriate distribution. Then write and solve a problem that includes $P(X < 4)$ for this distribution. Finally, write and solve a problem about sample averages in the same scenario, which includes $P(\bar{X} < 4)$.
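The following sketch shows how R's exponential distribution functions can check the calculations in Task 1 (a) through (c). It assumes the rate parameterization that the lab's own `rexp(100, 1/25)` call uses (rate = 1/mean). The variable names are hypothetical, and this is not an answer key.

```r
# Sketch for Task 1 (a)-(c); variable names are hypothetical.
# Exponential lifetime with mean 25 months, i.e. rate = 1/25 (as in rexp(100, 1/25)).
rate = 1 / 25

# (a) P(X <= 6): replacement needed within the first 6 months
p_first6 = pexp(6, rate = rate)

# (b) Expected rebate on an $850 phone:
#     full rebate if X <= 6, half rebate if 6 < X <= 12, nothing afterwards
p_next6 = pexp(12, rate = rate) - pexp(6, rate = rate)
expected_rebate = 850 * p_first6 + (850 / 2) * p_next6

# (c) 95th percentile of lifetime via the quantile function qexp()
q95 = qexp(0.95, rate = rate)

print(round(c(p_first6 = p_first6, expected_rebate = expected_rebate, q95 = q95), 4))
```

The expected rebate is a weighted sum: each payout is multiplied by the probability of the interval in which the phone fails.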
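The following sketch applies to Task 1 (d) and (e). It compares three ways of getting the probability that a batch mean falls below 11 months:

- the CLT normal approximation;
- an exact value, which uses the standard fact that a sum of n independent Exp(rate) variables is Gamma(shape = n, rate);
- a simulation built the same way as the lab's `Xbars`.

The seed and all names (`mu_clt`, `sd_clt`, `p_exact`, ...) are illustrative assumptions, kept separate from the lab's `mu_theory`/`sd_theory` placeholders.

```r
# Sketch for Task 1 (d)-(e); names are hypothetical.
set.seed(270)             # arbitrary seed, only for reproducibility
n = 100                   # iPhones per batch
mu = 25                   # exponential mean (months); the exponential sd equals its mean
mu_clt = mu               # CLT: mean of the sample means
sd_clt = mu / sqrt(n)     # CLT: standard error sigma / sqrt(n)

# Normal (CLT) approximation to P(Xbar < 11)
p_clt = pnorm(11, mean = mu_clt, sd = sd_clt)

# Exact check: the sum of n iid Exp(rate) lifetimes is Gamma(shape = n, rate),
# so the sample mean is Gamma(shape = n, rate = n * rate) = Gamma(n, n / mu)
p_exact = pgamma(11, shape = n, rate = n / mu)

# Simulation with the same replicate()/rexp() approach used in the lab
Xbars_sim = replicate(10000, mean(rexp(n, 1 / mu)))
p_sim = mean(Xbars_sim < 11)

print(c(mean_sim = mean(Xbars_sim), sd_sim = sd(Xbars_sim)))
print(c(p_clt = p_clt, p_exact = p_exact, p_sim = p_sim))
```

All three approaches put the event far out in the left tail, which is the kind of evidence part (e) asks the consultant to articulate.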
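For Task 2, item 5, the following sketch computes the CLT p-value for a sample mean below 55 and a simulated estimate of the same probability. To stay self-contained, it re-creates `Population` with the lab's code under an arbitrary seed. That overwrites any existing `Population`, and the exact numbers will differ from your own random population. The names `mu_pop`, `sigma_pop`, `p_theory`, and `p_empirical` are hypothetical.

```r
# Sketch for Task 2, item 5; re-creates the lab's population (values depend on the seed).
set.seed(270)  # arbitrary seed for reproducibility
Absent = rep(0, 200)
Guessed = rbinom(400, 100, 1/4)
Studied = round(pmax(pmin(rnorm(1400, 80, 20), 100), 0))
Population = c(Absent, Guessed, Studied)

n = 100
mu_pop = mean(Population)
sigma_pop = sd(Population)

# CLT (theoretical) p-value: P(Xbar < 55) with Xbar ~ N(mu, sigma / sqrt(n))
p_theory = pnorm(55, mean = mu_pop, sd = sigma_pop / sqrt(n))

# Empirical estimate: fraction of simulated samples of 100 whose mean is below 55
sample_means = replicate(10000, mean(sample(Population, n)))
p_empirical = mean(sample_means < 55)

print(c(p_theory = p_theory, p_empirical = p_empirical))
```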
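As a numeric complement to the three Task 3 histograms, the following sketch wraps the lab's replicate() pattern in one helper that takes the statistic as a function argument. For each statistic it reports:

- the center of the simulated statistics against the population value;
- their spread against the CLT formula $\sigma/\sqrt{n}$.

`sampling_summary()` and `p90()` are hypothetical helpers, not part of the lab or base R. The population is re-created with an arbitrary seed. These summaries do not check the *shape* of the distribution, so the histograms are still needed.

```r
# Sketch for Task 3; sampling_summary() and p90() are hypothetical helpers.
sampling_summary = function(pop, stat, n = 100, reps = 10000) {
  # Apply `stat` to `reps` samples of size n drawn without replacement
  stats = replicate(reps, stat(sample(pop, n)))
  c(center_empirical = mean(stats),
    center_population = stat(pop),
    sd_empirical = sd(stats),
    sd_clt_formula = sd(pop) / sqrt(n))
}

set.seed(270)  # arbitrary seed; re-creates the lab's population
Population = c(rep(0, 200),
               rbinom(400, 100, 1/4),
               round(pmax(pmin(rnorm(1400, 80, 20), 100), 0)))

p90 = function(x) quantile(x, 0.9, names = FALSE)
results = rbind(
  mean   = sampling_summary(Population, mean),
  median = sampling_summary(Population, median),
  p90    = sampling_summary(Population, p90),
  sd     = sampling_summary(Population, sd)
)
print(round(results, 3))
```

Passing the statistic as a function avoids editing the template three times. It also makes the comparison across statistics explicit in one table.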
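For Task 4, item 2, one detail is worth checking before you write the problems. R's `dgeom`/`pgeom`/`rgeom` count the number of *failures* before the first success, with support 0, 1, 2, .... Some textbooks instead count the trial on which the first success occurs. The following sketch uses R's convention with a hypothetical die-rolling scenario: X is the number of non-six rolls before the first six. It also uses a hypothetical sample size of n = 50 for the averages. Both are illustrative choices, not part of the assignment.

```r
# Sketch for Task 4, item 2; the die scenario and n = 50 are hypothetical choices.
# In R, geom(p) counts FAILURES before the first success (support 0, 1, 2, ...),
# e.g. the number of non-six rolls before the first six when p = 1/6.
p = 1/6

# Single observation: P(X < 4) = P(X <= 3) for a discrete count
p_single = pgeom(3, prob = p)

# Population mean and sd of geom(p) in the failures convention
mu_geom = (1 - p) / p          # 5
sd_geom = sqrt(1 - p) / p      # sqrt(30), about 5.48

# Sample average of n hypothetical independent players: CLT approximation
n = 50
p_avg_clt = pnorm(4, mean = mu_geom, sd = sd_geom / sqrt(n))

# Simulation check of the same probability
set.seed(270)
avgs = replicate(10000, mean(rgeom(n, prob = p)))
p_avg_sim = mean(avgs < 4)

print(c(p_single = p_single, p_avg_clt = p_avg_clt, p_avg_sim = p_avg_sim))
```

If you use the "number of trials" convention instead, X is shifted by 1: its mean becomes 6, and P(X < 4) must be recomputed accordingly.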