Assignment 1: Simulations
In this first assignment, I want you to analyze the properties of a few estimators. I want you to create an Rmd
file, compile it and upload both the Rmd and PDF files before Sunday October 3 at 10:00pm. I usually do
not answer emails during the weekend, so do not start at the last minute in case you need help. You are
allowed to discuss with each other, but I expect each copy to be unique.
If some simulations are too long to compile, you can generate the result and save it into an .rda file and
load the file in your document to present the result. This will make it easier for you to compile the PDF. If
you choose to proceed this way, include the code for the simulation with the option eval=FALSE. Here is an
example. You first show the simulation code without running it (you can run it once on your side to get the
result):
Xvec <- numeric()
for (i in 1:10000)
{
    x <- rnorm(5000)
    Xvec[i] <- mean(x)
}
save(Xvec, file="A1Res.rda")
Then, you print the result:
load("A1Res.rda")
mu <- mean(Xvec)
rmse <- sqrt(mean(Xvec^2))
rmse
## [1] 0.01405928
If you use this method, upload all .rda files needed to compile the PDF.
Important: Before you start any simulation, set the seed to your student number. Also, set the number of
iterations to 2,000 for all your simulations.
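A minimal sketch of this setup, assuming a hypothetical student number (20123456) and a hypothetical variable name (nIter) for the number of iterations:

```r
# Hypothetical student number used as the seed; replace it with your own.
set.seed(20123456)
# Number of iterations used in every simulation (the name nIter is an assumption).
nIter <- 2000
```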
Estimator of the covariance
We saw that the unbiased estimator of the variance, $S^2$, is less efficient than $\hat{\sigma}^2 = (n-1)S^2/n$. We also saw
in a simulation that the RMSE of $\hat{\sigma}^2$ is smaller. In this question, we want to see if we get the same result for
the following estimators of the covariance:
$$S_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$$
$$\hat{\sigma}_{xy} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$$
Set the sample size to $n = 25$ and try three different pairs of distributions for $X$ and $Y$. For each pair, report
the RMSE of both estimators. Interpret your result.
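One possible layout for this simulation is sketched below for a single pair of distributions. The choice of correlated standard normals, the correlation rho = 0.5, the seed, and all object names are assumptions made for illustration only; they are not part of the assignment.

```r
set.seed(20123456)   # hypothetical student number
nIter <- 2000
n <- 25
rho <- 0.5           # assumed correlation; with unit variances the true covariance is rho
Sxy <- numeric(nIter)
sigxy <- numeric(nIter)
for (i in 1:nIter)
{
    x <- rnorm(n)
    y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # Var(y) = 1 and Cov(x, y) = rho
    Sxy[i] <- cov(x, y)                        # cov() divides by n - 1
    sigxy[i] <- (n - 1) * Sxy[i] / n           # same sum of cross products divided by n
}
# RMSE of a vector of estimates around the true value
rmse <- function(est, true) sqrt(mean((est - true)^2))
c(Sxy = rmse(Sxy, rho), sigxy = rmse(sigxy, rho))
```

Other pairs of distributions can be handled by changing the two lines that generate x and y and the true covariance passed to rmse().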
Estimator of the mean for non-identically distributed samples.
We saw that $\bar{X}$ is the best linear unbiased estimator of the mean, which means that it has the smallest
variance among all estimators that can be defined as:
$$\hat{\mu} = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}$$
However, this is true only for iid samples. In your simulation, consider a sample of 100 observations with
$x_i \sim iid(0, \sigma_1^2)$ for $i = 1, ..., 50$ and $x_i \sim iid(0, \sigma_2^2)$ for $i = 51, ..., 100$.
• In the first simulation, compare the bias, standard error and RMSE for $\bar{X}$ and $\hat{\mu}$ using $w_i = 1/\sigma_1^2$ for
$i = 1, ..., 50$ and $w_i = 1/\sigma_2^2$ for $i = 51, ..., 100$. We also assume that the $\sigma_i^2$'s are known. You can try a few
distributions and see if it affects the properties.
• In the second simulation, we use the same weights but we assume that the $\sigma_i^2$'s are unknown. Therefore,
we replace them by estimators. Since it is a different estimator, we call it $\tilde{\mu}$. Using the same realized
samples as in the previous simulation, compare the bias, standard error and RMSE of $\tilde{\mu}$ and $\bar{x}$.
Do we see a difference when the true variances are replaced by estimates? Explain.
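The two simulations can be organized in a single loop so that all three estimators are computed on the same realized samples. The sketch below is only one way to do it: the normal distribution, the standard deviations sig1 = 1 and sig2 = 3, the use of the sample variance of each half as the variance estimator, the seed, and all object names are assumptions.

```r
set.seed(20123456)          # hypothetical student number
nIter <- 2000
sig1 <- 1; sig2 <- 3        # assumed standard deviations of the two halves
w <- c(rep(1 / sig1^2, 50), rep(1 / sig2^2, 50))   # weights with known variances
xbar <- muhat <- mutilde <- numeric(nIter)
for (i in 1:nIter)
{
    x <- c(rnorm(50, 0, sig1), rnorm(50, 0, sig2))
    xbar[i] <- mean(x)
    muhat[i] <- weighted.mean(x, w)        # sum(w * x) / sum(w)
    # Same weights with the variances replaced by the sample variance of each half
    west <- c(rep(1 / var(x[1:50]), 50), rep(1 / var(x[51:100]), 50))
    mutilde[i] <- weighted.mean(x, west)
}
# Bias, standard error and RMSE of a vector of estimates (the true mean is 0)
prop <- function(est, true = 0)
    c(bias = mean(est) - true, se = sd(est), rmse = sqrt(mean((est - true)^2)))
rbind(xbar = prop(xbar), muhat = prop(muhat), mutilde = prop(mutilde))
```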
Asymptotic distribution
For this question, we want to analyze the large sample properties of the fourth moment estimator:
$$\hat{m}_4 = \frac{1}{n}\sum_{i=1}^n X_i^4$$
To analyze the properties, use two distributions: one with the first 8 moments being finite and one with the
8th moment being either infinite or undetermined. Then:
• Compare the consistency of $\hat{m}_4$ for the two distributions when $n$ goes from 10 to 2000. Explain your
result.
• Compare the distributions of $\sqrt{n}(\hat{m}_4 - m_4)$ for $n = 10, ..., 2000$. Do they seem to be asymptotically
normal? Explain why.
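As an illustration of the consistency comparison, the sketch below tracks the RMSE of the estimator over a few sample sizes. The two distributions are assumptions chosen for the example: the N(0,1), whose moments are all finite and whose fourth moment is 3, and the Student t with 5 degrees of freedom, whose fourth moment is 25 and whose 8th moment is not finite. The grid of sample sizes, the seed and the function name rmseM4 are also assumptions.

```r
set.seed(20123456)               # hypothetical student number
nIter <- 2000
nvec <- c(10, 50, 200, 2000)     # assumed grid of sample sizes
# RMSE of the fourth moment estimator for a given sampler, sample size and true m4
rmseM4 <- function(rfun, n, m4)
{
    est <- replicate(nIter, mean(rfun(n)^4))
    sqrt(mean((est - m4)^2))
}
res <- sapply(nvec, function(n)
    c(normal = rmseM4(rnorm, n, 3),
      t5 = rmseM4(function(n) rt(n, df = 5), n, 25)))
colnames(res) <- paste0("n=", nvec)
res
```

The same replicate() call, kept as a vector instead of being summarized, gives the draws needed to plot the distribution of the standardized estimator for each sample size.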
If you want the true $m_4$, you can get it from the web, or you can compute it numerically. Since the definition
is $\int f(x)x^4dx$, where $f(x)$ is the density, we could get it in R using the integrate function. For example, the
fourth moment of the $N(0,4)$ is:
f <- function(x) dnorm(x, 0, 2)*x^4
m4 <- integrate(f, -Inf, Inf)
m4
## 48 with absolute error
It is therefore 48.
The Delta Method
Suppose $X > 0$ and we want to show that:
$$\sqrt{n}(\bar{X}^3 - \mu^3) \to N(0, \sigma^2(3\mu^2)^2)$$
First, use the Delta method to show that it does converge to $N(0, \sigma^2(3\mu^2)^2)$. Then, use a simulation to see if
the distribution gets closer and closer to this normal distribution when n increases. Interpret your result
(e.g. is it a good approximation in small samples?).
For your population, you can choose any distribution with $\mu \neq 0$ that satisfies the CLT for $\bar{X}$, but you cannot
use the normal distribution. If you want, you can try more than one distribution and compare the results.
Theoretically, what happens if $\mu = 0$? Can you verify that numerically?
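A possible skeleton for the simulation part is given below. It assumes an exponential population with rate 1, so that the mean and the variance are both 1 and the asymptotic standard deviation is 3; the sample sizes, the seed and the object names are also assumptions. It only compares the first two moments, so histograms or QQ-plots of z would be needed for a full comparison with the normal distribution.

```r
set.seed(20123456)                     # hypothetical student number
nIter <- 2000
mu <- 1; sigma2 <- 1                   # mean and variance of the exponential with rate 1
asySD <- sqrt(sigma2 * (3 * mu^2)^2)   # asymptotic standard deviation from the Delta method
for (n in c(10, 100, 1000))
{
    # nIter draws of sqrt(n) * (Xbar^3 - mu^3)
    z <- replicate(nIter, sqrt(n) * (mean(rexp(n, rate = 1))^3 - mu^3))
    cat("n =", n, " mean =", round(mean(z), 3), " sd =", round(sd(z), 3),
        " asymptotic sd =", asySD, "\n")
}
```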
Hypothesis tests
We want to test the null hypothesis $H_0: \mu = c$ against the alternative $H_1: \mu \neq c$ at the significance level $\alpha$. We know that when
the population is known to be normal, we have the following:
Test 1:
$$\frac{\sqrt{n}(\bar{X} - \mu)}{S} \sim t_{n-1}$$
If the distribution of the population is unknown, we rely on the asymptotic distribution:
Test 2:
$$\frac{\sqrt{n}(\bar{X} - \mu)}{\hat{\sigma}} \to N(0, 1)$$
where $\hat{\sigma}^2 = (n-1)S^2/n$. Answer the following questions:
• Suppose $x_i \sim N(\mu, \sigma^2)$. Using a simulation, compare the power curves of Test 1 and Test 2 for $\alpha = 5\%$.
You can choose any $\mu$ and $\sigma^2$. Set the sample size to 25.
• Answer the same question when xi is not normal. Choose any distribution.
What happens when n=300? Explain.
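The power curves for the normal case can be estimated along the following lines. This is only a sketch: the tested value c0 = 0, the unit variance, the grid of true means muvec, the seed and the object names are all assumptions.

```r
set.seed(20123456)                  # hypothetical student number
nIter <- 2000
n <- 25; c0 <- 0; alpha <- 0.05
muvec <- seq(-1, 1, by = 0.25)      # assumed grid of true means, with sigma = 1
crit1 <- qt(1 - alpha / 2, n - 1)   # Test 1: Student t critical value
crit2 <- qnorm(1 - alpha / 2)       # Test 2: standard normal critical value
pow <- sapply(muvec, function(mu)
{
    rej <- replicate(nIter, {
        x <- rnorm(n, mu, 1)
        t1 <- sqrt(n) * (mean(x) - c0) / sd(x)
        t2 <- t1 * sqrt(n / (n - 1))   # same statistic with sigma-hat instead of S
        c(test1 = abs(t1) > crit1, test2 = abs(t2) > crit2)
    })
    rowMeans(rej)                      # rejection frequency of each test
})
colnames(pow) <- muvec
round(pow, 3)
```

The column for a true mean equal to c0 estimates the size of each test, and the other columns estimate the power. Replacing rnorm() by another generator and changing n covers the remaining cases.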
Bootstrap
Here, we want to see how well the Bootstrap method estimates the standard deviation of
$$\hat{m}_4 = \frac{1}{n}\sum_{i=1}^n X_i^4$$
For this simulation, assume that $x_i \sim N(0, 1)$. Since we know all moments of the normal distribution, it
implies that $Var(\hat{m}_4) = 96/n$ (prove it). Also, set the sample size of each sample to 20 and the number of
Bootstrap samples to 500.
• Estimate the bias of the Bootstrap estimator of the standard deviation.
• Using the 2,000 $\hat{m}_4$ and 2,000 Bootstrap estimates of the standard deviation, estimate the probability
of rejecting $H_0: m_4 = 3$ against $H_1: m_4 \neq 3$ using the asymptotic distribution when the size of the
test is $\alpha$ = 1%, 5% and 10%. The asymptotic distribution is:
$$\frac{\sqrt{n}(\hat{m}_4 - 3)}{SD(\hat{m}_4)} \to N(0, 1)$$
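The first part of this question (the bias of the Bootstrap standard deviation) could be set up as in the sketch below, which uses the nonparametric Bootstrap, that is, resampling the observations with replacement. The seed and the object names are assumptions. Because the loop takes a while to run, it is a natural candidate for the eval=FALSE and .rda approach described at the beginning.

```r
set.seed(20123456)      # hypothetical student number
nIter <- 2000
n <- 20; B <- 500
m4hat <- sdBoot <- numeric(nIter)
for (i in 1:nIter)
{
    x <- rnorm(n)
    m4hat[i] <- mean(x^4)
    # Resample x with replacement B times and recompute the estimator each time
    m4star <- replicate(B, mean(sample(x, n, replace = TRUE)^4))
    sdBoot[i] <- sd(m4star)     # Bootstrap estimate of the standard deviation
}
trueSD <- sqrt(96 / n)          # from Var(m4hat) = 96/n
mean(sdBoot) - trueSD           # estimated bias of the Bootstrap standard deviation
```

The vectors m4hat and sdBoot are the 2,000 estimates and 2,000 Bootstrap standard deviations needed for the rejection probabilities in the second part.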