A handy rule of thumb in statistics and life is as follows: Conditioning often makes things better. This problem explores how the above rule of thumb applies to estimating unknown parameters. Let θ be an unknown parameter that we wish to estimate based on data X_1, X_2, ..., X_n (these are r.v.s before being observed, and then after the experiment they “crystallize” into data). In this problem, θ is viewed as an unknown constant, and is not treated as an r.v. as in the Bayesian approach. Let T_1 be an estimator for θ (this means that T_1 is a function of X_1, ..., X_n which is being used to estimate θ).

A strategy for improving T_1 (in some problems) is as follows. Suppose that we have an r.v. R such that T_2 = E(T_1 | R) is a function of X_1, ..., X_n (in general, E(T_1 | R) might involve unknowns such as θ, but then it couldn’t be used as an estimator). Also suppose that P(T_1 = T_2) < 1, and that E(T_1^2) is finite.
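To make the strategy concrete, here is a minimal simulation sketch under an assumed model: X_1, ..., X_n i.i.d. Bernoulli(θ), with T_1 = X_1 and R = X_1 + ... + X_n, so that by symmetry T_2 = E(T_1 | R) = R/n is the sample mean. The model, sample size, and choice of estimators are illustrative assumptions, not part of the problem.

```python
import numpy as np

# Illustration of the conditioning strategy on an assumed Bernoulli(theta) model:
# T_1 = X_1 (a crude unbiased estimator), R = X_1 + ... + X_n,
# and T_2 = E(T_1 | R) = R / n (the sample mean, by symmetry).
# We compare the empirical mean squared errors of T_1 and T_2.

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 10, 100_000

X = rng.binomial(1, theta, size=(reps, n))
T1 = X[:, 0]                # estimator based on the first observation only
T2 = X.sum(axis=1) / n      # E(T_1 | R) with R = sum of the observations

mse1 = np.mean((T1 - theta) ** 2)
mse2 = np.mean((T2 - theta) ** 2)
print(f"MSE of T_1: {mse1:.4f}")   # approx theta*(1 - theta) = 0.21
print(f"MSE of T_2: {mse2:.4f}")   # approx theta*(1 - theta)/n = 0.021
```

In this illustrative setup the conditioned estimator T_2 has a mean squared error roughly n times smaller than that of T_1, which is the phenomenon parts (a) and (b) ask you to establish in general.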
(a) Use Jensen’s inequality to show that T_2 is better than T_1 in the sense that the mean squared error is less, i.e.,

E((T_2 − θ)^2) < E((T_1 − θ)^2).
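(A sketch of the key step, under the stated assumptions: since g(t) = (t − θ)^2 is convex, the conditional form of Jensen’s inequality gives E((T_1 − θ)^2 | R) ≥ (E(T_1 − θ | R))^2 = (T_2 − θ)^2; taking expectations of both sides by Adam’s law compares the two mean squared errors, and the assumption P(T_1 = T_2) < 1 is what makes the comparison strict.)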
(b) The bias of an estimator T for θ is defined to be b(T) = E(T) − θ. An important identity in statistics, a form of the bias-variance tradeoff, is that the mean squared error is the variance plus the squared bias:

E((T − θ)^2) = Var(T) + (b(T))^2.
Use this identity and Eve’s law to give an alternative proof of the result from (a).
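As a quick numerical sanity check of the identity (not part of the problem), here is a simulation sketch under the same assumed Bernoulli model as above, using an arbitrary biased estimator T = (R + 1)/(n + 2) chosen purely for illustration.

```python
import numpy as np

# Numerical illustration of the identity  E((T - theta)^2) = Var(T) + (b(T))^2,
# using an assumed biased estimator T = (R + 1)/(n + 2) for the Bernoulli
# setup sketched above (the estimator is an arbitrary illustrative choice).

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 10, 1_000_000

S = rng.binomial(n, theta, size=reps)   # R = X_1 + ... + X_n ~ Bin(n, theta)
T = (S + 1) / (n + 2)

mse_empirical = np.mean((T - theta) ** 2)

# Exact variance and bias of T for this model:
# Var(T) = n*theta*(1 - theta)/(n + 2)^2,  b(T) = E(T) - theta = (1 - 2*theta)/(n + 2).
var_T = n * theta * (1 - theta) / (n + 2) ** 2
bias_T = (1 - 2 * theta) / (n + 2)

print(f"empirical MSE:    {mse_empirical:.6f}")
print(f"Var(T) + b(T)^2:  {var_T + bias_T ** 2:.6f}")   # agrees up to simulation noise
```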