Assume $(X, Y) \in \mathbb{R} \times [-L, L]$ a.s. Let $\mathcal{F}$ be a set of functions $f : \mathbb{R} \to \mathbb{R}$ and assume that $\mathcal{F}$ is a subset of a linear vector space of dimension $K$. Let $m_n$ be the least squares estimate
$$m_n(\cdot) = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - Y_i|^2$$
and set
$$m_n^*(\cdot) = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - m(X_i)|^2.$$
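To make the two estimates concrete, the sketch below computes $m_n$ and $m_n^*$ at the design points with NumPy and numerically checks the projection identities that hold when $\mathcal{F}$ is itself a linear space. Every specific choice here is hypothetical and only for illustration: $\mathcal{F}$ is taken to be the polynomials of degree less than $K$, the regression function is $m(x) = 0.5\sin(3x)$, $X$ is uniform on $[-1, 1]$, and the noise is uniform on $[-0.5, 0.5]$ and independent of $X$, so that $|Y| \le 1$ (that is, $L = 1$).

```python
import numpy as np


def basis(x, K):
    # Hypothetical choice of F: polynomials of degree < K, i.e. the
    # K-dimensional linear space spanned by 1, x, ..., x^(K-1).
    return np.vander(x, N=K, increasing=True)


def ls_fit(x, targets, K):
    # Fitted values at the design points of the empirical least squares
    # problem  arg min_{f in F} (1/n) * sum_i |f(x_i) - targets_i|^2.
    B = basis(x, K)
    coef, *_ = np.linalg.lstsq(B, targets, rcond=None)
    return B @ coef


def emp_inner(u, v):
    # Empirical inner product <u, v>_n = (1/n) * sum_i u_i * v_i.
    return float(np.mean(u * v))


def emp_sq_norm(u):
    # Empirical squared norm ||u||_n^2 = <u, u>_n.
    return emp_inner(u, u)


def regression_function(x):
    # Hypothetical m(x) = E{Y | X = x}; bounded by 0.5 in absolute value.
    return 0.5 * np.sin(3.0 * x)


rng = np.random.default_rng(0)
n, K = 200, 4
x = rng.uniform(-1.0, 1.0, size=n)
w = rng.uniform(-0.5, 0.5, size=n)   # mean-zero noise W_i, independent of X
m_x = regression_function(x)
y = m_x + w                          # so |Y| <= 1, i.e. L = 1

m_n = ls_fit(x, y, K)                # m_n(X_i)
m_n_star = ls_fit(x, m_x, K)         # m_n^*(X_i)

# Pythagoras: ||m_n - m||^2 = ||m_n - m_n^*||^2 + ||m_n^* - m||^2
total = emp_sq_norm(m_n - m_x)
split = emp_sq_norm(m_n - m_n_star) + emp_sq_norm(m_n_star - m_x)

# m_n - m_n^* is the projection of W, so ||m_n - m_n^*||^2 = <m_n - m_n^*, W>
proj_norm = emp_sq_norm(m_n - m_n_star)
cross = emp_inner(m_n - m_n_star, w)

print(f"{total:.6f} {split:.6f}")
print(f"{proj_norm:.6f} {cross:.6f}")
```

Both printed pairs should agree up to floating-point rounding. The helper names `basis`, `ls_fit`, `emp_inner` and `emp_sq_norm` are illustrative and do not come from the text; if $\mathcal{F}$ were only a non-linear subset of a linear space, `ls_fit` would no longer compute $m_n$.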
Show that, for all δ > 0,
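$$
\begin{aligned}
&\mathbf{P}\Bigl\{ \frac{1}{n}\sum_{i=1}^{n} |m_n(X_i) - m(X_i)|^2 > 2\delta + 18 \min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} |f(X_i) - m(X_i)|^2 \Bigm| X_1^n \Bigr\} \\
&\quad \le \mathbf{P}\Bigl\{ \delta < \frac{1}{n}\sum_{i=1}^{n} |m_n(X_i) - m_n^*(X_i)|^2 \le \frac{1}{n}\sum_{i=1}^{n} \bigl(m_n(X_i) - m_n^*(X_i)\bigr) \cdot \bigl(Y_i - m(X_i)\bigr) \Bigm| X_1^n \Bigr\}.
\end{aligned}
$$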
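One possible route to this inequality is sketched here, only under the extra assumption that $\mathcal{F}$ itself is a linear space, so that least squares over $\mathcal{F}$ is an orthogonal projection for the empirical inner product $\langle u, v\rangle_n = \frac{1}{n}\sum_{i=1}^{n} u_i v_i$ (functions are identified with their values at $X_1, \dots, X_n$, and $\|g\|_n^2 = \langle g, g\rangle_n$). Write $W_i = Y_i - m(X_i)$ and $A = \min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} |f(X_i) - m(X_i)|^2 = \|m_n^* - m\|_n^2$. Then $m_n - m_n^*$ is the projection of $W$, and $m_n^* - m$ is orthogonal to the projection space, which gives

$$
\begin{aligned}
\|m_n - m_n^*\|_n^2 &= \langle m_n - m_n^*,\, W\rangle_n, \\
\|m_n - m\|_n^2 &= \|m_n - m_n^*\|_n^2 + \|m_n^* - m\|_n^2 = \|m_n - m_n^*\|_n^2 + A, \\
\|m_n - m\|_n^2 > 2\delta + 18A \;&\Longrightarrow\; \|m_n - m_n^*\|_n^2 > 2\delta + 17A \ge \delta .
\end{aligned}
$$

So, under this assumption, the event on the left is contained in the event on the right, and the inequality between the conditional probabilities follows. If $\mathcal{F}$ is merely a subset of a linear space, the projection identities need not hold and this sketch does not apply as written.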
Use the peeling technique and Theorem 19.1 to show that, for $\delta \ge c \cdot K/n$, the last probability is bounded by
$$c \exp(-n\delta/c),$$
and use this result to derive a rate of convergence result for
$$\frac{1}{n}\sum_{i=1}^{n} |m_n(X_i) - m(X_i)|^2.$$
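One possible way to organize the last two steps is outlined below; it is a sketch under stated assumptions, not a full solution. It writes $W_i = Y_i - m(X_i)$, $\|g\|_n^2 = \frac{1}{n}\sum_{i=1}^{n} g(X_i)^2$, $\langle g, W\rangle_n = \frac{1}{n}\sum_{i=1}^{n} g(X_i) W_i$, and lets $\mathcal{G}$ (notation introduced here) be the $K$-dimensional linear space containing $\mathcal{F}$, so that $g = m_n - m_n^* \in \mathcal{G}$. It *assumes* that Theorem 19.1, applied to the ball $\{h \in \mathcal{G} : \|h\|_n^2 \le 2^{k+1}\delta\}$, yields a bound of the form $c\exp(-n 2^k \delta / c)$ whenever $\delta \ge cK/n$; the exact hypotheses of that theorem are not restated here. On the event in question there is a unique $k \ge 0$ with $2^k\delta < \|g\|_n^2 \le 2^{k+1}\delta$, and then $\langle g, W\rangle_n \ge \|g\|_n^2 > 2^k\delta$, so a union bound over $k$ gives (the last step uses $n\delta/c \ge K \ge 1$ and $2^k \ge k+1$)

$$
\begin{aligned}
\mathbf{P}\bigl\{\delta < \|g\|_n^2 \le \langle g, W\rangle_n \bigm| X_1^n\bigr\}
 &\le \sum_{k=0}^{\infty} \mathbf{P}\Bigl\{ \sup_{h \in \mathcal{G},\ \|h\|_n^2 \le 2^{k+1}\delta} \langle h, W\rangle_n > 2^k\delta \Bigm| X_1^n \Bigr\} \\
 &\le \sum_{k=0}^{\infty} c\, e^{-n 2^k \delta / c}
  \le \sum_{k=0}^{\infty} c\, e^{-n (k+1) \delta / c}
  = \frac{c\, e^{-n\delta/c}}{1 - e^{-n\delta/c}}
  \le \frac{c}{1 - e^{-1}}\, e^{-n\delta/c}.
\end{aligned}
$$

For the rate, set $Z = \frac{1}{n}\sum_{i=1}^{n} |m_n(X_i) - m(X_i)|^2$ and $A = 18 \min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} |f(X_i) - m(X_i)|^2$, and suppose $\mathbf{P}\{Z > A + 2\delta \mid X_1^n\} \le c\, e^{-n\delta/c}$ for all $\delta \ge cK/n$. Integrating the tail (substituting $t = 2\delta$) gives

$$
\begin{aligned}
\mathbf{E}\{(Z - A)_+ \mid X_1^n\}
 &= \int_0^\infty \mathbf{P}\{Z - A > t \mid X_1^n\}\,dt
  = 2\int_0^\infty \mathbf{P}\{Z > A + 2\delta \mid X_1^n\}\,d\delta \\
 &\le \frac{2cK}{n} + 2\int_{cK/n}^{\infty} c\, e^{-n\delta/c}\,d\delta
  = \frac{2cK}{n} + \frac{2c^2}{n}\, e^{-K}
  \le \frac{(2c + 2c^2)K}{n}.
\end{aligned}
$$

Since $Z \le A + (Z - A)_+$, this yields $\mathbf{E}\{Z \mid X_1^n\} \le 18 \min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} |f(X_i) - m(X_i)|^2 + C\,K/n$ with $C = 2c + 2c^2$. Taking expectations and using $\mathbf{E}\min \le \inf \mathbf{E}$ gives, with $\mu$ denoting the distribution of $X$ (again notation introduced here), $\mathbf{E}Z \le 18 \inf_{f \in \mathcal{F}} \int |f(x) - m(x)|^2 \mu(dx) + C\,K/n$, that is, approximation error plus a term of order $K/n$.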