Illustrate the bias-variance decomposition and the bias-variance dilemma for regression through simulations. Let the target function be F(x) = x², with Gaussian noise of variance 0.1. First, randomly generate 100 data sets, each of size n = 10, by selecting values of x uniformly in the range −1 ≤ x ≤ 1 and then applying F(x) with noise. Train any free parameters aᵢ (by the minimum-squared-error criterion) in each of the regression functions in parts (a)–(d), one data set at a time. Then make a histogram of the sum-squared error of Eq. 11 (cf. Fig. 9.4). For each model, use your results to estimate the bias and the variance. (A simulation sketch in Python follows part (f) below.)
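Eq. 11 itself is not reproduced on this page. Assuming it denotes the standard bias-variance decomposition of the expected squared error over data sets D (a reconstruction from context, not a quotation of the original equation), the quantity to histogram and decompose is:

```latex
\mathbb{E}_{\mathcal{D}}\!\left[(g(x;\mathcal{D}) - F(x))^2\right]
= \underbrace{\left(\mathbb{E}_{\mathcal{D}}[g(x;\mathcal{D})] - F(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\left(g(x;\mathcal{D}) - \mathbb{E}_{\mathcal{D}}[g(x;\mathcal{D})]\right)^2\right]}_{\text{variance}}
```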


(a) g(x) = 0.5


(b) g(x) = 1.0


(c) g(x) = a₀ + a₁x


(d) g(x) = a₀ + a₁x + a₂x² + a₃x³


(e) Repeat parts (a)–(d) for 100 data sets of size n = 100.


(f) Summarize all of your results above, with special consideration of the bias-variance decomposition and dilemma and the effect of the size of the data set.
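A minimal sketch in Python/NumPy of parts (a)–(e) might look as follows. The helper names (make_dataset, fit_const, fit_poly, run) and the use of np.polyfit for the least-squares fits are illustrative choices, not prescribed by the exercise; the histograms of the per-data-set errors can be drawn from the errors array with any plotting library.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable

def make_dataset(n):
    """One data set: x uniform on [-1, 1], y = F(x) + Gaussian noise of variance 0.1."""
    x = rng.uniform(-1.0, 1.0, n)
    y = x**2 + rng.normal(0.0, np.sqrt(0.1), n)
    return x, y

def fit_const(c):
    # Models (a) and (b): fixed functions; nothing is trained.
    return lambda x, y: (lambda t: np.full_like(t, c))

def fit_poly(deg):
    # Models (c) and (d): least-squares polynomial fit (minimum-squared-error criterion).
    return lambda x, y: np.poly1d(np.polyfit(x, y, deg))

models = {
    "(a) g(x) = 0.5": fit_const(0.5),
    "(b) g(x) = 1.0": fit_const(1.0),
    "(c) linear":     fit_poly(1),
    "(d) cubic":      fit_poly(3),
}

def run(n, n_sets=100, n_grid=200):
    t = np.linspace(-1.0, 1.0, n_grid)  # common evaluation grid for x
    F = t**2                            # noise-free target on the grid
    print(f"--- n = {n} ---")
    for name, fit in models.items():
        # One trained predictor per data set, evaluated on the common grid.
        preds = np.array([fit(*make_dataset(n))(t) for _ in range(n_sets)])
        mean_g = preds.mean(axis=0)
        bias2 = np.mean((mean_g - F) ** 2)          # squared bias, averaged over x
        var = np.mean(preds.var(axis=0))            # variance, averaged over x
        errors = np.mean((preds - F) ** 2, axis=1)  # per-data-set error; histogram these
        print(f"{name}: bias^2 = {bias2:.4f}  variance = {var:.4f}  "
              f"mean error = {errors.mean():.4f}")

run(n=10)    # parts (a)-(d)
run(n=100)   # part (e)
```

Evaluating every trained model on a common grid makes the Monte Carlo estimates direct transcriptions of the decomposition above: bias² is the squared distance of the average prediction from F(x), and variance is the spread of the individual predictions around that average.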





Figure 9.4: The bias-variance dilemma can be illustrated in the domain of regression. Each column represents a different model, each row a different set of n = 6 training points, Dᵢ, randomly sampled from the true function F(x) with noise. Histograms of the mean-square error E ≡ E_D[(g(x) − F(x))²] of Eq. 11 are shown at the bottom. Column a) shows a very poor model: a linear g(x) whose parameters are held fixed, independent of the training data. This model has high bias and zero variance. Column b) shows a somewhat better model, though it too is held fixed, independent of the training data. It has a lower bias than in a) and the same zero variance. Column c) shows a cubic model, where the parameters are trained to best fit the training samples in a mean-square error sense. This model has low bias and a moderate variance. Column d) shows a linear model that is adjusted to fit each training set; this model has intermediate bias and variance. If these models were instead trained with a very large number n → ∞ of points, the bias in c) would approach a small value (which depends upon the noise), while the bias in d) would not; the variance of all models would approach zero.

Furthermore, a large amount of training data will yield improved performance so long as the model is sufficiently general to represent the target function. These considerations of bias and variance help to clarify why we seek as much accurate prior information about the form of the solution, and as large a training set, as feasible; the match of the algorithm to the problem is crucial.


