Consider the graph below analyzing the size of tree vs. accuracy for a decision tree which has been pruned back to the red line.




Consider the graph below analyzing the size of tree vs. accuracy for a decision tree which has been pruned back to the red line.

[Figure 2: Pruned decision tree. Accuracy (y-axis, 0.5 to 0.9) vs. size of tree in number of nodes (x-axis, 10 to 100). Curves: "On training data", "On validation data", "On validation data (during pruning)".]

Refer to Figure 2. Let's say that we have a third dataset Dnew (from the same data distribution), which is not used for training or pruning.

If we evaluate this new dataset, approximately what is the accuracy when the size of the tree is at 25 nodes, and why?

Select one:
O Around 0.76 (slightly higher than the accuracy for validation data at 25 nodes)
O Around 0.73 (the same as the accuracy for validation data at 25 nodes)
O Around 0.70 (slightly lower than the accuracy for validation data at 25 nodes)
O None of the above

Which of the following gives us the best approximation of the true error?

O Line corresponding to training data
O Line corresponding to validation data
O Line corresponding to new dataset Dnew

Which of the following are valid ways to avoid overfitting?

Select all that apply:
O Decrease the training set size.
O Set a threshold for a minimum number of samples required to split at an internal node.
O Prune the tree so that cross-validation error is minimal.
O Maximize the tree depth.
O None of the above.
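The first two questions turn on how accuracy measured on data that guided pruning compares with accuracy on a dataset that played no role in training or pruning. The sketch below is only an illustration of that general effect, not a reconstruction of Figure 2. It assumes a hypothetical set of candidate pruned trees that all share the same true accuracy (0.72, a made-up value), picks the one that scores best on a simulated validation set, and then scores that choice on simulated fresh data. The function name `simulate_selection_bias` and all of its parameters are assumptions introduced for this example.

```python
import random


def simulate_selection_bias(num_candidates=10, true_acc=0.72,
                            n_val=200, n_new=200, trials=500, seed=0):
    """Compare the validation accuracy of the selected candidate with
    its accuracy on fresh data, averaged over many simulated trials.

    Assumption (hypothetical): every candidate tree has the same true
    accuracy, so any difference between validation scores is noise.
    """
    rng = random.Random(seed)

    def sample_accuracy(n):
        # Fraction of n simulated examples classified correctly.
        return sum(rng.random() < true_acc for _ in range(n)) / n

    val_total = 0.0
    new_total = 0.0
    for _ in range(trials):
        # Score every candidate on the validation set used for pruning.
        val_accs = [sample_accuracy(n_val) for _ in range(num_candidates)]
        # Pruning keeps the candidate that looks best on validation data.
        val_total += max(val_accs)
        # The kept candidate is then scored on data it has never influenced.
        new_total += sample_accuracy(n_new)

    return val_total / trials, new_total / trials


if __name__ == "__main__":
    val_mean, new_mean = simulate_selection_bias()
    print(f"mean validation accuracy of selected tree: {val_mean:.3f}")
    print(f"mean accuracy of selected tree on new data: {new_mean:.3f}")
```

Under these assumptions the selected candidate's validation score tends to sit above its score on fresh data, because choosing the maximum of several noisy estimates favors lucky draws. How large the gap is in a real setting depends on the number of candidates compared and on the size of the validation set.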


Jun 10, 2022