This exercise compares alternative measures of accuracy from randomForest() runs. First, omit the 16 rows in which the value of V6 is missing:
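The data set is not named in the text above. The following is a minimal sketch that assumes it is the `biopsy` data frame from the MASS package, which has an ID column, predictors V1 to V9, and a factor `class`. The name `biops` is chosen here for illustration.

```r
library(MASS)  # assumption: the data are MASS::biopsy

# Count missing values in each column to see where the gaps are
sapply(biopsy, function(x) sum(is.na(x)))

# Drop the ID column (column 1), then omit rows with any missing value
biops <- na.omit(biopsy[, -1])
dim(biops)
```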
(a) Compare repeated randomForest() runs:
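One way to make the comparison is sketched below, again assuming the `biopsy` data from MASS. Each pass fits a fresh forest and records the out-of-bag (OOB) error rate after the final tree. The choice of 10 runs is arbitrary.

```r
library(MASS)           # assumption: the data are MASS::biopsy
library(randomForest)

biops <- na.omit(biopsy[, -1])   # drop ID, omit rows with missing values

# Fit 10 forests; the fits differ only through random sampling
oob <- numeric(10)
for (i in seq_along(oob)) {
  biops.rf <- randomForest(class ~ ., data = biops)
  # err.rate has one row per tree; the last row is the final OOB error
  oob[i] <- biops.rf$err.rate[biops.rf$ntree, "OOB"]
}
round(oob, 4)
```

The spread of the values in `oob` indicates how much the OOB estimate varies from run to run.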
(b) Compare OOB accuracies with test set accuracies:
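A possible approach, sketched under the same assumption that the data are `biopsy` from MASS: split the rows at random into training and test halves, fit the forest on the training half, and pass the test half through the `xtest` and `ytest` arguments of randomForest(). The 50/50 split, the 10 repeats, and the matrix name `acc` are choices made for this sketch.

```r
library(MASS)           # assumption: the data are MASS::biopsy
library(randomForest)

biops <- na.omit(biopsy[, -1])          # drop ID, omit rows with missing values
isClass <- names(biops) == "class"      # flags the response column
n <- nrow(biops)

# One row per random split: OOB accuracy and test set accuracy
acc <- matrix(NA_real_, nrow = 10, ncol = 2,
              dimnames = list(NULL, c("OOB", "Test")))
for (i in 1:10) {
  trRows <- sample(n, size = round(n / 2))   # random training half
  biops.rf <- randomForest(x = biops[trRows, !isClass],
                           y = biops[trRows, "class"],
                           xtest = biops[-trRows, !isClass],
                           ytest = biops[-trRows, "class"])
  nt <- biops.rf$ntree
  acc[i, "OOB"]  <- 1 - biops.rf$err.rate[nt, "OOB"]
  acc[i, "Test"] <- 1 - biops.rf$test$err.rate[nt, "Test"]
}
round(acc, 4)
```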
Plot test set accuracies against OOB accuracies. Add the line y = x to the plot. Is there any consistent difference in the accuracies? Given a random training/test split, is there any reason to expect a consistent difference between OOB accuracy and test accuracy?
(c) Calculate the error rate for the training data:
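A minimal sketch, assuming once more that the data are `biopsy` from MASS. Passing the training data explicitly as `newdata` matters here: calling predict() on a randomForest object without `newdata` returns the OOB predictions instead.

```r
library(MASS)           # assumption: the data are MASS::biopsy
library(randomForest)

biops <- na.omit(biopsy[, -1])   # drop ID, omit rows with missing values
biops.rf <- randomForest(class ~ ., data = biops)

# Predict for the same rows that were used to grow the forest
pred <- predict(biops.rf, newdata = biops)
trainErr <- mean(pred != biops$class)   # proportion misclassified
trainErr
```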
Explain why use of the training data for testing leads to an error rate that is zero.