In exercises 3 and 4, we saw that even though x and y are completely independent, we were able to predict y with accuracy higher than 70%. We must be doing something wrong. What is it?
A. The function train estimates accuracy on the same data it uses to train the algorithm.
B. We are over-fitting the model by including 100 predictors.
C. We used the entire dataset to select the columns used in the model. This step needs to be included as part of the algorithm. The cross-validation was done after this selection.
D. The high accuracy is just due to random variability.
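
The scenario described in option C can be probed with a small simulation. The sketch below is an illustration in plain Python, not the `train` function from the exercises: the data sizes, the mean-gap column ranking, and the nearest-centroid classifier are all stand-in assumptions chosen to keep the example self-contained. It builds predictors that are independent of the outcome, then estimates accuracy by cross-validation twice, once with the columns chosen from the full dataset beforehand and once with the columns re-chosen inside each training fold.

```python
import random
from statistics import fmean

def make_data(n, p, rng):
    """x is n-by-p Gaussian noise; y is a balanced 0/1 label independent of x."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    rng.shuffle(y)
    return x, y

def class_means(x, y, rows, cols):
    """Per-class mean of each column in `cols`, computed over `rows` only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [fmean(x[i][j] for i in members) for j in cols]
    return means

def top_columns(x, y, rows, k):
    """Keep the k columns with the largest absolute gap between class means."""
    all_cols = range(len(x[0]))
    m = class_means(x, y, rows, all_cols)
    gap = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_cols, key=lambda j: gap[j], reverse=True)[:k]

def predict(row, means, cols):
    """Nearest-centroid rule: pick the class whose mean vector is closer."""
    dist = {
        label: sum((row[j] - c) ** 2 for j, c in zip(cols, means[label]))
        for label in (0, 1)
    }
    return min(dist, key=dist.get)

def cv_accuracy(x, y, k, folds, select_inside, rng):
    """Cross-validated accuracy, selecting columns inside or outside the folds."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    # Selecting here uses every row, including rows later held out for testing.
    cols = None if select_inside else top_columns(x, y, idx, k)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Selecting here sees the training rows of this fold only.
        fold_cols = top_columns(x, y, train, k) if select_inside else cols
        means = class_means(x, y, train, fold_cols)
        correct += sum(predict(x[i], means, fold_cols) == y[i] for i in test)
    return correct / len(y)

rng = random.Random(1)  # fixed seed so the run is repeatable
x, y = make_data(200, 1000, rng)
biased = cv_accuracy(x, y, k=20, folds=5, select_inside=False, rng=rng)
honest = cv_accuracy(x, y, k=20, folds=5, select_inside=True, rng=rng)
print(f"selection before cross-validation: {biased:.2f}")
print(f"selection inside each fold:        {honest:.2f}")
```

Comparing the two printed accuracies shows how much of the apparent predictive power depends on where the column selection happens relative to the cross-validation.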