Using default settings, fit a decision tree to the training set to predict the credit ratings of customers using all of the other variables in the dataset.
   a. Report the resulting tree.
   b. Based on this output, predict the credit rating of a hypothetical "median customer", i.e., one with the attributes listed in Table 1, showing the steps involved.
   c. Produce the confusion matrix for predicting the credit rating from this tree on the test set, and also report the overall accuracy rate.
   d. What is the numerical value of the gain in entropy corresponding to the first split at the top of the tree? (Use logarithms to base 2, and show the details of the calculation rather than just providing a final answer.)
   e. Fit a random forest model to the training set to try to improve prediction. Report the R output.
   f. Produce the confusion matrix for predicting the credit rating from this forest on the test set, and also report the overall accuracy rate.

3. Using default settings for svm() from the e1071 package, fit a support vector machine to predict the credit ratings of customers using all of the other variables in the dataset.
   a. Predict the credit rating of a hypothetical "median customer", i.e., one with the attributes listed in Table 1. Report decision values as well.
   b. Produce the confusion matrix for predicting the credit rating from this SVM on the test set, and also report the overall accuracy rate.
   c. Automatically or manually tune the SVM to improve prediction over that found in 3(b). Report the resulting SVM settings and the resulting confusion matrix for predicting the test set. (Any amount of improvement is acceptable.)

4. Fit the Naive Bayes model to predict the credit ratings of customers using all of the other variables in the dataset.
   a. Predict the credit rating of a hypothetical "median customer", i.e., one with the attributes listed in Table 1. Report predicted probabilities as well.
   b. Produce the confusion matrix for predicting the credit rating using Naive Bayes on the test set, and also report the overall accuracy rate.
   c. Reproduce the first 20 or so lines of the R output for the Naive Bayes fit, and use them to explain how you would make this prediction.

5. Based on the confusion matrices reported in the preceding parts,
   a. Which of the classifiers look to be the best? (Be specific, and specify the figures you used to answer this question.)
   b. Which look to be the worst? (Be specific, and specify the figures you used to answer this question.)
   c. Are there any categories that all classifiers seem to have trouble with?

6. Consider a simpler problem of predicting whether a customer gets a credit rating of A or not.
   a. Fit a logistic regression model to predict the credit ratings of customers using all of the other variables in the dataset, with no interactions.
   b. Report the summary table of the logistic regression model fit.
   c. Which predictors of credit rating appear to be significant? Which of them are likely to be spuriously so?
   d. Fit an SVM model of your choice to the training set.
   e. Produce an ROC chart comparing the logistic regression and the SVM results of predicting the test set. Comment on any differences in their performance.
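
Note on the entropy-gain part of the decision-tree question: the calculation it asks for can be laid out as below. This is a sketch of the standard definition only. The symbols (S for the observations at the root, S_L and S_R for the two child nodes, p_k for the class proportions, n for node sizes) and the counts in the illustration are assumptions made up for this note; they are not taken from the credit dataset.

```latex
% Entropy of a node S whose class proportions are p_k (logs to base 2)
\[
  H(S) = -\sum_{k} p_k \log_2 p_k
\]
% Gain of a binary split of S (n observations) into S_L and S_R
\[
  \mathrm{Gain} = H(S) - \frac{n_L}{n}\,H(S_L) - \frac{n_R}{n}\,H(S_R)
\]
% Hypothetical illustration: root with 8 A and 8 B,
% left child with 6 A and 2 B, right child with 2 A and 6 B.
\[
  H(S) = 1, \qquad
  H(S_L) = H(S_R) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811
\]
\[
  \mathrm{Gain} \approx 1 - \tfrac{8}{16}(0.811) - \tfrac{8}{16}(0.811) \approx 0.189
\]
```

For the actual answer, the class counts at the root and at its two children would be read off the fitted tree and substituted into the same expressions.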