
Introduction


For this week’s take-home lab, you will work on the same dataset as in the Week 4 Take-Home Lab. You will solve the same problem studied in this week’s in-class lab, but on a much larger and more interesting dataset. The file UCI_Credit_Card.csv contains 30,000 consumer records with 24 different variables. You can read a detailed description of the fields at the following website: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

The data do not fully match the UCI description, so note the following:

  • Marriage: the UCI description lists marital status as 1 = married; 2 = single; 3 = others. However, the data contain the levels 0, 1, 2 and 3. You should treat 0 as unknown.

  • Education: the UCI description lists 1 = graduate school; 2 = university; 3 = high school; 4 = others. However, the data contain levels 1 to 6. Treat 5 and 6 as unknown.

  • X6-X11 (repayment status): the measurement scale is -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above. However, many values are -2. These are also unknown.

Treat every unknown value as NA.
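The recoding rules above can be sketched as follows. This is a minimal, language-agnostic illustration in plain Python (the lab itself asks for R), with None standing in for NA. The helper name recode_unknowns is hypothetical, and the column names PAY_0 and PAY_2 to PAY_6 are assumed to be the CSV headers for X6-X11.

```python
# Codes that the lab text says should be treated as unknown, per column.
# Column names are an assumption about the CSV header.
UNKNOWN_CODES = {
    "MARRIAGE": {0},
    "EDUCATION": {5, 6},
}
PAY_STATUS_COLUMNS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
for column in PAY_STATUS_COLUMNS:
    UNKNOWN_CODES[column] = {-2}


def recode_unknowns(record):
    """Return a copy of one record with unknown codes replaced by None."""
    cleaned = dict(record)
    for column, codes in UNKNOWN_CODES.items():
        if column in cleaned and cleaned[column] in codes:
            cleaned[column] = None
    return cleaned


# Made-up record, for illustration only.
row = {"MARRIAGE": 0, "EDUCATION": 6, "PAY_0": -2, "PAY_2": 2, "AGE": 24}
print(recode_unknowns(row))
```

Rows with missing values then have to be dropped or imputed before model fitting; which of the two is appropriate is left to the analyst.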


Your task is to build the best possible model for predicting whether or not a consumer will default on their credit card payment for the next month (the last column in the dataset).


Assignment


Perform the following tasks:

  • Conduct a training/test split of the data, building a 20% held-out test dataset.
  • Fit the best SVM model you can (consider feature selection, etc.) to the data to predict consumer default.
  • Then plot ROC curves for the logistic regression and SVM and compare their performance.
  • Compute the AUC for the logistic regression and SVM and compare their performance.
  • Provide a summary and discussion of your work in written form (.docx or .pdf) that includes the following:
    • Q1 Summarize the model/feature selection process you used to fit your SVM model.
    • Q2 Provide a summary of the fitted SVM model (i.e., model summary).
    • Q3 Provide a performance evaluation of the fitted SVM model using a confusion matrix.
    • Q4 How well do you think the SVM model fitted to this dataset works?
    • Q5 Using ROC curves and AUC, which one of logistic regression and SVM works better with the dataset so far?
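
The evaluation steps in the task list (confusion matrix, ROC curve, AUC) can be sketched without any library. The following is a minimal illustration in plain Python, not the lab's required R workflow. All labels and scores are made up, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for labels coded 0/1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every observation tied at this score before adding a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up test-set labels and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
svm_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
logit_scores = [0.9, 0.2, 0.8, 0.7, 0.4, 0.1]

svm_pred = [1 if s >= 0.5 else 0 for s in svm_scores]
print(confusion_matrix(y_true, svm_pred))
print(auc(roc_points(y_true, svm_scores)), auc(roc_points(y_true, logit_scores)))
```

The model with the larger AUC ranks defaulters above non-defaulters more consistently across all thresholds, whereas the confusion matrix describes performance at one chosen threshold only.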






Submission Instructions


For this weekly lab assignment, you should submit:

  • An R script file (or Rmd file)
  • A written summary/discussion of your work (as discussed above) in .docx


Answered 3 days after Feb 10, 2022

Santosh Vasant answered on Feb 13 2022
Q1. Summarize the model/feature selection process used to fit your SVM model
A correlation matrix was used to identify highly correlated features. Features with a high correlation coefficient (>0.85) were removed. Initially there were 24 features, 6 of which were found to be highly correlated. The remaining 18 features were used to build the model.
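A correlation filter of this kind can be sketched as below. This is a minimal illustration in plain Python, not the answer's actual R code. It assumes the threshold applies to the absolute Pearson correlation and that a feature is dropped when it correlates too strongly with one already kept; the answer does not state either detail. The data are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.85):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up columns, for illustration only.
toy = {
    "BILL_AMT1": [10.0, 20.0, 30.0, 40.0, 50.0],
    "BILL_AMT2": [11.0, 19.0, 31.0, 42.0, 49.0],  # nearly a copy of BILL_AMT1
    "AGE": [25.0, 48.0, 31.0, 52.0, 29.0],
}
print(drop_correlated(toy))
```

Which member of a correlated group survives depends on the order in which the columns are visited, so a different ordering would keep a different representative.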
The continuous features, such as LIMIT_BAL, AGE, BILL_AMT* and PAY_AMT*, were scaled using the scaler function of the dplyr library. Categorical features such as sex, marriage and education were kept as they are.
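Scaling of a continuous feature usually means standardization to mean 0 and standard deviation 1, which is what base R's scale() does by default. The following is a minimal sketch of that calculation in plain Python, assuming this is the kind of scaling the answer refers to. The values are made up.

```python
from statistics import mean, stdev


def standardize(values):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1 denominator)
    return [(v - centre) / spread for v in values]


# Made-up credit limits, for illustration only.
limit_bal = [20000, 120000, 90000, 50000, 50000]
scaled = standardize(limit_bal)
print(scaled)
```

To avoid leaking information from the test set, the mean and standard deviation would normally be computed on the training rows only and then reused to transform the test rows.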
The following input features were considered:
LIMIT_BAL,
AGE,
BILL_AMT6,
PAY_AMT1 to PAY_AMT6,
SEX,
EDUCATION,
MARRIAGE,
PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6
The correlation plot is shown in Figure 1.
Figure 1: Correlation plot between all the features as well as the target variable
The data was split into a training set and a test set in the ratio 8:2. 10-fold cross-validation with 2 repeats was then used on the training set. Hyperparameters were tuned using the train function of caret with two levels of model parameter values.
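The resampling scheme described here (an 80/20 split followed by 10-fold cross-validation repeated twice) can be sketched with row indices alone. This is a minimal illustration in plain Python, not the caret code used in the answer. The function names, the seed and the unstratified shuffling are all assumptions.

```python
import random


def train_test_split(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def repeated_kfold(train_indices, k=10, repeats=2, seed=42):
    """Yield (fit, validate) index lists: k folds per repeat, reshuffled each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = train_indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            fit = [i for i in shuffled if i not in held]
            yield fit, held_out


train_idx, test_idx = train_test_split(30000)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))
```

Each candidate hyperparameter value would be fitted on every fit part and scored on the matching validate part, with the scores averaged over all splits. The held-out test indices are not touched until the final evaluation.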
Q2. Provide a summary of a fitted SVM model.
Initially, the problem was solved as a regression; however, as the target variable is...