


MIS 245



IBM Predictive Modeler Project


Spring 2020




Project Description


For this project you will use Lending Club loan data. I have already cleaned up the data set for you. The ultimate goal of the project is to predict whether a given customer will default on a loan. You need to run several machine learning algorithms to perform this task. Your main challenge is to make this prediction with multiple models in SPSS Modeler and compare their performance.


You are also welcome to pick your own data set, but please come to me for approval before you start your project. Extra points will be given for choosing your own problem.



Some facts about the data set



  1. The data consist of 140 features for almost 40,000 individual loan records from the Lending Club database.

  2. The target variable is the loan status. ‘Charged off’ denotes a default and ‘Fully paid’ denotes no default.

  3. Although I have already cleaned up the data, some features (variables) are either too messy to work with or probably not required for building your model. So, use your judgment before assigning these features as inputs to the model.
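
As an illustration of fact 2 outside SPSS Modeler, the target can be thought of as a binary label. The Python sketch below is only an assumption about how one might encode it: the function name `encode_loan_status` and the sample values are hypothetical and are not taken from the data set.

```python
def encode_loan_status(status: str) -> int:
    """Map a loan status string to 1 (default) or 0 (no default)."""
    # Normalize case and whitespace so 'Charged Off' and 'charged off' match.
    normalized = status.strip().lower()
    if normalized == "charged off":
        return 1
    if normalized == "fully paid":
        return 0
    # Any other status is unexpected for this project's target variable.
    raise ValueError(f"Unexpected loan status: {status!r}")


# Made-up statuses for illustration only, not rows from the data set.
sample_statuses = ["Fully paid", "Charged off", "Fully paid", "Fully paid"]
encoded = [encode_loan_status(s) for s in sample_statuses]

# Share of defaults in the toy sample.
default_rate = sum(encoded) / len(encoded)
print(encoded, default_rate)
```

Checking the share of defaults early is useful because it tells you what accuracy a model would reach by always predicting ‘Fully paid’.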



You need to submit



  1. A report (Word or PDF file) of 5-7 pages (double-spaced, including tables and figures). The report should include:


    • An Introduction: problem description and definition.

    • Data description.

    • Method.

    • Results.

    • Discussion.



The project is tentatively due in late April or early May. Submit a hard copy to me before class.






Tips



  1. Run multiple models to find the one with the best performance. Note that this is a classification problem (supervised learning), so make sure you use appropriate models.

  2. Go back and adjust the selection of input variables. Select or deselect some variables and see whether this gives you better performance.

  3. Try at least two or three classification techniques and report the best-performing one.

  4. Discuss the performance evaluation and comparison in full detail. Include the predictor importance result from the rule induction model and discuss it. Also, in the Results section of your report, highlight the best accuracy you obtained and the model settings that produced it.
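
The comparison described in the tips above boils down to computing accuracy for each model on the same records and picking the highest. The Python sketch below shows that arithmetic only; it is not how SPSS Modeler does it internally. The model names and all label values are made-up placeholders, not results from the project data.

```python
def accuracy(actual, predicted):
    """Fraction of records where the predicted label equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def best_model(actual, predictions_by_model):
    """Return the name of the most accurate model and every model's score."""
    scores = {
        name: accuracy(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy labels (1 = default, 0 = no default), invented for illustration.
actual = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predictions_by_model = {
    "rule_induction": [1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    "logistic_regression": [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "always_no_default": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

best, scores = best_model(actual, predictions_by_model)
print(best, scores)
```

In this toy example the placeholder that always predicts "no default" still scores 0.7, which is why the discussion should compare each model against that kind of baseline and not report accuracy in isolation.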



Things to note



  1. Use a Type node or the type function to assign a role (Input, Target, or None) to each variable. Loan status is the target variable. For Input and None, use your own judgment. You could select all the variables as inputs, but that might not be optimal: some variables bring in noise or are redundant. Try to use at least 15-20 variables as input.

  2. Describe the continuous variables you selected as input. You should report the mean, median, mode, and standard deviation. This should form the data description part of your report.

  3. Use a Partition node to split the data into training and test sets (80/20). This lets you test your model on unseen data.
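
To make notes 2 and 3 concrete, here is a minimal Python sketch of the four descriptive statistics and of an 80/20 random split. It is an illustration under stated assumptions, not a substitute for the SPSS Modeler nodes: the helper names `describe` and `partition`, the seed, and the sample loan amounts are all hypothetical.

```python
import random
import statistics


def describe(values):
    """Return mean, median, mode, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # stdev uses the sample formula (divides by n - 1).
        "std": statistics.stdev(values),
    }


def partition(records, train_fraction=0.8, seed=42):
    """Shuffle the records and split them into training and test lists."""
    shuffled = list(records)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up loan amounts for illustration only.
loan_amounts = [5000, 10000, 10000, 15000, 20000]
stats = describe(loan_amounts)
print(stats)

# Ten placeholder record ids split 80/20.
train, test = partition(range(10))
print(len(train), len(test))
```

The model is fitted on the training portion only, and the test portion is held back so that the reported accuracy reflects records the model has not seen.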



Apr 30, 2021