Hi,
I am a working professional taking an online part-time course in data science. I have a project due on Saturday morning, Singapore time. It is about CART models and decision trees.
Working code is already provided; however, I am running into RStudio issues (the console keeps asking to terminate itself when I try to build the model). It is a small project that normally takes an hour or two to complete, and I've done similar assignments before, but this RStudio issue has never happened until now. I've attached the error message and my working code up to the point where the message showed up, for your reference.
I would appreciate any expert advice and help on this.
Project 4 - Personal Loan Campaign Problem

Business Scenario
• The data provided is from a Personal Loans Campaign executed by MyBank.
• 20000 customers were targeted with an offer of Personal Loans at 10% interest rate.
• 2512 customers out of 20000 responded expressing their need for Personal Loan. These customers are labelled as Target = 1 and remaining customers are labelled as Target = 0.

Data dictionary

| Column Name | Description |
|---|---|
| CUST_ID | Customer ID - Unique ID |
| TARGET | Target Field - 1: Responder, 0: Non-Responder |
| AGE | Age of the customer in years |
| GENDER | Gender |
| BALANCE | Average Monthly Balance |
| OCCUPATION | Occupation |
| AGE_BKT | Age Bucket |
| SCR | Generic Marketing Score |
| HOLDING_PERIOD | Ability to hold money in the account (Range 0 - 31) |
| ACC_TYPE | Account Type - Saving / Current |
| ACC_OP_DATE | Account Open Date |
| LEN_OF_RLTN_IN_MNTH | Length of Relationship in Months |
| NO_OF_L_CR_TXNS | No. of Credit Transactions |
| NO_OF_L_DR_TXNS | No. of Debit Transactions |
| TOT_NO_OF_L_TXNS | Total No. of Transaction |
| NO_OF_BR_CSH_WDL_DR_TXNS | No. of Branch Cash Withdrawal Transactions |
| NO_OF_ATM_DR_TXNS | No. of ATM Debit Transactions |
| NO_OF_NET_DR_TXNS | No. of Net Debit Transactions |
| NO_OF_MOB_DR_TXNS | No. of Mobile Banking Debit Transactions |
| FLG_HAS_CC | Has Credit Card - 1: Yes, 0: No |
| AMT_ATM_DR | Amount Withdrawn from ATM |
| AMT_BR_CSH_WDL_DR | Amount cash withdrawn from Branch |
| AMT_CHQ_DR | Amount debited by Cheque Transactions |
| AMT_NET_DR | Amount debited by Net Transactions |
| AMT_MOB_DR | Amount debited by Mobile Banking Transactions |
| AMT_L_DR | Total Amount Debited |
| FLG_HAS_ANY_CHGS | Has any banking charges |
| AMT_OTH_BK_ATM_USG_CHGS | Amount charged by way of the Other Bank ATM usage |
| AMT_MIN_BAL_NMC_CHGS | Amount charged by way Minimum Balance not maintained |
| NO_OF_IW_CHQ_BNC_TXNS | Amount charged by way Inward Cheque Bounce |
| NO_OF_OW_CHQ_BNC_TXNS | Amount charged by way Outward Cheque Bounce |
| AVG_AMT_PER_ATM_TXN | Avg. Amt withdrawn per ATM Transaction |
| AVG_AMT_PER_CSH_WDL_TXN | Avg. Amt withdrawn per Cash Withdrawal Transaction |
| AVG_AMT_PER_CHQ_TXN | Avg. Amt debited per Cheque Transaction |
| AVG_AMT_PER_NET_TXN | Avg. Amt debited per Net Transaction |
| AVG_AMT_PER_MOB_T | Avg. Amt debited per Mobile Banking |

Part 1 - Classification Tree
• Split data into Development (70%) and Hold-out (30%) Sample
• Build Classification Tree using CART technique
• Do necessary pruning
• Measure Model Performance on Development Sample
• Test Model Performance on Hold Out Sample
• Ensure the model is not an overfit model

Part 2 - Random Forest
• Split data into Development (70%) and Hold-out (30%) Sample
• Build Model using Random Forest technique
• Measure Model Performance on Development Sample
• Test Model Performance on Hold Out Sample
• Ensure the model is not an overfit model
• Compare the 2 Models' Performance
  - CART
  - Random Forest
• Ensemble Model
  - Create Ensemble Model based on the output of the above 3 models
• Compare the Ensemble Model performance with individual model.

---
title: "Bank_Personal_Loan"
author: "Neha Tyagi"
date: "June 14, 2019"
output:
  word_document: default
  pdf_document: default
  html_document:
    df_print: paged
---

## Bank_Personal_Loan_Modelling

Context: Thera Bank has a growing customer base. The majority are liability customers (depositors) with varying sizes of deposits; asset customers (borrowers) are a small share.

Objective: Convert liability customers to personal loan customers (while retaining them as depositors).

Task: Build a model identifying the potential customers who have a higher probability of purchasing the loan. This will increase the success ratio while at the same time reducing the cost of the campaign.

Historical Data: A campaign from last year for liability customers showed a healthy conversion rate of over 9%. Data on 5000 customers includes customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer's response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.

# Understanding the attributes - Find relationship between different attributes (Independent variables) and choose carefully which all attributes have to be a part of the analysis and why
# Some Charts and Graphs to showcase the relationship between Independent and Dependent Variables
# Exploratory Data Analysis
# Splitting data in Train and Test dataset
# Model Development (Any one of the below techniques to be used)
# o Random Forest
# o CART
# Model Performance Measures
# Validation of Model
# Model Performance on Hold Out Sample

# STEP1: IMPORT AND PREPARE DATA

```{r}
library(readxl)
data <- read_excel("c:/users/ntyagi/google drive/comscorepersonal/greatlearning/r/data/thera bank_personal_loan_modelling-dataset.xlsx",
                   sheet = 2, col_names = TRUE)
# Data exploration
str(data)
```

```{r}
# Rename columns
# Method 1: using dplyr
library(dplyr)
names(data)
thera.data <- data %>% dplyr::rename(
  Income = `Income (in K/month)`,
  Age = `Age (in years)`,
  Experience = `Experience (in years)`,
  ZIP.Code = `ZIP Code`,
  Family.members = `Family members`,
  Personal.Loan = `Personal Loan`,
  CD.Account = `CD Account`,
  Securities.Account = `Securities Account`,
  Credit.Card = `CreditCard`
)
names(thera.data)
# Method 2 (alternative for names in the "Age..in.years." form;
# these lines match nothing, and so change nothing, once Method 1 has run)
names(thera.data)[names(thera.data) == "Age..in.years."] <- "age"
names(thera.data)[names(thera.data) == "Experience..in.years."] <- "experience"
names(thera.data)[names(thera.data) == "Income..in.K.month."] <- "income"
str(thera.data)
```

# Observation

1. Column ID is nominal data, that is, it is only used to label records without providing any quantitative value. It cannot be ordered or measured, so it can be removed.
2. Experience (in years) has negative values. This needs to be fixed.
3. Columns like Personal.Loan, Family.members and Education are integers. These can be converted into factors since they take a small number of levels.

# Convert into factors

```{r}
thera.data$Family.members <- as.factor(thera.data$Family.members)
thera.data$Personal.Loan <- as.factor(thera.data$Personal.Loan)
thera.data$Education <- as.factor(thera.data$Education)
thera.data$Securities.Account <- as.factor(thera.data$Securities.Account)
thera.data$CD.Account <- as.factor(thera.data$CD.Account)
thera.data$Online <- as.factor(thera.data$Online)
# Note: this adds a new factor column CreditCard; Credit.Card itself stays unchanged
thera.data$CreditCard <- as.factor(thera.data$Credit.Card)
```

# Remove column ID and check for missing values

```{r}
thera.data <- subset(thera.data, select = c(-ID))
any(is.na(thera.data))
View(thera.data)
```

# Remove records with missing values and negative Experience values

```{r}
thera.data <- na.omit(thera.data)
library(dplyr)
thera.data <- filter(thera.data, Experience >= 0)
# Let's check the summary again
summary(thera.data)
```

# Check for Zero variance/Near Zero variance

```{r}
# install.packages("caret")
library(caret)
nsv <- nearZeroVar(thera.data, saveMetrics = TRUE)
nsv <- cbind("ColNo" = 1:ncol(thera.data), nsv)
View(nsv)
```

# Observe

No zero variance. Mortgage has near zero variance.

# Check % of Personal Loan

```{r}
length(which(thera.data$Personal.Loan == "1")) / nrow(thera.data)
```

Personal.Loan rate = 9.6%

# STEP2: DATA EXPLORATION

```{r}
# Data visualization
attach(thera.data)
library(ggplot2)
# For continuous variables: density plot
# Method 1: density plot for each variable
ggplot(thera.data, aes(x = Age)) +
  geom_density(aes(fill = Personal.Loan), alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  scale_fill_manual(values = c("darkturquoise", "lightcoral")) +
  xlim(10, 80)
# Method 2: use ggloop to plot all at once
# install.packages("ggloop")  # run once in the console, not on every knit
library(ggloop)
names(thera.data)
ggloop(thera.data, aes_loop(x = c(Age, Experience, Income, CCAvg, Mortgage))) %L+%
  geom_density(aes(fill = thera.data$Personal.Loan), alpha = 0.3)
```

# Observe

Note that the peak of each density shows where the values are concentrated.

1. Customers in the age range 30-35 years responded positively to the campaign
2. Customers with 5-12 years of experience accepted loans more often
3. Customers with income between $125K and $200K took personal loans more often
4. Customers with average credit card spending of ~3.5K took personal loans more often
5. Customers with no house mortgage tend to take a personal loan

```{r}
# For categorical variables: bar plot
library('scales') # for labels = percent
ggloop(thera.data, aes_loop(x = c(Family.members, Education, CD.Account, Online, CreditCard, Securities.Account))) %L+%
  geom_bar(aes(fill = as.factor(Personal.Loan)), position = "fill") %L+%
  scale_y_continuous(labels = percent) %L+%
  ylab("Percent")
```

```{r}
# View actual percentages from above graphs (columns are visible via attach() above)
# Family.members
prop.table(table(Family.members, Personal.Loan), 1) * 100
# CD.Account
prop.table(table(CD.Account, Personal.Loan), 1) * 100
```

# Observe

1. A higher percentage of Personal.Loan customers have Family.members of 3 or 4
2. Customers with a CD account react positively towards the campaign
3. Level 1 in Education responded least favorably towards the campaign
4. Possession of a credit card does not influence the campaign's success
5. Customers' online banking activity does not influence the campaign's success
6. Having a securities account has a slightly positive effect on the customer's decision to take the loan offered in the last campaign

```{r}
# Split into training and testing sets
## 70% of the sample size
smp_size <- floor(0.7 * nrow(thera.data))
smp_size
## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(thera.data)), size = smp_size)
bankloan.dev <- thera.data[train_ind, ]
bankloan.val <- thera.data[-train_ind, ]
```
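
For reference, the next steps in the brief (build a CART tree, prune it, compare performance on the development and hold-out samples) might look roughly like the sketch below. This is only an illustration under stated assumptions: it uses the `rpart` package and a small synthetic data frame `toy` in place of the real bank data, and the response rule, the control settings and the object names (`toy`, `dev`, `val`, `fit`, `pruned`, `acc`) are all made up for the example. It is not the code that was running when RStudio asked to terminate.

```r
library(rpart)

set.seed(123)
n <- 1000

# Synthetic stand-in for the bank data (hypothetical columns and response rule)
toy <- data.frame(
  Income    = runif(n, 10, 200),
  CCAvg     = runif(n, 0, 10),
  Education = factor(sample(1:3, n, replace = TRUE))
)
toy$Personal.Loan <- factor(ifelse(toy$Income > 120 & toy$CCAvg > 3, 1, 0))

# 70/30 split into development and hold-out samples
train_ind <- sample(seq_len(n), size = floor(0.7 * n))
dev <- toy[train_ind, ]
val <- toy[-train_ind, ]

# Grow a deliberately large classification tree (cp = 0 disables early stopping on cp)
fit <- rpart(Personal.Loan ~ ., data = dev, method = "class",
             control = rpart.control(cp = 0, minsplit = 20, xval = 10))

# Prune back to the complexity parameter with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# Accuracy helper: share of rows whose predicted class matches the actual class
acc <- function(model, d) {
  mean(predict(model, d, type = "class") == d$Personal.Loan)
}

dev_acc <- acc(pruned, dev)
val_acc <- acc(pruned, val)
c(development = dev_acc, holdout = val_acc)
```

If the development accuracy is much higher than the hold-out accuracy, that gap is one simple sign of overfitting, which is what the "Ensure the model is not an overfit model" step in the brief asks to check.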