The U.S. Small Business Administration (SBA) was founded in 1953 on the principle of promoting and assisting small enterprises in the U.S. credit market. Small businesses have been a primary source of job creation in the United States; fostering small business formation and growth therefore has social benefits, creating job opportunities and reducing unemployment. One way the SBA assists these enterprises is through a loan guarantee program designed to encourage banks to grant loans to small businesses. The SBA acts much like an insurance provider: it reduces a bank's risk by guaranteeing a portion of the loan. If a loan goes into default, the SBA covers the amount it guaranteed.

There have been many success stories of start-ups that received SBA loan guarantees, such as FedEx and Apple Computer. However, there have also been stories of small businesses and start-ups that defaulted on their SBA-guaranteed loans. The rate of default on these loans has been a source of controversy for decades. Conservative economists believe that credit markets perform efficiently without government participation. Supporters of SBA-guaranteed loans argue that the social benefits of job creation by the small businesses receiving government-guaranteed loans far outweigh the costs incurred from defaulted loans.

Since the SBA guarantees only a portion of the loan balance, banks will incur some losses if a small business defaults on its SBA-guaranteed loan. Banks therefore still face a difficult choice about whether to grant such a loan, given the high risk of default. One way to inform their decision is to analyze relevant historical data such as the datasets provided here. The case study focuses on loans in the Real Estate and Rental and Leasing industry in California. The relevant data is extracted from the National SBA file to create this file, which has 2,102 observations and 35 variables.


(a) Apply k-NN, Naïve Bayes, and Classification Trees (use GridSearchCV on the training data coupled with cross-validation) to classify a loan application as “lower risk” (approve) or “higher risk” (deny), using appropriate predictors. Partition the data into training (60%) and validation (40%) sets. Normalize the data where appropriate. Find the best k for k-NN. Report the classification accuracy rate for both the training and validation data. Produce the lift and gains charts for all classifiers.
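The workflow in part (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the intended solution: it uses synthetic data from `make_classification` as a stand-in for the loan file, and the parameter grids (`n_neighbors` from 1 to 15, a few tree depths and leaf sizes) are arbitrary choices made for the example. Scaling is placed inside a `Pipeline` so that each cross-validation fold fits the scaler on its own training portion only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption: the real file is not used here).
X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=42)

# 60% training / 40% validation, as the question asks.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=0.60, random_state=42, stratify=y
)

# k-NN is distance-based, so the scaler sits inside the pipeline and is
# re-fitted on the training folds only during cross-validation.
knn_search = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())]),
    param_grid={"knn__n_neighbors": list(range(1, 16))},
    cv=5,
    scoring="accuracy",
)
# Trees do not need scaling; the grid values here are illustrative only.
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 3, 5, 8, None], "min_samples_leaf": [1, 5, 20]},
    cv=5,
    scoring="accuracy",
)
models = {"k-NN": knn_search, "Naive Bayes": GaussianNB(), "Tree": tree_search}

# Fit on the training set only, then report accuracy on both partitions.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_valid, y_valid))
    print(f"{name}: train={results[name][0]:.3f}  valid={results[name][1]:.3f}")

best_k = knn_search.best_params_["knn__n_neighbors"]
print("best k:", best_k)
```

With the real data, `X` would hold only predictors known at application time and `y` the loan outcome; which columns qualify is a modelling decision the question leaves to the analyst.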


(b) Repartition the data into training, validation, and test sets (50%:30%:20%). Apply the k-NN classifier with the k chosen using the validation set. Compare the confusion matrix of the test set with those of the training and validation sets.
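One way to obtain the 50%:30%:20% partition in part (b) is to call `train_test_split` twice: first take 50% for training, then split the remaining half 60/40. The sketch below again runs on synthetic data (an assumption for illustration), and the candidate range of k from 1 to 15 is arbitrary. The scaler is fitted on the training rows only and then applied unchanged to the validation and test rows.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=0)

# Take 50% for training, then split the remaining half 60/40,
# which yields 30% validation and 20% test of the full data.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.50, random_state=0, stratify=y
)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, train_size=0.60, random_state=0, stratify=y_rest
)

# Fit the scaler on training rows only; reuse it unchanged on the other sets.
scaler = StandardScaler().fit(X_train)
X_train_s, X_valid_s, X_test_s = (
    scaler.transform(part) for part in (X_train, X_valid, X_test)
)

# Choose k by validation accuracy (ties go to the smallest k).
valid_accuracy = {}
for k in range(1, 16):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
    valid_accuracy[k] = model.score(X_valid_s, y_valid)
best_k = max(valid_accuracy, key=valid_accuracy.get)

# Refit with the chosen k and compare confusion matrices across partitions.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train_s, y_train)
partitions = {
    "training": (X_train_s, y_train),
    "validation": (X_valid_s, y_valid),
    "test": (X_test_s, y_test),
}
matrices = {
    name: confusion_matrix(y_part, final_model.predict(X_part))
    for name, (X_part, y_part) in partitions.items()
}
for name, matrix in matrices.items():
    print(name, matrix.tolist())
```

Because k is tuned on the validation set, the validation matrix tends to look somewhat optimistic; the test matrix is the one that has not influenced any modelling choice.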

Answered 1 day after May 19, 2021

Suraj answered on May 20 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.metrics import accuracy_score\n",
"from sklearn.metrics import confusion_matrix\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.model_selection import GridSearchCV"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Selected LoanNr_ChkDgt Name City State \\\n",
"0 0 1004285007 SIMPLEX OFFICE SOLUTIONS ANAHEIM CA \n",
"1 1 1004535010 DREAM HOME REALTY TORRANCE CA \n",
"2 0 1005005006 Winset, Inc. dba Bankers Hill SAN DIEGO CA \n",
"3 1 1005535001 Shiva Management SAN DIEGO CA \n",
"4 1 1005996006 GOLD CROWN HOME LOANS, INC LOS ANGELES CA \n",
"\n",
" Zip Bank BankState NAICS ApprovalDate ... \\\n",
"0 92801 CALIFORNIA BANK & TRUST CA 532420 15074 ... \n",
"1 90505 CALIFORNIA BANK & TRUST CA 531210 15130 ... \n",
"2 92103 CALIFORNIA BANK & TRUST CA 531210 15188 ... \n",
"3 92108 CALIFORNIA BANK & TRUST CA 531312 15719 ... \n",
"4 91345 SBA - EDF ENFORCEMENT ACTION CO 531390 16840 ... \n",
"\n",
" ChgOffPrinGr GrAppv SBA_Appv New RealEstate Portion Recession \\\n",
"0 0 30000 15000 0 0 0.5 0 \n",
"1 0 30000 15000 0 0 0.5 1 \n",
"2 0 30000 15000 0 0 0.5 0 \n",
"3 0 50000 25000 0 0 0.5 0 \n",
"4 0 343000 343000 0 1 1.0 0 \n",
"\n",
" daysterm xx Default \n",
"0 1080 16175.0 0 \n",
"1 1680 17658.0 0 \n",
"2 1080 16298.0 0 \n",
"3 1080 16816.0 0 \n",
"4 7200 24103.0 0 \n",
"\n",
"[5 rows x 35 columns]"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df=pd.read_excel(\"C:/Users/Hp/Desktop/data.xlsx\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [],
"source": [
"df.shape\n",
"df=df.drop([\"LoanNr_ChkDgt\",\"Name\",\"City\",\"State\",\"Zip\",\"Bank\",\"BankState\"],axis=1)\n",
"df=df.drop([\"RevLineCr\",\"LowDoc\",\"ApprovalDate\",\"ApprovalFY\",\"DisbursementGross\",\"DisbursementDate\",\"NAICS\"],axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Selected Term NoEmp NewExist CreateJob RetainedJob FranchiseCode \\\n",
"0 0 36 1 1.0 0 0 1 \n",
"1 1 56 1 1.0 0 0 1 \n",
"2 0 36 10 1.0 0 0 1 \n",
"3 1 36 6 1.0 0 0 1 \n",
"4 1 240 65 1.0 3 65 1 \n",
"\n",
" UrbanRural ChgOffDate BalanceGross ... ChgOffPrinGr GrAppv SBA_Appv \\\n",
"0 0 NaN 0 ... 0 30000 15000 \n",
"1 0 NaN 0 ... 0 30000 15000 \n",
"2 0 NaN 0 ... 0 30000 15000 \n",
"3 0 NaN 0 ... 0 50000 25000 \n",
"4 1 NaN 0 ... 0 343000 343000 \n",
"\n",
" New RealEstate Portion Recession daysterm xx Default \n",
"0 0 0 0.5 0 1080 16175.0 0 \n",
"1 0 0 0.5 1 1680 17658.0 0 \n",
"2 0 0 0.5 0 1080 16298.0 0 \n",
"3 0 0 0.5 0 1080 16816.0 0 \n",
"4 0 1 1.0 0 7200 24103.0 0 \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [],
"source": [
"# Fill missing values with the column mean; assigning back avoids chained inplace updates\n",
"df[\"xx\"]=df[\"xx\"].fillna(df[\"xx\"].mean())\n",
"#df[\"DisbursementDate\"]=df[\"DisbursementDate\"].fillna(df[\"DisbursementDate\"].mean())\n",
"df[\"NewExist\"]=df[\"NewExist\"].fillna(df[\"NewExist\"].mean())\n",
"df[\"ChgOffDate\"]=df[\"ChgOffDate\"].fillna(df[\"ChgOffDate\"].mean())"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 2102 entries, 0 to 2101\n",
"Data columns (total 21 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Selected 2102 non-null int64 \n",
" 1 Term 2102 non-null int64 \n",
" 2 NoEmp 2102 non-null int64 \n",
" 3 NewExist 2102 non-null float64\n",
" 4 CreateJob 2102 non-null int64 \n",
" 5 RetainedJob 2102 non-null int64 \n",
" 6 FranchiseCode 2102 non-null int64 \n",
" 7 UrbanRural 2102 non-null int64 \n",
" 8 ChgOffDate 2102 non-null float64\n",
" 9 BalanceGross 2102 non-null int64 \n",
" 10 MIS_Status 2102 non-null object \n",
" 11 ChgOffPrinGr 2102 non-null int64 \n",
" 12 GrAppv 2102 non-null int64 \n",
" 13 SBA_Appv 2102 non-null int64 \n",
" 14 New 2102 non-null int64 \n",
" 15 RealEstate 2102 non-null int64 \n",
" 16 Portion 2102 non-null float64\n",
" 17 Recession 2102 non-null int64 \n",
" 18 daysterm 2102 non-null int64 \n",
" 19 xx 2102 non-null float64\n",
" 20 Default 2102 non-null int64 \n",
"dtypes: float64(4), int64(16), object(1)\n",
"memory usage: 345.0+ KB\n"
]
}
],
"source": [
"df.isnull().values.any()\n",
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import LabelEncoder\n",
"var=LabelEncoder()\n",
"# LabelEncoder expects a 1-D Series, so select the column with single brackets\n",
"df[\"MIS_Status\"]=var.fit_transform(df[\"MIS_Status\"])"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [],
"source": [
"target=df[\"Selected\"]\n",
"ind_var=df.iloc[:,1:]"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [],
"source": [
"x_train,x_test,y_train,y_test=train_test_split(ind_var,target,train_size=0.60,random_state=42)"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [],
"source": [
"std_scale=StandardScaler()\n",
"x_train1=std_scale.fit_transform(x_train)\n",
"# Reuse the scaler fitted on the training data; refitting on x_test would leak its statistics\n",
"x_test1=std_scale.transform(x_test)"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The optimal value of k is 5\n"
]
}
],
"source": [
"k_value=[i for i in range(1,10)]\n",
"k_score=[]\n",
"for k in k_value:\n",
"    knn=KNeighborsClassifier(n_neighbors=k,n_jobs=1)\n",
"    cv_score=cross_val_score(knn,x_train,y_train,cv=5,scoring=\"accuracy\")\n",
"    k_score.append(cv_score.mean())\n",
"# list.index() is 0-based, so map the best position back to its k value\n",
"optimal_k=k_value[k_score.index(max(k_score))]\n",
"print(\"The optimal value of k is %d\" %optimal_k)"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The accuracy for KNN model is 0.526754\n",
"confusion matrix is\n",
"[[307 118]\n",
" [280 136]]\n"
]
}
],
"source": [
"#KNN\n",
"knn=KNeighborsClassifier(n_neighbors=4)\n",
"knn.fit(x_train,y_train)\n",
"y_pred=knn.predict(x_test)\n",
"print(\"The accuracy for KNN model is %f\" % accuracy_score(y_test,y_pred))\n",
"scores=cross_val_score(knn,x_train,y_train,cv=5)\n",
"print(\"confusion matrix is\")\n",
"print(confusion_matrix(y_test,y_pred))"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"import scikitplot as skplt\n",
"y_pred=knn.predict_proba(x_test)\n",
"skplt.metrics.plot_cumulative_gain(y_test, y_pred)\n",
"plt.title(\"Gain chart for KNN\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"image/png":...