R programming and Python: Perform BIRCH clustering for the Loans data set. As a final step of this assignment, make a graph of clusters, compute the silhouette (in addition, you can make a silhouette graph) in both R and Python, and draw conclusions. Make a final report with code, outputs, graphs, captions, and basic descriptions/conclusions.
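The core of the task (standardize the features, fit BIRCH, score the clusters with the silhouette) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the answer's own code: synthetic blobs from `make_blobs` stand in for the loan features, and `n_clusters=3` and `threshold=0.5` are illustrative values, not settings taken from the original solution.

```python
# Minimal sketch: BIRCH clustering plus a silhouette score.
# Assumption: synthetic blobs stand in for the loan features, since the
# data file is not reproduced here; n_clusters=3 and threshold=0.5 are
# illustrative choices, not values taken from the original answer.
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def birch_with_silhouette(X, n_clusters=3, threshold=0.5):
    """Standardize X, fit BIRCH, and return (labels, silhouette score)."""
    X_scaled = StandardScaler().fit_transform(X)
    model = Birch(threshold=threshold, n_clusters=n_clusters)
    labels = model.fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)  # mean silhouette over all points
    return labels, score


X, _ = make_blobs(n_samples=600, centers=3, n_features=4, random_state=0)
labels, score = birch_with_silhouette(X)
print("cluster sizes:", np.bincount(labels))
print("silhouette score: %.3f" % score)
```

On the full loan data (about 150k rows), `silhouette_score` computes pairwise distances between all points, so passing its `sample_size` and `random_state` parameters keeps the computation tractable.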
Answered same day, Nov 02, 2020


Aakarsh answered on Nov 04 2020
Notebook


BIRCH clustering using Python on the loan data set
In [437]:

# import the packages required
from itertools import cycle
from time import time
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import Birch
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was removed in scikit-learn 0.24
import matplotlib.cm as cm
from sklearn.metrics import silhouette_samples, silhouette_score
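The imports above include `silhouette_samples` and `matplotlib.cm`, which are the usual ingredients of a silhouette graph: one horizontal band per cluster showing each point's silhouette coefficient, with a line at the average. The sketch below is a hypothetical illustration, not the answer's plotting code; synthetic data stands in for the scaled loan features, and the cluster count of 3 and the output file name `silhouette_plot.png` are assumptions.

```python
# Minimal sketch of a silhouette graph (one horizontal band per cluster).
# Assumption: synthetic data stands in for the scaled loan features, and
# the cluster count (3) and file name are illustrative only.
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=500, centers=3, n_features=4, random_state=1)
X = StandardScaler().fit_transform(X)
n_clusters = 3
labels = Birch(n_clusters=n_clusters).fit_predict(X)

sample_values = silhouette_samples(X, labels)  # one coefficient per point
avg = silhouette_score(X, labels)              # mean of sample_values

fig, ax = plt.subplots(figsize=(6, 4))
y_lower = 10
for k in range(n_clusters):
    # Sort this cluster's coefficients so the band forms a clean profile
    values_k = np.sort(sample_values[labels == k])
    y_upper = y_lower + len(values_k)
    color = cm.nipy_spectral(k / n_clusters)
    ax.fill_betweenx(np.arange(y_lower, y_upper), 0, values_k,
                     facecolor=color, alpha=0.7)
    ax.text(-0.05, y_lower + 0.5 * len(values_k), str(k))  # cluster label
    y_lower = y_upper + 10  # gap between clusters

ax.axvline(avg, color="red", linestyle="--")  # average silhouette
ax.set_xlabel("Silhouette coefficient")
ax.set_ylabel("Cluster")
ax.set_yticks([])
fig.savefig("silhouette_plot.png")
```

Bands that extend mostly past the red line indicate well-separated clusters; bands with many negative values suggest points that may sit closer to a neighboring cluster.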
Overview of the data set
In [343]:

df = pd.read_csv("loan_data.csv") # load the data into a dataframe using Pandas
df.head()
Out[343]:
        Approval    Debt-to-Income Ratio    FICO Score    Request Amount    Interest
    0    F    0.0    397    1000    450.0
    1    F    0.0    403    500    225.0
    2    F    0.0    408    1000    450.0
    3    F    0.0    408    2000    900.0
    4    F    0.0    411    5000    2250.0
In [344]:

df.info()  # column names, dtypes and non-null counts

RangeIndex: 150302 entries, 0 to 150301
Data columns (total 5 columns):
Approval 150302 non-null object
Debt-to-Income Ratio 150302 non-null float64
FICO Score 150302 non-null int64
Request Amount 150302 non-null int64
Interest 150302 non-null float64
dtypes: float64(2), int64(2), object(1)
memory usage: 5.7+ MB
The data set has 150,302 entries (about 150k) in five columns, with no missing values.
In [345]:

# Encode the Approval flag: 'T' (true) -> 1, 'F' (false) -> 0
df["Approval"] = df["Approval"].map({"F": 0, "T": 1})
In [346]:

df.describe() #statistical details of the data
Out[346]:
        Approval    Debt-to-Income Ratio    FICO Score    Request Amount    Interest
    count    150302.000000    150302.000000    150302.000000    150302.000000    150302.000000
    mean    0.500566    0.183538    672.023266    13427.080145    6042.186065
    std    0.500001    0.137226    69.129157    9468.345958    4260.755681
    min    0.000000    0.000000    371.000000    500.000000    225.000000
    25%    0.000000    0.090000    647.000000    6000.000000    2700.000000
    50%    1.000000    0.160000    684.000000    11000.000000    4950.000000
    75%    1.000000    0.240000    714.000000    19000.000000    8550.000000
    max    1.000000    1.030000    869.000000    44000.000000    19800.000000
The summary statistics give the range of each loan variable. Because Approval is coded 0/1, its mean (0.5006) is the approval rate: about 50% of the applications were approved.
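Reading the approval rate off `describe()` works because the mean of a 0/1 indicator equals the share of 1s. A tiny illustration follows; the four toy values are made up and are not taken from the loan data.

```python
# The mean of a 0/1 indicator is the share of 1s, i.e. the approval rate.
# Toy values below are made up for illustration; they are not loan data.
import pandas as pd

approval = pd.Series(["T", "F", "T", "T"]).map({"F": 0, "T": 1})
rate = approval.mean()
print(f"approval rate: {rate:.0%}")  # 3 of 4 approved
# → approval rate: 75%
```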
Correlation between columns
In [439]:

corr = df.corr()
corr
Out[439]:
        Approval    Debt-to-Income Ratio    FICO Score    Request Amount    Interest
    Approval    1.000000    -0.267921    0.544305    -0.045903    -0.045903
    Debt-to-Income Ratio    -0.267921    1.000000    -0.070586    0.129207    0.129207
    FICO Score    0.544305    -0.070586    1.000000    0.153920    0.153920
    Request...
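In the outputs above, Interest is exactly 0.45 × Request Amount in every `head()` row (for example 1000 → 450.0), the `describe()` statistics agree (13427.08 × 0.45 ≈ 6042.19; 44000 × 0.45 = 19800), and the correlation matrix lists identical values for the two columns. If that holds for every row, Interest duplicates Request Amount and would count loan size twice in the Euclidean distances BIRCH uses. The sketch below checks the ratio on the five rows copied from `df.head()` and drops the redundant column before scaling; on the full data the same check would run on `df`. The feature list (leaving Approval out) is an illustrative assumption, not the original answer's choice.

```python
# Sketch: check whether Interest is an exact multiple of Request Amount
# and, if so, drop it before scaling. The five rows are copied from the
# df.head() output; the feature list is an illustrative assumption.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Approval": [0, 0, 0, 0, 0],
    "Debt-to-Income Ratio": [0.0, 0.0, 0.0, 0.0, 0.0],
    "FICO Score": [397, 403, 408, 408, 411],
    "Request Amount": [1000, 500, 1000, 2000, 5000],
    "Interest": [450.0, 225.0, 450.0, 900.0, 2250.0],
})

ratio = df["Interest"] / df["Request Amount"]
redundant = np.allclose(ratio, ratio.iloc[0])  # same ratio in every row?
print("constant ratio:", redundant, "value:", ratio.iloc[0])

features = ["Debt-to-Income Ratio", "FICO Score", "Request Amount"]
if not redundant:
    features.append("Interest")  # keep it only if it adds information
X_scaled = StandardScaler().fit_transform(df[features])
print(X_scaled.shape)
```

Standardizing matters here as well: FICO Score is in the hundreds and Request Amount in the thousands, so without scaling the loan amount would dominate the distances.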