Part 1 (100 points) Analyze the data in the CreditCard dataset in AER package. (Note that you have to install AER package and any other additional package that are required by AER) The following...

Part 1 (100 points)

Analyze the data in the CreditCard dataset in the AER package. (Note that you have to install the AER package and any additional packages that AER requires.)


The following variables are included in the dataset:


1. card: was the application for a card accepted? (Binary: 1/0) Response variable
2. reports: number of major derogatory reports
3. income: yearly income (in USD 10,000)
4. age: age in years plus twelfths of a year
5. owner: does the individual own his/her home?
6. dependents: number of dependents
7. months: months living at current address
8. share: ratio of monthly credit card expenditure to yearly income
9. selfemp: is the individual self-employed?
10. majorcards: number of major credit cards held
11. active: number of active credit accounts
12. expenditure: average monthly credit card expenditure






Use variables 2 to 8 to determine which of the predictors influence the probability that an application is accepted. Online Quiz 3B will be based on your analysis below:



  1. Provide summary statistics of the predictors. (5 points)







  2. Some values of the variable age are under one year. Use only the observations with age > 18 for the rest of the questions. (5 points)
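A minimal sketch of this restriction, assuming the AER package is already installed. The data frame name `cc` is only an illustrative choice, not something the assignment prescribes.

```r
# Load the CreditCard data shipped with the AER package
# (assumes AER has been installed, e.g. via install.packages("AER")).
data("CreditCard", package = "AER")

# Look at the age distribution before dropping anything.
print(summary(CreditCard$age))

# Number of rows that the age > 18 rule will exclude.
n_dropped <- sum(CreditCard$age <= 18)
print(n_dropped)

# Keep only applicants older than 18; `cc` is a hypothetical name.
cc <- subset(CreditCard, age > 18)
print(nrow(cc))
```

Printing the number of excluded rows first makes it easy to confirm that only a handful of implausible ages are removed.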







  3. Plot income vs. reports (number of major derogatory reports): mark individuals whose card application was accepted in blue and those whose application was not accepted in red. (5 points)






The online Quiz will be based on your interpretation of the plot.
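One possible way to draw this plot with base R graphics is sketched below. It assumes AER is installed and that `card` is stored as a factor with values "yes" and "no"; the names `cc` and `point_col` and the use of `jitter()` are illustrative choices.

```r
# Load the data and apply the age > 18 restriction.
data("CreditCard", package = "AER")
cc <- subset(CreditCard, age > 18)

# Blue for accepted applications, red for applications that were not accepted.
point_col <- ifelse(cc$card == "yes", "blue", "red")

# "income vs. reports": income on the vertical axis, reports on the horizontal.
# jitter() adds a little random noise so the integer report counts overlap less;
# drop it to plot the raw counts.
plot(jitter(cc$reports), cc$income,
     col = point_col, pch = 16,
     xlab = "Number of major derogatory reports (jittered)",
     ylab = "Yearly income (USD 10,000)",
     main = "Income vs. reports by card decision")
legend("topright", legend = c("accepted", "not accepted"),
       col = c("blue", "red"), pch = 16)
```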







  4. Draw boxplots of income by card acceptance status and boxplots of reports by card acceptance status (accepted in blue, not accepted in red). Display the two plots on the same page. (10 points)
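A sketch of the two side-by-side boxplots using base R, assuming AER is installed. `par(mfrow = c(1, 2))` is one common way to put two plots on the same page; the names `cc` and `box_cols` are hypothetical.

```r
# Load the data and apply the age > 18 restriction.
data("CreditCard", package = "AER")
cc <- subset(CreditCard, age > 18)

# Map each level of card to its colour, so the colours stay correct
# whatever order the factor levels are stored in.
box_cols <- c(no = "red", yes = "blue")
cols_in_level_order <- box_cols[levels(cc$card)]

# One row, two columns: both boxplots appear on the same page.
old_par <- par(mfrow = c(1, 2))
boxplot(income ~ card, data = cc, col = cols_in_level_order,
        xlab = "Card accepted?", ylab = "Yearly income (USD 10,000)",
        main = "Income by card decision")
boxplot(reports ~ card, data = cc, col = cols_in_level_order,
        xlab = "Card accepted?", ylab = "Major derogatory reports",
        main = "Reports by card decision")
par(old_par)  # restore the previous plotting layout
```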







  5. Construct histograms of the predictors. (5 points)


Note that share is highly right-skewed, so log(share) will be used in the analysis. reports is also extremely right-skewed (most values of reports are 0 or 1, but the maximum value is 14). To reduce the skewness, use log(reports+1) in your analysis. Highly skewed predictors tend to produce high-leverage points and are less likely to be linearly related to the response.
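The effect of the two transformations can be checked with a quick before-and-after comparison. This is a sketch that assumes AER is installed; the column names `log_share` and `log_reports` are illustrative, not part of the dataset.

```r
# Load the data and apply the age > 18 restriction.
data("CreditCard", package = "AER")
cc <- subset(CreditCard, age > 18)

# log(share) is only defined for share > 0, so check rather than assume.
stopifnot(all(cc$share > 0))

# Transformed predictors (hypothetical column names).
cc$log_share   <- log(cc$share)
cc$log_reports <- log(cc$reports + 1)  # +1 because reports is often 0

# Raw variables on the left, transformed versions on the right.
old_par <- par(mfrow = c(2, 2))
hist(cc$share,       main = "share",            xlab = "share")
hist(cc$log_share,   main = "log(share)",       xlab = "log(share)")
hist(cc$reports,     main = "reports",          xlab = "reports")
hist(cc$log_reports, main = "log(reports + 1)", xlab = "log(reports + 1)")
par(old_par)  # restore the previous plotting layout
```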







  6. Use variables 2 to 8 to determine which of the predictors influence the probability that an application is accepted. Use the summary function to print the results. (10 points)


Online Quiz will be based on the following and related concepts:




    1. Do any of the predictors appear to be statistically significant? If so, which ones? Explain how each of the significant predictors influences the response variable.
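One way to fit such a model is a logistic regression with `glm()`, sketched below under the assumptions that AER is installed and that the transformed predictors log(reports+1) and log(share) replace the raw ones. The names `cc`, `log_share`, `log_reports`, and `fit` are hypothetical.

```r
# Load the data, apply the age > 18 restriction, and add the transformed predictors.
data("CreditCard", package = "AER")
cc <- subset(CreditCard, age > 18)
cc$log_share   <- log(cc$share)
cc$log_reports <- log(cc$reports + 1)

# Logistic regression of card on variables 2 to 8.
# With a two-level factor response, glm() models the probability of the
# second level, which is "yes" here.
fit <- glm(card ~ log_reports + income + age + owner +
             dependents + months + log_share,
           family = binomial, data = cc)

# Coefficient table with standard errors, z values, and p-values.
print(summary(fit))

# exp(coefficient) is the multiplicative change in the odds of acceptance
# for a one-unit increase in that predictor, holding the others fixed.
print(round(exp(coef(fit)), 3))
```

A positive coefficient raises the odds of acceptance and a negative one lowers them. If `glm()` warns that fitted probabilities of 0 or 1 occurred, that suggests a predictor almost perfectly separates accepted from rejected applications, so the estimates and standard errors should be read with care.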








  7. To predict whether the application will be accepted, convert the predicted probabilities into class labels using the rule probs > .5 = "yes". Compute the confusion matrix and the overall fraction of correct predictions. (30 points)
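The thresholding step and the confusion matrix could look like the sketch below. It refits the same illustrative logistic model so that the block runs on its own, and it assumes AER is installed; `probs`, `pred`, `conf_mat`, and `accuracy` are hypothetical names.

```r
# Load the data, apply the age > 18 restriction, and add the transformed predictors.
data("CreditCard", package = "AER")
cc <- subset(CreditCard, age > 18)
cc$log_share   <- log(cc$share)
cc$log_reports <- log(cc$reports + 1)

# Illustrative logistic model on variables 2 to 8.
fit <- glm(card ~ log_reports + income + age + owner +
             dependents + months + log_share,
           family = binomial, data = cc)

# Predicted probability that card == "yes" for each training observation.
probs <- predict(fit, type = "response")

# Classify as "yes" when the probability exceeds 0.5, otherwise "no".
# Fixing the levels keeps the table 2 x 2 even if one class is never predicted.
pred <- factor(ifelse(probs > 0.5, "yes", "no"), levels = c("no", "yes"))

# Rows are predictions, columns are the observed decisions.
conf_mat <- table(Predicted = pred, Actual = cc$card)
print(conf_mat)

# Overall fraction of correct predictions: diagonal counts over the total.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

Because the predictions are made on the same data used to fit the model, this fraction is a training accuracy and will tend to be optimistic.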






Online Quiz will be based on the following and related questions:

Answered 1 day after Jul 16, 2021


Mohd answered on Jul 17 2021
library(magrittr)
library(dplyr)
library(ggplot2)
library(GGally)
library(skimr)
library(readr)
creditcard <- read_csv("creditcard.csv")
1. Provide summary statistics of the predictors.
skim(creditcard)
Data summary
    Name                      creditcard
    Number of rows            1319
    Number of columns         12
    Column type frequency:
      character               3
      numeric                 9
    Group variables           None
Variable type: character
    skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
    card                   0              1    2    3      0         2           0
    owner                  0              1    2    3      0         2           0
    selfemp                0              1    2    3      0         2           0
Variable type: numeric
    skim_variable  n_missing  complete_rate    mean      sd    p0    p25     p50     p75     p100  hist
    reports                0              1    0.46    1.35  0.00   0.00    0.00    0.00    14.00  ▇▁▁▁▁
    age                    0              1   33.21   10.14  0.17  25.42   31.25   39.42    83.50  ▁▇▅▁▁
    income                 0              1    3.37    1.69  0.21   2.24    2.90    4.00    13.50  ▇▇▁▁▁
    share                  0              1    0.07    0.09  0.00   0.00    0.04    0.09     0.91  ▇▁▁▁▁
    expenditure            0              1  185.06  272.22  0.00   4.58  101.30  249.04  3099.50  ▇▁▁▁▁
    dependents             0              1    0.99    1.25  0.00   0.00    1.00    2.00     6.00  ▇▂▁▁▁
    months                 0              1   55.27   66.27  0.00  12.00   30.00   72.00   540.00  ▇▁▁▁▁
    majorcards             0              1    0.82    0.39  0.00   1.00    1.00    1.00     1.00  ▂▁▁▁▇
    active                 0              1    7.00    6.31  0.00   2.00    6.00   11.00    46.00  ▇▃▁▁▁
2. There are some values of variable age under one year. Consider data with age>18 for your analysis for the rest of the...