Background: Jill commends you for all your hard work. Piece by piece, you’ve been building up your...

Question

Background

Jill commends you for all your hard work. Piece by piece, you’ve been building up your skills in data preparation, statistical reasoning, and machine learning. You are now ready to apply machine learning to solve a real-world challenge: credit card risk.

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. Therefore, you’ll need to employ different techniques to train and evaluate models with unbalanced classes. Jill asks you to use the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.

Using the credit card credit dataset from LendingClub, a peer-to-peer lending services company, you’ll oversample the data using the RandomOverSampler and SMOTE algorithms, and undersample the data using the ClusterCentroids algorithm. Then, you’ll use a combinatorial approach of over- and undersampling using the SMOTEENN algorithm. Next, you’ll compare two new machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk. Once you’re done, you’ll evaluate the performance of these models and make a written recommendation on whether they should be used to predict credit risk.

What You're Creating

This new assignment consists of three technical analysis deliverables and a written report.
You will submit the following:

- Deliverable 1: Use Resampling Models to Predict Credit Risk
- Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk
- Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
- Deliverable 4: A Written Report on the Credit Risk Analysis (README.md)

Files

Use the following link to download the Module-17-Challenge-Resources.zip file that includes the LoanStats_2019Q1.csv dataset and two starter code files: credit_risk_resampling_starter_code.ipynb and credit_risk_ensemble_starter_code.ipynb.

Before You Start

Create a new GitHub repository entitled "Credit_Risk_Analysis" and initialize the repository with a README.

Deliverable 1: Use Resampling Models to Predict Credit Risk (30 points)

Deliverable 1 Instructions

Using your knowledge of the imbalanced-learn and scikit-learn libraries, you’ll evaluate three machine learning models by using resampling to determine which is better at predicting credit risk. First, you’ll use the oversampling RandomOverSampler and SMOTE algorithms, and then you’ll use the undersampling ClusterCentroids algorithm.
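
As a rough illustration of what oversampling does to the class counts, here is a minimal pure-Python sketch of random oversampling: minority-class rows are drawn with replacement until every class matches the majority count. It is a hand-rolled stand-in, not the imbalanced-learn implementation; in the notebook itself the library's RandomOverSampler would perform this step, and SMOTE and ClusterCentroids follow different strategies (synthetic minority points and cluster-based undersampling, respectively). The function name random_oversample, the toy rows, and the labels low_risk and high_risk are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate minority-class rows at random until all classes are balanced.

    Hypothetical helper for illustration only; it mimics the idea behind
    random oversampling, not any library's exact behaviour.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    new_rows, new_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Indices of the original rows that belong to this class.
        pool = [i for i, y in enumerate(labels) if y == label]
        # Draw with replacement until this class reaches the majority size.
        for _ in range(target - count):
            i = rng.choice(pool)
            new_rows.append(rows[i])
            new_labels.append(label)
    return new_rows, new_labels


# Toy data: 8 "low_risk" loans and 2 "high_risk" loans (made-up values).
X = [[i, i * 10] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_resampled, y_resampled = random_oversample(X, y)
print(Counter(y))            # class counts before resampling
print(Counter(y_resampled))  # class counts after resampling
```

Only the training split is resampled in this workflow; the test split keeps its original imbalance so that the evaluation still reflects the real class mix.
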
Using these algorithms, you’ll resample the dataset, view the count of the target classes, train a logistic regression classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

REWIND

For this deliverable, you’ve already done the following in this module:

- Lesson 17.2.3: Split the data into training and testing sets
- Lesson 17.3.1: Perform logistic regression
- Lesson 17.4.1: Calculate accuracy, precision, and sensitivity
- Lesson 17.4.2: Create a confusion matrix
- Lesson 17.10.1: Use the RandomOverSampler and SMOTE algorithms to resample a dataset
- Lesson 17.10.2: Use the ClusterCentroids algorithm to resample a dataset

Follow the instructions below and use the credit_risk_resampling_starter_code.ipynb file to complete Deliverable 1.

Open the credit_risk_resampling_starter_code.ipynb file, rename it credit_risk_resampling.ipynb, and save it to your Credit_Risk_Analysis folder.

Using the information we’ve provided in the starter code, create your training and target variables by completing the following steps:

- Create the training variables by converting the string values into numerical ones using the get_dummies() method.
- Create the target variables.
- Check the balance of the target variables.

Next, begin resampling the training data. First, use the oversampling RandomOverSampler and SMOTE algorithms to resample the data, then use the undersampling ClusterCentroids algorithm to resample the data. For each resampling algorithm, do the following:

- Use the LogisticRegression classifier to make predictions and evaluate the model’s performance.
- Calculate the accuracy score of the model.
- Generate a confusion matrix.
- Print out the imbalanced classification report.

Save your credit_risk_resampling.ipynb file to your Credit_Risk_Analysis folder.

Deliverable 1 Requirements

You will earn a perfect score for Deliverable 1 by completing all requirements below:

For all three algorithms, the following have been completed:

- An accuracy score for the model is calculated (7.5 pt)
- A confusion matrix has been generated (7.5 pt)
- An imbalanced classification report has been generated (15 pt)

Deliverable 2: Use the SMOTEENN algorithm to Predict Credit Risk (15 points)

Deliverable 2 Instructions

Using your knowledge of the imbalanced-learn and scikit-learn libraries, you’ll use a combinatorial approach of over- and undersampling with the SMOTEENN algorithm to determine if the results from the combinatorial approach are better at predicting credit risk than the resampling algorithms from Deliverable 1. Using the SMOTEENN algorithm, you’ll resample the dataset, view the count of the target classes, train a logistic regression classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

REWIND

For this deliverable, you’ve already done the following in this module:

- Lesson 17.3.1: Perform logistic regression
- Lesson 17.4.1: Calculate accuracy, precision, and sensitivity
- Lesson 17.4.2: Create a confusion matrix
- Lesson 17.10.3: Use the SMOTEENN algorithm to resample a dataset

Follow the instructions below and use the information in the credit_risk_resampling_starter_code.ipynb file to complete Deliverable 2.

- Continue using your credit_risk_resampling.ipynb file where you have already created your training and target variables.
- Using the information we have provided in the starter code, resample the training data using the SMOTEENN algorithm.
- After the data is resampled, use the LogisticRegression classifier to make predictions and evaluate the model’s performance.
- Calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.

Save your credit_risk_resampling.ipynb file to your Credit_Risk_Analysis folder.

Deliverable 2 Requirements

You will earn a perfect score for Deliverable 2 by completing all requirements below:

The combinatorial SMOTEENN algorithm does the following:

- An accuracy score for the model is calculated (5 pt)
- A confusion matrix has been generated (5 pt)
- An imbalanced classification report has been generated (5 pt)

Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk (25 points)

Deliverable 3 Instructions

Using your knowledge of the imblearn.ensemble library, you’ll train and compare two different ensemble classifiers, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk and evaluate each model. Using both algorithms, you’ll resample the dataset, view the count of the target classes, train the ensemble classifier, calculate the balanced accuracy score, generate a confusion matrix, and generate a classification report.

REWIND

For this deliverable, you’ve already done the following in this module:

- Lesson 17.2.3: Split the data into training and testing sets
- Lesson 17.3.1: Perform logistic regression
- Lesson 17.4.1: Calculate accuracy, precision, and sensitivity
- Lesson 17.4.2: Create a confusion matrix
- Lesson 17.9.2: Understand adaptive boosting

Follow the instructions below and use the information in the credit_risk_resampling_starter_code.ipynb file to complete Deliverable 3.

- Open the credit_risk_ensemble_starter_code.ipynb file, rename it credit_risk_ensemble.ipynb, and save it to your Credit_Risk_Analysis folder.
- Using the information we have provided in the starter code, create your training and target variables by completing the following:
  - Create the training variables by converting the string values into numerical ones using the get_dummies() method.
  - Create the target variables.
  - Check the balance of the target variables.
- Resample the training data using the BalancedRandomForestClassifier algorithm with 100 estimators.
  - Consult the following Random Forest documentation for an example.
- After the data is resampled, calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.
- Print the feature importance sorted in descending order (from most to least important feature), along with the feature score.
- Next, resample the training data using the EasyEnsembleClassifier algorithm with 100 estimators.
  - Consult the following Easy Ensemble documentation for an example.
- After the data is resampled, calculate the accuracy score of the model, generate a confusion matrix, and then print out the imbalanced classification report.

Save your credit_risk_ensemble.ipynb file to your Credit_Risk_Analysis folder.

Deliverable 3 Requirements

You will earn a perfect score for Deliverable 3 by completing all requirements below:

The BalancedRandomForestClassifier algorithm does the following:

- An accuracy score for the model is calculated (2.5 pt)
- A confusion matrix has been generated (2.5 pt)
- An imbalanced classification report has been generated (5 pt)
- The features are sorted in descending order by feature importance (5 pt)

The EasyEnsembleClassifier algorithm does the following:

- An accuracy score of the model is calculated (2.5 pt)
- A confusion matrix has been generated (2.5 pt)
- An imbalanced classification report has been generated (5 pt)

Deliverable 4: Written Report on the Credit Risk Analysis (30 points)

Deliverable 4 Instructions

For this deliverable, you’ll write a brief summary and analysis of the performance of all the machine learning models used in this Challenge.

The report should contain the following:

- Overview of the analysis: Explain the purpose of this analysis.
- Results: Using bulleted lists, describe the balanced accuracy scores and the precision and recall scores of all six machine learning models. Use screenshots of your outputs to support your results.
- Summary: Summarize the results of the machine learning models, and include a recommendation on the model to use, if any. If you do not recommend any of the models, justify your reasoning.

Deliverable 4 Requirements

Structure, Organization, and Formatting (6 points)

The written analysis has the following structure, organization, and formatting:

- There is a title, and there are multiple sections (2 pt)
- Each section has a heading and subheading (2 pt)
- Links to images are working, and code is formatted and displayed correctly (2 pt)

Analysis (24 points)

The written analysis has the following:

- Overview of the loan prediction risk analysis:
  - The purpose of this analysis is well defined (4 pt)
- Results:
  - There is a bulleted list that describes the balanced accuracy score and the precision and recall scores of all six machine learning models (15 pt)
- Summary:
  - There is a summary of the results (2 pt)
  - There is a recommendation on which model to use, or there is no recommendation with a justification (3 pt)

Submission

Once you’re ready to submit, make sure to check your work against the rubric to ensure you are meeting the requirements for this Challenge one final time. It’s easy to overlook items when you’re in the zone!

As a reminder, the deliverables for this Challenge are as follows:

- Deliverable 1: Use Resampling Models to Predict Credit Risk
- Deliverable 2: Use the SMOTEENN algorithm to Predict Credit Risk
- Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
- Deliverable 4: A Written Report on the Credit Risk Analysis (README.md)

Upload the following to your Credit_Risk_Analysis GitHub repository:

- Your credit_risk_resampling.ipynb file.
- Your credit_risk_ensemble.ipynb file.
- An updated README.md that has your written analysis.

To submit your challenge assignment for grading in Bootcamp Spot, click Start Assignment, click the Website URL tab, then provide the URL of your Credit_Risk_Analysis GitHub repository, and then click Submit. Comments are disabled for graded submissions in BootCampSpot. If you have questions about your feedback, please notify your instructional staff or the Student Success Manager. If you would like to resubmit your work for an improved grade, you can use the Re-Submit Assignment button to upload new links. You may resubmit up to 3 times for a total of 4 submissions.

IMPORTANT

Once you receive feedback on your Challenge, make any suggested updates or adjustments to your work. Then, add this week’s Challenge to your professional portfolio.

NOTE

You are allowed to miss up to two Challenge assignments and still earn your certificate. If you complete all Challenge assignments, your lowest two grades will be dropped. If you wish to skip this assignment, click Next, and move on to the next Module.

Rubric

Module-17 Rubric

Columns: Criteria, Ratings, Pts

Deliverable 1: Use Resampling Models to Predict Loan Risk

- 30 to >27.0 Pts, Demonstrating Proficiency:
  ✓ There is an accuracy score and confusion matrix for ALL THREE algorithms.
  ✓ A classification report is generated for ALL THREE algorithms.
- 27 to >23.0 Pts, Approaching Proficiency:
  ✓ There is an accuracy score and confusion matrix for ALL THREE algorithms.
  ✓ A classification report is generated for TWO of THREE algorithms.
  ✓ Code is written to generate a classification report for the third algorithm.
- 23 to >19.0 Pts, Developing Proficiency:
  ✓ There is an accuracy score and confusion matrix for ALL THREE algorithms.
  ✓ A classification report is generated for ONE of THREE algorithms.
  ✓ Code is written to generate a classification report for TWO algorithms, but there are errors.
- 19 to >0.0 Pts, Emerging:
  ✓ There is an accuracy score and confusion matrix for ALL THREE algorithms.
  ✓ Code is written to generate a classification report for ONE or more algorithms.
- 0 Pts, Incomplete

Pts: 30

Deliverable 2: Use the SMOTEENN Algorithm to Predict Loan Risk

- 15 to >13.0 Pts, Demonstrating Proficiency:
  ✓ There is an accuracy score for the SMOTEENN algorithm.
  ✓ There is a confusion matrix for the SMOTEENN algorithm.
  ✓ A classification report is generated for the SMOTEENN algorithm.
- 13 to >12.0 Pts, Approaching Proficiency:
  ✓ There is an accuracy score for the SMOTEENN algorithm.
  ✓ There is a confusion matrix for the SMOTEENN algorithm.
  ✓ Code is written to generate a classification report for the SMOTEENN algorithm, but there is a minor error.
- 12 to >9.0 Pts, Developing Proficiency:
  ✓ There is an accuracy score for the SMOTEENN algorithm.
  ✓ There is a confusion matrix for the SMOTEENN algorithm.
  ✓ Code is written to generate a classification report for the SMOTEENN algorithm.
- 9 to >0.0 Pts, Emerging:
  ✓ There is an accuracy score for the SMOTEENN algorithm.
  ✓ Code is written to generate a confusion matrix for the SMOTEENN algorithm.
  ✓ Code is written to generate a classification report for the SMOTEENN algorithm.
- 0 Pts, Incomplete

Pts: 15

Deliverable 3: Use Ensemble Classifiers to Predict Loan Risk

- 25 to >22.0 Pts, Demonstrating Proficiency:
  ✓ There is an accuracy score and confusion matrix for TWO algorithms.
  ✓ A classification report is generated for TWO algorithms.
  ✓ The list of features is sorted in descending order by feature importance.
- 22 to >18.0 Pts, Approaching Proficiency:
  ✓ There is an accuracy score and confusion matrix for TWO algorithms.
  ✓ A classification report is generated for TWO algorithms.
  ✓ The list of features is not sorted in descending order by feature importance.
- 18 to >16.0 Pts, Developing Proficiency:
  ✓ There is an accuracy score and confusion matrix for TWO algorithms.
  ✓ A classification report is generated for ONE of TWO algorithms.
  ✓ Code is written to generate a classification report for the second algorithm.
  ✓ Code is written that lists the features sorted in descending order by feature importance.
- 16 to >0.0 Pts, Emerging:
  ✓ There is an accuracy score and confusion matrix for TWO algorithms.
  ✓ Code is written to generate a classification report for ONE of TWO algorithms.
  ✓ Code is written that lists the features sorted in descending order by feature importance.
- 0 Pts, Incomplete

Pts: 25

Deliverable 4: Structure, Organization, and Formatting

- 6 to >5.0 Pts, Demonstrating Proficiency. The written analysis has ALL of the following:
  ✓ There is a title, and there are multiple sections.
  ✓ Each section has a heading and subheading.
  ✓ There are images and references to code, and they are formatted and displayed correctly.
- 5 to >4.0 Pts, Approaching Proficiency. The written analysis has ALL of the following:
  ✓ There is a title, and there are multiple sections.
  ✓ Each section has a heading and subheading.
  ✓ There are images and references to code, and they are formatted and displayed correctly, with one or two minor errors.
- 4 to >3.0 Pts, Developing Proficiency. The written analysis has ALL of the following:
  ✓ There is a title, and there are multiple sections.
  AND ONE of the following:
  ✓ Each section may have a heading and subheading.
  ✓ There are images and references to code, and they are formatted and displayed correctly, with one or two minor errors.
- 3 to >0.0 Pts, Emerging. The written analysis has ALL of the following:
  ✓ There is a title.
  ✓ There may be a subheading for a section.
  ✓ There are no headings for each section, but there are three sections.
- 0 Pts, Incomplete

Pts: 6

Deliverable 4: Analysis

- 24 to >20.0 Pts, Demonstrating Proficiency:
  ✓ The purpose is well defined.
  ✓ The balanced accuracy score and the precision and recall scores for ALL SIX algorithms are described.
  ✓ The results are summarized, and there is a recommendation on which model to use or justification.
- 20 to >18.0 Pts, Approaching Proficiency:
  ✓ The purpose is well defined.
  ✓ The balanced accuracy score and the precision and recall scores for FIVE of the SIX algorithms are described.
  ✓ The results are summarized, but the recommendation on which model to use or justification is not clear.
- 18 to >16.0 Pts, Developing Proficiency:
  ✓ The purpose is well defined.
  ✓ The balanced accuracy score and the precision and recall scores for FOUR of the SIX algorithms are described.
  ✓ The results are summarized, but there is no recommendation on which model to use or justification.
- 16 to >0.0 Pts, Emerging:
  ✓ The purpose is well defined.
  ✓ The balanced accuracy score and the precision and recall scores for THREE of the SIX algorithms are described.
  ✓ The results are summarized, but there is no recommendation on which model to use or justification.
- 0 Pts, Incomplete

Pts: 24

Total points: 100

© 2020 - 2022 Trilogy Education Services, a 2U, Inc. brand. All Rights Reserved.
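
The scores that the requirements and rubric keep returning to (confusion matrix, balanced accuracy, precision, recall) can be computed by hand for a two-class problem. The sketch below is a plain-Python illustration on made-up labels; the helper name binary_metrics and the label strings are hypothetical. In a notebook, scikit-learn's confusion_matrix and balanced_accuracy_score, together with imbalanced-learn's classification_report_imbalanced, are the usual library routes to these numbers.

```python
def binary_metrics(y_true, y_pred, positive="high_risk"):
    """Return confusion-matrix counts and derived scores for two classes.

    Assumes both classes occur in y_true and that at least one row is
    predicted positive; otherwise a division by zero would occur.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn)        # sensitivity: share of risky loans caught
    specificity = tn / (tn + fp)   # recall of the negative class
    precision = tp / (tp + fp)     # share of flagged loans that are risky
    return {
        # Rows are actual classes, columns are predicted; positive class first.
        "confusion": [[tp, fn], [fp, tn]],
        "precision": precision,
        "recall": recall,
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (recall + specificity) / 2,
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up predictions for 90 low-risk and 10 high-risk loans.
y_true = ["low_risk"] * 90 + ["high_risk"] * 10
y_pred = (["low_risk"] * 85 + ["high_risk"] * 5      # predictions for the low-risk loans
          + ["high_risk"] * 6 + ["low_risk"] * 4)    # predictions for the high-risk loans

scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(name, value)
```

On this toy data plain accuracy is 0.91 while balanced accuracy is about 0.77, which shows why the balanced score is the more informative one when the target classes are unbalanced.
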

Mohd · Accepted Answer

Answer Attached Below:
