You need to write a research report on a machine learning project. The analysis has been done, and the script for this project is uploaded as [MLProject.ipynb]. Please review and check it, and feel free to modify it. For information about the project, please see the uploaded document [Project_Proposal.pdf]. The other uploaded document is a template; please refer to it and follow it when writing the report. Do not include any code in the report, and write at least 6 pages excluding appendices. Thank you!
{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "LBdSH0Yg6XoH" }, "source": [ "Load Libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "52VtTLi0EE7r" }, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns\n", "import numpy as np\n", "from collections import Counter\n", "import matplotlib.pyplot as plt\n", "from matplotlib.pyplot import figure\n", "from pickle import dump\n", "from pickle import load" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "btQItuEQMuS_" }, "outputs": [], "source": [ "def warn(*args, **kwargs):\n", " pass\n", "import warnings\n", "warnings.warn = warn\n", "\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n", "from sklearn.decomposition import PCA\n", "from sklearn.naive_bayes import GaussianNB\n", "from sklearn.svm import SVC\n", "from sklearn.model_selection import KFold\n", "from sklearn.model_selection import cross_val_score\n", "from sklearn.model_selection import cross_validate\n", "from sklearn.metrics import cohen_kappa_score\n", "from sklearn.metrics import confusion_matrix\n", "from sklearn.metrics import classification_report\n", "from sklearn.metrics import plot_confusion_matrix\n", "from sklearn import tree\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.metrics import roc_curve\n", "from sklearn.metrics import roc_auc_score\n", "from sklearn.metrics import plot_roc_curve\n", "from sklearn.metrics import precision_recall_curve\n", "from sklearn.metrics import f1_score\n", "from sklearn.metrics import auc\n", "from sklearn.metrics import precision_score\n", "from sklearn.metrics import recall_score\n", "from sklearn.metrics import make_scorer" ] }, { "cell_type": "code", 
"execution_count": 3, "metadata": { "id": "DQ4gQmpQMv7X" }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split, GridSearchCV, RepeatedStratifiedKFold\n", "from sklearn.feature_selection import RFECV, SelectKBest\n", "from imblearn.under_sampling import RandomUnderSampler\n", "from imblearn.over_sampling import SMOTE\n", "from sklearn.preprocessing import StandardScaler\n", "from imblearn.pipeline import Pipeline" ] }, { "cell_type": "markdown", "metadata": { "id": "EvaThE8LQkIH" }, "source": [ "# Python Project Template\n", "# 1. Prepare Problem\n", "# a) Load libraries\n", "# b) Load dataset\n", "# 2. Summarize Data\n", "# a) Descriptive statistics\n", "# b) Data visualizations\n", "# 3. Prepare Data\n", "# a) Data Cleaning\n", "# b) Feature Selection\n", "# c) Data Transforms\n", "# 4. Evaluate Algorithms\n", "# a) Split-out validation dataset\n", "# b) Test options and evaluation metric\n", "# c) Spot Check Algorithms\n", "# d) Compare Algorithms\n", "# 5. Improve Accuracy\n", "# a) Algorithm Tuning\n", "# b) Ensembles\n", "# 6. Finalize Model\n", "# a) Predictions on validation dataset\n", "# b) Create standalone model on entire training dataset\n", "# c) Save model for later use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Load dataset" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "2wUOXeYtEhIH" }, "outputs": [], "source": [ "data = pd.read_csv(\"data.csv\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 372 }, "id": "BjTO4L74FiXn", "outputId": "c87a1064-600d-4126-cf48-fcfdbec4dd65" }, "outputs": [ { "data": { "text/html": [ "
\n",
| | Bankrupt? | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | ... | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Net Income Flag | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.370594 | 0.424389 | 0.405750 | 0.601457 | 0.601457 | 0.998969 | 0.796887 | 0.808809 | 0.302646 | ... | 0.716845 | 0.009219 | 0.622879 |