
Only need help on Question 1c. Thanks!


lab11
August 5, 2021

[1]:
# Initialize Otter
import otter
grader = otter.Notebook("lab11.ipynb")

1 Lab 11 Decision Trees and Random Forests

In this assignment, we will have you train a multi-class classifier with three different models (one-vs-rest logistic regression, decision tree, random forest) and compare the accuracies and decision boundaries created by each.

We'll be looking at a dataset of per-game stats for all NBA players in the 2018-19 season. This dataset comes from basketball-reference.com.

1.0.1 Due Date
This assignment is due on Saturday, July 31st at 11:59 pm PDT.

1.0.2 Collaboration Policy
Data science is a collaborative activity. While you may talk with others about this assignment, we ask that you write your solutions individually. If you discuss the assignment with others, please include their names in the cell below.

Collaborators: list names here

1.0.3 Lab Walkthrough Video
In addition to the lab notebook, we have also released a prerecorded walkthrough video of the lab. We encourage you to reference this video as you work through the lab. Run the cell below to display the video.
[2]:
from IPython.display import YouTubeVideo
YouTubeVideo("K9iWroWKAVo", list = 'PLQCcNQgUcDfpZ1FqfNkS_uzlUkY-RJysT', listType = 'playlist')

Footnote 1: https://www.basketball-reference.com/

[3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import tree
# ignore the warning you might get from importing ensemble from sklearn
from sklearn import ensemble

[4]:
nba_data = pd.read_csv("nba18-19.csv")
nba_data.head(5)

[4]:
   Rk                  Player Pos  Age   Tm   G  GS    MP   FG   FGA  ...  \
0   1  Álex Abrines\abrinal01  SG   25  OKC  31   2  19.0  1.8   5.1  ...
1   2      Quincy Acy\acyqu01  PF   28  PHO  10   0  12.3  0.4   1.8  ...
2   3  Jaylen Adams\adamsja01  PG   22  ATL  34   1  12.6  1.1   3.2  ...
3   4  Steven Adams\adamsst01   C   25  OKC  80  80  33.4  6.0  10.1  ...
4   5   Bam Adebayo\adebaba01   C   21  MIA  82  28  23.3  3.4   5.9  ...

     FT%  ORB  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0  0.923  0.2  1.4  1.5  0.6  0.5  0.2  0.5  1.7   5.3
1  0.700  0.3  2.2  2.5  0.8  0.1  0.4  0.4  2.4   1.7
2  0.778  0.3  1.4  1.8  1.9  0.4  0.1  0.8  1.3   3.2
3  0.500  4.9  4.6  9.5  1.6  1.5  1.0  1.7  2.6  13.9
4  0.735  2.0  5.3  7.3  2.2  0.9  0.8  1.5  2.5   8.9

[5 rows x 30 columns]

Our goal will be to predict a player's position given several other features. The 5 positions in basketball are PG, SG, SF, PF, and C (which stand for point guard, shooting guard, small forward, power forward, and center). This information is contained in the Pos column.

[5]:
nba_data['Pos'].value_counts()

[5]:
SG       176
PF       147
PG       139
C        120
SF       118
PF-SF      2
SF-SG      2
SG-SF      1
C-PF       1
PF-C       1
SG-PF      1
Name: Pos, dtype: int64

We could perform 5-class classification, but the results (and visualizations) are more interesting if we instead categorize players into one of 3 categories: guard, forward, and center.
The code below takes the Pos column of our dataframe and uses it to create a new column Pos3 that consists of the values G, F, and C (which stand for guard, forward, and center).

[6]:
def basic_position(pos):
    if 'F' in pos:
        return 'F'
    elif 'G' in pos:
        return 'G'
    return 'C'

nba_data['Pos3'] = nba_data['Pos'].apply(basic_position)
nba_data['Pos3'].value_counts()

[6]:
G    315
F    273
C    120
Name: Pos3, dtype: int64

There are many players in the NBA (530 unique players in the 2018-19 season), so our visualizations can get noisy and messy. Let's restrict our data to rows for players who averaged more than 10 points per game.

[7]:
nba_data = nba_data[nba_data['PTS'] > 10]

Now, let's look at a scatterplot of Rebounds (TRB) vs. Assists (AST).

[8]:
sns.scatterplot(data = nba_data, x = 'AST', y = 'TRB', hue = 'Pos3');

As you can see, rebounds and assists alone give fairly good cluster separation: Centers, Forwards, and Guards appear in different regions of the plot.

1.1 Question 1: Evaluating Split Quality
In this question, we will explore different ways to evaluate split quality for classification and regression trees.

1.1.1 Question 1a
In lecture we defined the entropy S of a node as:

$$S = -\sum_{C} p_C \log_2 p_C$$

where $p_C$ is the proportion of data points in a node with label C. This function helped us measure the unpredictability of a node in a decision tree.

Implement the entropy function, which outputs the entropy of a node with a given set of labels. The labels parameter is a list of labels in our dataset. For example, labels could be ['G', 'G', 'F', 'F', 'C', 'C'].

[9]:
def entropy(labels):
    unique_labels, counts = np.unique(labels, return_counts=True)
    pc = counts / sum(counts)
    e = -np.sum(pc * np.log2(pc))
    return e

entropy(nba_data['Pos3'])

[9]: 1.521555567956027

[10]: grader.check("q1a")

[10]: q1a results: All test cases passed!
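You can check the formula by hand on the example label list from the prompt. Each of the three labels has $p_C = 1/3$, so $S = -3 \cdot \tfrac{1}{3}\log_2 \tfrac{1}{3} = \log_2 3 \approx 1.585$. A node that contains a single label has $S = 0$.

The sketch below confirms both cases. It uses only NumPy and re-declares an entropy function so that it runs on its own; it is not the graded solution. Because np.unique returns only the labels that actually occur, no $p_C$ is zero and log2 is never applied to 0.

```python
import numpy as np

def entropy(labels):
    """S = -sum_C p_C * log2(p_C) for a collection of class labels."""
    # np.unique returns only labels that occur, so every p_C is > 0.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Example from the prompt: three classes, two of each (uniform distribution).
print(entropy(['G', 'G', 'F', 'F', 'C', 'C']))  # about 1.585, i.e. log2(3)

# A pure node (only one label) has zero entropy.
print(entropy(['C', 'C', 'C']))
```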
1.1.2 Question 1b
The decision tree visualizations in lecture contained nodes with a gini parameter. This is the node's Gini impurity: the probability that a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's label distribution. Gini impurity is a popular alternative to entropy for choosing the best split at a node, and it is the default criterion for scikit-learn's DecisionTreeClassifier.

We can calculate the Gini impurity of a node with the formula below, where $p_C$ is the proportion of data points in a node with label C:

$$G = 1 - \sum_{C} p_C^2$$

No logarithms are involved in calculating Gini impurity, which can make it faster to compute than entropy.

Implement the gini_impurity function, which outputs the Gini impurity of a node with a given set of labels. The labels parameter is defined as in the previous part.

[11]:
def gini_impurity(labels):
    unique_labels, counts = np.unique(labels, return_counts=True)
    pc = counts / sum(counts)
    g = 1 - np.sum(pc ** 2)
    return g

gini_impurity(nba_data['Pos3'])

[11]: 0.6383398017253514

[12]: grader.check("q1b")

[12]: q1b results: All test cases passed!

As an optional exercise in probability, try to think of a way to derive the formula for Gini impurity.

Entropy and Gini impurity usually do not make sense for regression trees, because the response variable is continuous. Instead, we can use the variance of the response values in a node. Recall that the variance is defined as:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

where $\mu$ is the mean, N is the total number of data points, and $x_i$ is the value of each data point. This is the same definition of variance we have used previously in the course.

[13]:
def variance(values):
    return np.mean((values - np.mean(values)) ** 2)

variance(nba_data['PTS'])

[13]: 21.023148263588652

1.1.3 Question 1c
In lecture, we used weighted entropy as a loss function to help us determine the best split.
Recall that the weighted entropy is given by:

$$L = \frac{N_1 S(X) + N_2 S(Y)}{N_1 + N_2}$$

where $N_1$ is the number of samples in the left node X, and $N_2$ is the number of samples in the right node Y. This notion of a weighted average extends to other metrics such as Gini impurity and variance: simply replace the S (entropy) function with G (Gini impurity) or $\sigma^2$ (variance).

First, implement the weighted_metric function. The left parameter is a list of labels or values in the left node X, and the right parameter is a list of labels or values in the right node Y. The metric parameter is a function, which can be entropy, gini_impurity, or variance. For entropy and gini_impurity, you may assume that left and right contain discrete labels. For variance, you may assume that left and right contain continuous values.

Then, assign we_pos3_age_30 to the weighted entropy (in the Pos3 column) of a split that partitions nba_data into two groups: players who are 30 years old or older, and players who are younger than 30.
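The weighted average above is the same computation for every metric; only the function passed as metric changes. Below is a minimal sketch, assuming NumPy and small hypothetical toy nodes rather than the NBA data. It re-declares entropy, gini_impurity, and variance so that the block runs on its own.

```python
import numpy as np

def entropy(labels):
    # S = -sum_C p_C log2 p_C, from the label counts in the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini_impurity(labels):
    # G = 1 - sum_C p_C^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1 - np.sum(p ** 2))

def variance(values):
    # Population variance: mean squared deviation from the mean
    values = np.asarray(values, dtype=float)
    return float(np.mean((values - values.mean()) ** 2))

def weighted_metric(left, right, metric):
    """(N1 * metric(left) + N2 * metric(right)) / (N1 + N2)."""
    n1, n2 = len(left), len(right)
    return (n1 * metric(left) + n2 * metric(right)) / (n1 + n2)

# Hypothetical toy split: a pure left node and a 50/50 right node.
left = ['C', 'C']
right = ['G', 'F', 'G', 'F']
print(weighted_metric(left, right, entropy))        # (2*0 + 4*1) / 6, about 0.667
print(weighted_metric(left, right, gini_impurity))  # (2*0 + 4*0.5) / 6, about 0.333
print(weighted_metric([1.0, 3.0], [10.0, 10.0, 10.0], variance))  # (2*1 + 3*0) / 5 = 0.4
```

The weighted average is symmetric in its two nodes, so it does not matter which age group is passed as left. Compare this with the attempt in the next cell. There, the right node is nba_data[nba_data] instead of the Pos3 values of players younger than 30, and the entropy argument is missing.

The NameError tracebacks are a separate problem. The execution counter restarts at [1], which suggests that the earlier cells (the imports and reading nba18-19.csv) were not re-run after a kernel restart.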
[1]:
def weighted_metric(left, right, metric):
    n1, S_X, n2, S_Y = len(left), metric(left), len(right), metric(right)
    return ((n1 * S_X) + (n2 * S_Y))/ (n1 + n2)

we_pos3_age_30 = weighted_metric(nba_data[nba_data['Age'] >= 30]['Pos3'], nba_data[nba_data])
we_pos3_age_30

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_40/4124766125.py in <module>
      3     return ((n1 * S_X) + (n2 * S_Y))/ (n1 + n2)
      4 
----> 5 we_pos3_age_30 = weighted_metric(nba_data[nba_data['Age'] >= 30]['Pos3'], nba_data[nba_data])
      6 we_pos3_age_30

NameError: name 'nba_data' is not defined

[2]:
grader.check("q1c")

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_40/2855975128.py in <module>
----> 1 grader.check("q1c")

NameError: name 'grader' is not defined

We will not go over the entire decision tree fitting process in this assignment, but you now have the basic tools to fit a decision tree. As an optional exercise, try to think about how you would extend these tools to fit a decision tree from scratch.

1.2 Question 2: Classification
Let's switch gears to classification with the NBA dataset.

1.3 One-vs-Rest Logistic Regression
We only discussed binary logistic regression in class, but there is a natural extension of it to multiclass classification called one-vs-rest logistic regression. In essence, one-vs-rest logistic regression builds one binary logistic regression classifier for each of the N classes (here, N = 3). We then predict the class whose classifier gives the highest probability among the N classes.

Before using logistic regression, let's first split nba_data into a training set and a test set.
[3]:
nba_train, nba_test = train_test_split(nba_data, test_size=0.25, random_state=100)
nba_train = nba_train.sort_values(by='Pos')
nba_test = nba_test.sort_values(by='Pos')

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_40/2288098315.py in <module>
----> 1 nba_train, nba_test = train_test_split(nba_data, test_size=0.25, random_state=100)
      2 nba_train = nba_train.sort_values(by='Pos')
      3 nba_test = nba_test.sort_values(by='Pos')

NameError: name 'train_test_split' is not defined

1.3.1 Question 2a
In the cell below, set logistic_regression_model to be a one-vs-rest logistic regression model. Then, fit that model using the AST and TRB columns (in that order) from nba_train as our features, and Pos3 as our response variable.

Remember, sklearn.linear_model.LogisticRegression has already been imported for you. There is an optional parameter, multi_class, that you need to specify to make your model a multi-class one-vs-rest classifier. See the documentation for more details.

[4]:
logistic_regression_model = LogisticRegression(multi_class='ovr')
logistic_regression_model.fit(nba_train[['AST', 'TRB']], nba_train['Pos3'])

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_40/1347373509.py in <module>
----> 1 logistic_regression_model = LogisticRegression(multi_class='ovr')
      2 logistic_regression_model.fit(nba_train[['AST',
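The one-vs-rest rule described above trains one binary classifier per class, then predicts the class whose classifier gives the highest probability. It can be sketched from scratch in NumPy.

This is an illustration under stated assumptions, not what scikit-learn does internally. The helper names, the unregularized batch gradient descent (scikit-learn's default adds an L2 penalty), the learning rate, the iteration count, and the toy data are all hypothetical choices.

For library use, note that recent scikit-learn releases deprecate the multi_class parameter of LogisticRegression. The scikit-learn documentation suggests wrapping LogisticRegression in sklearn.multiclass.OneVsRestClassifier to keep one-vs-rest behavior.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    # Prepend a column of ones so w[0] acts as the intercept.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_binary_logistic(X, y, lr=0.1, n_iter=5000):
    """Unregularized logistic regression fit by batch gradient descent; y is 0/1."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def fit_one_vs_rest(X, y):
    """Train one binary model per class: that class (1) vs. every other class (0)."""
    classes = np.unique(y)
    weights = {c: fit_binary_logistic(X, (y == c).astype(float)) for c in classes}
    return classes, weights

def predict_one_vs_rest(X, classes, weights):
    """Predict the class whose binary model assigns the highest probability."""
    Xb = add_intercept(X)
    # Column k holds P(class k | x) according to the k-th binary model.
    probs = np.column_stack([sigmoid(Xb @ weights[c]) for c in classes])
    return classes[np.argmax(probs, axis=1)]

# Hypothetical toy data: three tight, well-separated clusters, one per class.
X = np.array([[0.0, 3.0], [0.3, 3.3], [-0.3, 2.7],       # class 'C'
              [-3.0, -2.0], [-3.3, -1.7], [-2.7, -2.3],  # class 'F'
              [3.0, -2.0], [3.3, -2.3], [2.7, -1.7]])    # class 'G'
y = np.array(['C', 'C', 'C', 'F', 'F', 'F', 'G', 'G', 'G'])

classes, weights = fit_one_vs_rest(X, y)
print(predict_one_vs_rest(X, classes, weights))  # → ['C' 'C' 'C' 'F' 'F' 'F' 'G' 'G' 'G']
```

Each class can be separated from the other two by a straight line in this toy data, so every binary model ends up giving its own class a probability above 0.5 and the other classes a probability below 0.5. The argmax step then recovers the labels exactly. On overlapping data such as the NBA AST/TRB features, the binary models disagree near cluster boundaries, and the argmax decides.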
Answered Same Day, Aug 05, 2021


Swapnil answered on Aug 06 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"import otter\n",
"grader = otter.Notebook()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "imports",
"locked": true,
"schema_version": 2,
"solution": false
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.colors import ListedColormap\n",
"import seaborn as sns\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import tree\n",
"from sklearn import ensemble"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Rk Player Pos Age Tm G GS MP FG FGA ... \\\n",
"0 1 Álex Abrines\\abrinal01 SG 25 OKC 31 2 19.0 1.8 5.1 ... \n",
"1 2 Quincy Acy\\acyqu01 PF 28 PHO 10 0 12.3 0.4 1.8 ... \n",
"2 3 Jaylen Adams\\adamsja01 PG 22 ATL 34 1 12.6 1.1 3.2 ... \n",
"3 4 Steven Adams\\adamsst01 C 25 OKC 80 80 33.4 6.0 10.1 ... \n",
"4 5 Bam Adebayo\\adebaba01 C 21 MIA 82 28 23.3 3.4 5.9 ... \n",
"\n",
" FT% ORB DRB TRB AST STL BLK TOV PF PTS \n",
"0 0.923 0.2 1.4 1.5 0.6 0.5 0.2 0.5 1.7 5.3 \n",
"1 0.700 0.3 2.2 2.5 0.8 0.1 0.4 0.4 2.4 1.7 \n",
"2 0.778 0.3 1.4 1.8 1.9 0.4 0.1 0.8 1.3 3.2 \n",
"3 0.500 4.9 4.6 9.5 1.6 1.5 1.0 1.7 2.6 13.9 \n",
"4 0.735 2.0 5.3 7.3 2.2 0.9 0.8 1.5 2.5 8.9 \n",
"\n",
"[5 rows x 30 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_data = pd.read_csv(\"nba18-19.csv\")\n",
"nba_data.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SG 176\n",
"PF 147\n",
"PG 139\n",
"C 120\n",
"SF 118\n",
"SF-SG 2\n",
"PF-SF 2\n",
"PF-C 1\n",
"SG-PF 1\n",
"C-PF 1\n",
"SG-SF 1\n",
"Name: Pos, dtype: int64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_data['Pos'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"G 315\n",
"F 273\n",
"C 120\n",
"Name: Pos3, dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def basic_position(pos):\n",
" if 'F' in pos:\n",
" return 'F'\n",
" elif 'G' in pos:\n",
" return 'G'\n",
" return 'C'\n",
"\n",
"\n",
"nba_data['Pos3'] = nba_data['Pos'].apply(basic_position)\n",
"nba_data['Pos3'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"nba_data = nba_data[nba_data['PTS'] > 10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's look at a scatterplot of Rebounds (`TRB`) vs. Assists (`AST`)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"[Figure: scatterplot of TRB vs. AST, colored by Pos3]"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(data = nba_data, x = 'AST', y = 'TRB', hue = 'Pos3');"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.5215555679560273"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def entropy(labels):\n",
" labels = labels.value_counts().to_numpy()\n",
" labels = labels / np.sum(labels)\n",
" return -1 * np.sum(labels * np.log(labels) / np.log(2))\n",
"\n",
"entropy(nba_data['Pos3'])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
" All tests passed!\n",
" "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grader.check(\"q1a\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6383398017253514"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def gini_impurity(labels):\n",
" labels = labels.value_counts().to_numpy()\n",
" labels = labels / np.sum(labels)\n",
" return 1 - np.sum(labels**2)\n",
"\n",
"gini_impurity(nba_data['Pos3'])"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
" All tests passed!\n",
" "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grader.check(\"q1b\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"21.023148263588652"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def variance(values):\n",
" return np.mean((values - np.mean(values))**2)\n",
" \n",
"variance(nba_data['PTS'])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
" All tests passed!\n",
" "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grader.check(\"q1c\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.521489768014793"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def weighted_metric(left, right, metric):\n",
" return ((len(left) * metric(left) + len(right) * metric(right))\n",
" / (len(left) + len(right)))\n",
"\n",
"we_pos3_age_30 = weighted_metric(\n",
" nba_data[nba_data[\"Age\"] < 30][\"Pos3\"],\n",
" nba_data[nba_data[\"Age\"] >= 30][\"Pos3\"],\n",
" entropy)\n",
"we_pos3_age_30"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
" All tests passed!\n",
" "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grader.check(\"q1c\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"nba_train, nba_test = train_test_split(nba_data, test_size=0.25, random_state=100)\n",
"nba_train = nba_train.sort_values(by='Pos')\n",
"nba_test = nba_test.sort_values(by='Pos')"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
" multi_class='ovr', n_jobs=None, penalty='l2',\n",
" random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
" warm_start=False)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"logistic_regression_model = LogisticRegression(multi_class=\"ovr\")\n",
"logistic_regression_model.fit(X=nba_train[[\"AST\", \"TRB\"]], y=nba_train[\"Pos3\"])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
" All tests passed!\n",
" "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grader.check(\"q2a\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" AST TRB Pos3 Predicted (OVRLR) Pos3\n",
"655 1.4 8.6 C C\n",
"644 2.0 10.2 C C\n",
"703 0.8 4.5 C F\n",
"652 1.6 7.2 C F\n",
"165 1.4 7.5 C C\n",
"122 2.4 8.4 C C\n",
"353 7.3 10.8 C C\n",
"367 1.4 8.6 C C\n",
"408 1.2 4.9 C F\n",
"161 3.9 12.0 C C\n",
"647 3.4 12.4 C C\n",
"308 4.2 6.7 C G\n",
"362 3.0 11.4 C C\n",
"146 3.6 8.2 C C\n",
"233 4.4 7.9 C C"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_train['Predicted (OVRLR) Pos3'] = logistic_regression_model.predict(nba_train[['AST', 'TRB']])\n",
"nba_train[['AST', 'TRB', 'Pos3', 'Predicted (OVRLR) Pos3']].head(15)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0.7964071856287425"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lr_training_accuracy = logistic_regression_model.score(nba_train[['AST', 'TRB']], nba_train['Pos3'])\n",
"lr_training_accuracy"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6428571428571429"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lr_test_accuracy = logistic_regression_model.score(nba_test[['AST', 'TRB']], nba_test['Pos3'])\n",
"lr_test_accuracy"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png":...