
I only need Question 1a. Could you make it cheaper?


lab12 August 5, 2021

[1]:
# Initialize Otter
import otter
grader = otter.Notebook("lab12.ipynb")

1 Lab 12: Principal Component Analysis

In this lab assignment, we will walk through an example of using Principal Component Analysis (PCA) on a dataset involving iris plants (https://en.wikipedia.org/wiki/Iris_plant).

1.0.1 Due Date

This assignment is due on Saturday, August 7th at 11:59 pm PDT.

1.0.2 Collaboration Policy

Data science is a collaborative activity. While you may talk with others about this assignment, we ask that you write your solutions individually. If you discuss the assignment with others, please include their names in the cell below.

Collaborators: list names here

1.0.3 Lab Walkthrough Video

In addition to the lab notebook, we have also released a prerecorded walkthrough video of the lab. We encourage you to reference this video as you work through the lab. Run the cell below to display the video.

[2]:
from IPython.display import YouTubeVideo
YouTubeVideo("DMp3l6Bybz8", list = 'PLQCcNQgUcDfrDRQ9E-Rl6irt2InIWd4Ef', listType = 'playlist')

[3]:
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

To begin, run the following cell to load the dataset into this notebook.

* iris_features will contain a numpy array of 4 attributes for 150 different plants (shape 150 x 4).
* iris_target will contain the class of each plant. There are 3 classes of plants in the dataset: Iris-Setosa, Iris-Versicolour, and Iris-Virginica. The class names will be stored in iris_target_names.
* iris_feature_names will be a list of 4 names, one for each attribute in iris_features.
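The loading cell converts the integer class codes (0, 1, 2) into names with a single indexing step, `iris_target_names[iris_target]`. This works through NumPy integer-array indexing: each integer in the target array selects the name at that position. A minimal sketch, using small hypothetical stand-in arrays rather than the real output of `load_iris()`:

```python
import numpy as np

# Hypothetical stand-ins for iris_data['target_names'] and iris_data['target'].
target_names = np.array(["setosa", "versicolor", "virginica"])
target = np.array([0, 0, 1, 2, 1])

# Integer-array indexing: element i of the result is target_names[target[i]],
# so the output has the same shape as `target`.
labels = target_names[target]
print(labels)  # ['setosa' 'setosa' 'versicolor' 'virginica' 'versicolor']
```

Because the result has one entry per sample, it can be passed directly as the `hue` argument of the scatter plots.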
[4]:
from sklearn.datasets import load_iris
iris_data = load_iris() # Loading the dataset
# Unpacking the data into arrays
iris_features = iris_data['data']
iris_target = iris_data['target']
iris_feature_names = iris_data['feature_names']
iris_target_names = iris_data['target_names']

# Convert iris_target to string labels instead of int labels currently (0, 1, 2) for the classes
iris_target = iris_target_names[iris_target]

Let's explore the data by creating a scatter matrix of our iris features. To do this, we'll create 2D scatter plots for every possible pair of our four features. This should result in six total scatter plots in our scatter matrix, with the classes labeled in distinct colors in each plot.

[5]:
plt.figure(figsize=(14, 10))
plt.suptitle("Scatter Matrix of Iris Features")
plt.subplots_adjust(wspace=0.3, hspace=0.3)
for i in range(1, 4):
    for j in range(i):
        plt.subplot(3, 3, i+3*j)
        sns.scatterplot(x=iris_features[:, i], y=iris_features[:, j], hue=iris_target)
        plt.xlabel(iris_feature_names[i])
        plt.ylabel(iris_feature_names[j])

1.1 Question 1a

To apply PCA, we will first need to center and scale the data so that the mean of each feature is 0 and the standard deviation of each feature is 1.

Compute the columnwise mean of iris_features in the cell below and store it in iris_mean, and compute the columnwise standard deviation of iris_features and store it in iris_std. Each should be a numpy array of 4 values, one for each feature. Then, subtract iris_mean from iris_features, divide by iris_std, and save the result in features.

Hints:
* Use np.mean or np.average to compute iris_mean, and pay attention to the axis argument.
* If you are confused about how numpy deals with arithmetic operations between arrays of different shapes, see this note about broadcasting (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) for explanations/examples.
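The attempt in cell [7] departs from the Question 1a prompt in three ways: np.std is called without axis=0, so it would return a single scalar for the whole array; its result is assigned to the misspelled name irs_std, so iris_std keeps its `...` placeholder (hence the Ellipsis in the output and the TypeError from np.isclose); and features is only centered, never divided by the standard deviation. A minimal sketch of the intended computation follows. The helper name `standardize` and the tiny demo matrix are illustrative assumptions, not part of the lab.

```python
import numpy as np

def standardize(X):
    """Return column means, column SDs, and the standardized copy of X.

    Both statistics use axis=0 (one value per column). np.std defaults to
    ddof=0, i.e. the population standard deviation.
    """
    X = np.asarray(X, dtype=float)
    col_mean = np.mean(X, axis=0)          # shape (p,)
    col_std = np.std(X, axis=0)            # shape (p,)
    # Broadcasting stretches the (p,) vectors across all n rows.
    Z = (X - col_mean) / col_std
    return col_mean, col_std, Z


# Tiny stand-in matrix (3 samples x 2 features). In the lab notebook the
# equivalent call would be:
#     iris_mean, iris_std, features = standardize(iris_features)
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 60.0]])
mean_demo, std_demo, Z_demo = standardize(X_demo)
print(mean_demo)              # per-column means
print(Z_demo.std(axis=0))     # each column now has SD 1
```

Because every standardized column has variance 1, the four feature variances then sum to 4, which is the value the q1c check expects; the 4.542 printed in the notebook comes from features that were centered but not scaled.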
[7]:
iris_mean = np.mean(iris_features, axis = 0)
irs_std = np.std(iris_features)
features = (iris_features - iris_mean)
iris_mean, iris_std

[7]:
(array([5.84333333, 3.05733333, 3.758 , 1.19933333]), Ellipsis)

[8]:
grader.check("q1a")

[8]:
q1a results:
    Trying:
        np.all(np.isclose(iris_std, np.array([0.82530129, 0.43441097, 1.75940407, 0.75969263])))
    Expecting:
        True
    **********************************************************************
    Line 1, in q1a 2
    Failed example:
        np.all(np.isclose(iris_std, np.array([0.82530129, 0.43441097, 1.75940407, 0.75969263])))
    Exception raised:
        Traceback (most recent call last):
          File "/opt/conda/lib/python3.8/doctest.py", line 1336, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in
            np.all(np.isclose(iris_std, np.array([0.82530129, 0.43441097, 1.75940407, 0.75969263])))
          File "<__array_function__ internals>", line 5, in isclose
          File "/opt/conda/lib/python3.8/site-packages/numpy/core/numeric.py", line 2287, in isclose
            xfin = isfinite(x)
        TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
    Trying:
        np.all(np.isclose(np.ones(4), np.std(features, axis = 0))) # make sure data has SD 1
    Expecting:
        True
    **********************************************************************
    Line 1, in q1a 4
    Failed example:
        np.all(np.isclose(np.ones(4), np.std(features, axis = 0))) # make sure data has SD 1
    Expected:
        True
    Got:
        False
    Trying:
        -2.56 < np.sum(features[0]) < -2.5
    Expecting:
        True
    **********************************************************************
    Line 1, in q1a 5
    Failed example:
        -2.56 < np.sum(features[0]) < -2.5
    Expected:
        True
    Got:
        False

1.2 Question 1b

As you may recall from lecture, PCA is a specific application of the singular value decomposition (SVD) for matrices. In the following cell, let's use the np.linalg.svd function (https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.svd.html) to compute the SVD of our features. Store the left singular vectors, singular values, and right singular vectors in u, s, and vt, respectively. Note that vt corresponds to $V^T$. Set the full_matrices argument of np.linalg.svd to False.

[9]:
u, s, vt = np.linalg.svd(features, full_matrices = False)
u.shape, s, vt.shape

[9]:
((150, 4), array([25.09996044, 6.01314738, 3.41368064, 1.88452351]), (4, 4))

[10]:
grader.check("q1b")

[10]:
q1b results:
    Trying:
        np.all(np.isclose(s, np.array([20.92306556, 11.7091661 , 4.69185798, 1.76273239])))
    Expecting:
        True
    **********************************************************************
    Line 1, in q1b 0
    Failed example:
        np.all(np.isclose(s, np.array([20.92306556, 11.7091661 , 4.69185798, 1.76273239])))
    Expected:
        True
    Got:
        False

1.3 Question 1c

What can we learn from the singular values in s? Formally, we can measure the amount of variance captured by the i'th principal component as:

$$\frac{\sigma_i^2}{N}$$

where $\sigma_i$ is the singular value of the i'th principal component and $N$ is the total number of data points.

Compute the total variance of our data below by summing the square of each singular value in s and dividing the result by the total number of data points. Store the result in the variable total_variance.

[11]:
total_variance = np.sum(np.square(s))/iris_features.shape[0]
print("total_variance: {:.3f} should approximately equal the sum of the feature variances: {:.3f}"
      .format(total_variance, np.sum(np.var(features, axis=0))))

total_variance: 4.542 should approximately equal the sum of the feature variances: 4.542

[12]:
grader.check("q1c")

[12]:
q1c results:
    Trying:
        np.isclose(total_variance, 4)
    Expecting:
        True
    **********************************************************************
    Line 1, in q1c 0
    Failed example:
        np.isclose(total_variance, 4)
    Expected:
        True
    Got:
        False

As you can see, total_variance is equal to the sum of the feature variances.

1.4 Question 2a

Let's now use only the first two principal components to see what a 2D version of our iris data looks like.

First, construct the 2D version of the iris data by multiplying our features array with the first two right singular vectors in V. Because the first two right singular vectors are directions for the first two principal components, this will project the iris data down from a 4D subspace to a 2D subspace.

Hints:
* To matrix-multiply two numpy arrays, use @ or np.dot.
* Note that the output of np.linalg.svd is vt and not v: the first two right singular vectors in v will be the first two columns of v, or the first two rows of vt (transposed to be column vectors instead of row vectors).
* Since we want to obtain a 2D version of our iris dataset, the shape of iris_2d should be (150, 2).

[13]:
iris_2d = np.dot(features, vt[:2, :].T)
np.sum(iris_2d[:, 0])

[13]:
-4.405364961712621e-13

[14]:
grader.check("q2a")

[14]:
q2a results:
    Trying:
        -2.75 < np.sum(iris_2d[0]) < -2.74
    Expecting:
        True
    **********************************************************************
    Line 1, in q2a 1
    Failed example:
        -2.75 < np.sum(iris_2d[0]) < -2.74
    Expected:
        True
    Got:
        False

Now, run the cell below to create the scatter plot of our 2D version of the iris data, iris_2d.

[15]:
plt.figure(figsize=(9, 6))
plt.title("PC2 vs. PC1 for Iris Data", fontsize=18)
plt.xlabel("Iris PC1", fontsize=15)
plt.ylabel("Iris PC2", fontsize=15)
sns.scatterplot(x=iris_2d[:, 0], y=iris_2d[:, 1], hue=iris_target);

1.5 Question 2b

What do you observe about the plot above? If you were given a point in the subspace defined by PC1 and PC2, how well would you be able to classify the point as one of the three iris types?

The points group well by class, but the classes are not clearly separated. We could apply a classification technique such as a decision tree.

1.6 Question 2c

What proportion of the total variance is accounted for when we project the iris data down to two dimensions? Compute this quantity in the cell below by dividing the variance captured by the first two singular values (also known as component scores) in s by the total_variance you calculated previously. Store the result in two_dim_variance.

[16]:
two_dim_variance = (np.sum(np.square(s[:2]))/iris_features.shape[0])/total_variance
two_dim_variance

[16]:
0.9776852063187949

[17]:
grader.check("q2c")

[17]:
q2c results:
    Trying:
        0.95 < two_dim_variance < 0.96
    Expecting:
        True
    **********************************************************************
    Line 1, in q2c 0
    Failed example:
        0.95 < two_dim_variance < 0.96
    Expected:
        True
    Got:
        False

Most of the variance in the data is explained by the two-dimensional projection!

1.7 Question 3

As a last step, we will create a scree plot (https://en.wikipedia.org/wiki/Scree_plot) to visualize the weight of each principal component. In the cell below, create a scree plot by creating a line plot of the component scores (variance captured by each principal component) vs. the principal component number (1st, 2nd, 3rd, or 4th). Your graph should match the image below:

Hint: You may find plt.xticks() helpful when formatting your plot axes.

[18]:
plt.xticks([1, 2, 3, 4])
plt.plot([1, 2, 3, 4], s**2 / iris_features.shape[0])
plt.xlabel("Principal Component")
plt.ylabel("Variance (Component Scores)")
plt.title("Scree Plot of Iris Principal Components")

[18]:
Text(0.5, 1.0, 'Scree Plot of Iris Principal Components')

To double-check your work, the cell below will rerun all of the autograder tests.

[19]:
grader.check_all()

[19]:
q1a results:
    Trying:
        np.all(np.isclose(iris_std, np.array([0.82530129, 0.43441097, 1.75940407, 0.75969263])))
    Expecting:
        True
    **********************************************************************
    Line 1, in q1a 2
    Failed example:
        np.all(np.isclose(iris_std, np.array([0.82530129, 0.43441097, 1.75940407, 0.75969263])))
    Exception raised:
        Traceback (most recent call last):
          File "/opt/conda/lib/python3.8/doctest.py", line 1336, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in
            np.all(np.isclose(iris_std, np.array([0.82530129, 0.43441097, 1.75940407, 0.75969263])))
          File "<__array_function__ internals>", line 5, in isclose
          File "/opt/conda/lib/python3.8/site-packages/numpy/core/numeric
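Questions 1b through 3 all derive from one thin SVD of the standardized features. The sketch below, assuming its input has already been standardized as in Question 1a, computes the per-component variances $\sigma_i^2 / N$ (Question 1c and the scree plot), their total, the two-dimensional projection of Question 2a, and the explained-variance fraction of Question 2c. The function name `pca_via_svd` and the random stand-in data are illustrative assumptions, not part of the lab.

```python
import numpy as np

def pca_via_svd(Z, k=2):
    """PCA of an already standardized (n, p) array Z via the thin SVD.

    Returns the per-component variances s**2 / n (the values a scree plot
    draws), their total, the projection onto the first k principal
    directions, and the fraction of total variance that projection keeps.
    """
    n = Z.shape[0]
    u, s, vt = np.linalg.svd(Z, full_matrices=False)   # Z = u @ diag(s) @ vt
    component_var = s**2 / n                            # variance captured by each PC
    total_var = component_var.sum()
    Z_k = Z @ vt[:k, :].T                               # (n, k) projection
    frac_k = component_var[:k].sum() / total_var
    return component_var, total_var, Z_k, frac_k


# Illustrative stand-in data: 150 rows of 4 correlated random features,
# standardized column-wise as in Question 1a.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

component_var, total_var, Z_2d, frac_2d = pca_via_svd(Z, k=2)
print("component variances:", np.round(component_var, 3))
print("total variance:", round(total_var, 3))   # equals the column count (4) for standardized data
print("fraction kept by 2 PCs:", round(frac_2d, 3))
```

For standardized data the total variance equals the number of columns, since each column contributes variance 1; that is why the q1c check expects 4. Because Z = U diag(s) V^T, the projection Z @ vt[:k].T also equals u[:, :k] * s[:k], which gives a cheap consistency check on iris_2d. Singular vectors are only determined up to sign, so projected coordinates may have their signs flipped relative to another SVD routine.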
Answered 1 day after Aug 06, 2021

Answer To: lab12 August 5, 2021

Neha answered on Aug 07 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"iris_data = load_iris() # Loading the dataset\n",
"# Unpacking the data into arrays\n",
"iris_features = iris_data['data']\n",
"iris_target = iris_data['target']\n",
"iris_feature_names = iris_data['feature_names']\n",
"iris_target_names = iris_data['target_names']"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"iris_target = iris_target_names[iris_target]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png":...