Please find the questions for this assignment in the document. You need to complete all of them, including the short answers in the documentation and the script for the coding parts. Thank you.


For the following assignments, please provide as much evidence of the results as possible, including the code, screenshots (plots only, not text or code), and documentation. Submit only one PDF file and the .ipynb / .py files containing the code with documentation.

Choose any cleaned dataset, such as the ones here: https://www.kaggle.com/search?q=cleaned+datasets+datasetFileTypes%3Acsv

1.a. [10 points] Ignore the label column and apply the AgglomerativeClustering method from sklearn.cluster to this dataset. Use the min (single linkage), average, and ward methods explained in class to perform the hierarchical clustering. Feel free to refer to https://scikit-learn.org/stable/auto_examples/cluster/plot_digits_linkage.html#sphx-glr-auto-examples-cluster-plot-digits-linkage-py

1.b. [10 points] Generate visualizations like those in the above tutorial, as well as dendrograms (feel free to refer to https://scikit-learn.org/stable/search.html?q=dendrogram), for each of the methods.

1.c. [10 points] Which method produces clusters that are most closely aligned with the labels in the dataset? Explain.

1.d. [20 points] Using the k-means algorithm with k=2 and corresponding visualizations, explain whether it fares better than the agglomerative approaches in terms of alignment with the labels.
Hint: (a) Choose a smaller dataset for easier and better visualization and analysis. (b) Cut the dendrogram at an appropriate level to obtain just two clusters, in order to see how well these two clusters align with the assigned labels.

2. [25 points] The wine data set at https://archive.ics.uci.edu/ml/datasets/wine has 13 features. Develop in Python and apply your own version of the PCA algorithm to this data set, to visualize how PCA helps with dimensionality reduction. Explain how many principal components you will choose and why. What percentage of the variance in the data do the selected principal components cover?
For the implementation, you may use any objects, modules, and functions in NumPy, SciPy, and other Python libraries to compute the eigenvalues and eigenvectors or to perform any other math / linear algebra operation, but do not use the PCA function available in scikit-learn directly.

3.a. [20 points] Refer to online tutorials on regularization such as https://medium.com/coinmonks/regularization-of-linear-models-with-sklearn-f88633a93a2 and https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b. Apply the techniques from the above tutorials to the student dataset at https://archive.ics.uci.edu/ml/datasets/student+performance. Does regularization help improve the accuracy of predicting the students' final Math grade?

3.b. [5 points] For regularization, we added the regularizer to the loss function. Does it make sense to multiply or subtract the term instead? Explain.
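For question 2, one possible shape of a from-scratch PCA is sketched below. This is a minimal sketch under stated assumptions, not the answerer's method: the function name `pca_eig` is hypothetical, the input is a synthetic 178 x 13 matrix (the same shape as the UCI wine data, which would have to be downloaded separately), and standardizing the features before the eigendecomposition is an assumed preprocessing choice. Only NumPy is used, in line with the restriction on scikit-learn's PCA class.

```python
import numpy as np


def pca_eig(X, n_components):
    """PCA via eigendecomposition of the covariance matrix (hypothetical helper)."""
    # Assumption: standardize each feature so that large-scale features
    # do not dominate the principal components.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix with features in columns.
    cov = np.cov(X_std, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Share of the total variance carried by each component.
    explained_ratio = eigvals / eigvals.sum()
    # Project the standardized data onto the leading components.
    scores = X_std @ eigvecs[:, :n_components]
    return scores, explained_ratio


# Synthetic stand-in for the wine features (NOT the real data):
# 178 samples, 13 features driven mostly by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(178, 2))
mixing = rng.normal(size=(2, 13))
X = latent @ mixing + 0.1 * rng.normal(size=(178, 13))

scores, explained_ratio = pca_eig(X, n_components=2)
cumulative = np.cumsum(explained_ratio)

print(scores.shape)               # (178, 2)
print(np.round(cumulative, 3))    # cumulative share of variance per component
```

The cumulative array is what the "how many components and why" part of the question hinges on: a common approach is to keep the smallest number of components whose cumulative share passes a chosen threshold, and to report that share as the percentage of variance covered.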
Answered 3 days after May 11, 2021


Suraj answered on May 15 2021
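Questions 1.c and 1.d ask how well clusters align with the dataset's labels. Cluster IDs are arbitrary, so comparing them to labels element by element does not work; one simple alternative is cluster purity, sketched below in plain Python. The helper names `contingency` and `purity` and the six toy label/cluster values are hypothetical and are not taken from the notebook. scikit-learn's `adjusted_rand_score` in `sklearn.metrics` is another option that also corrects for chance agreement.

```python
from collections import Counter


def contingency(labels, clusters):
    """Count samples for every (cluster, label) pair."""
    return Counter(zip(clusters, labels))


def purity(labels, clusters):
    """Fraction of samples that carry the majority label of their cluster."""
    table = contingency(labels, clusters)
    best_per_cluster = {}
    for (cluster, _label), count in table.items():
        # Keep the largest label count seen so far for this cluster.
        best_per_cluster[cluster] = max(best_per_cluster.get(cluster, 0), count)
    return sum(best_per_cluster.values()) / len(labels)


# Toy values for illustration only (hypothetical, not from the dataset).
labels = ["Male", "Male", "Female", "Female", "Female", "Male"]
clusters = [1, 1, 0, 0, 1, 0]

print(contingency(labels, clusters))
print(round(purity(labels, clusters), 3))  # 0.667: 4 of 6 samples match their cluster's majority label
```

Purity does not change when cluster IDs are swapped, which makes it usable for comparing single, average, and ward linkage against k-means at a fixed number of clusters. It tends to rise as the number of clusters grows, so it is only a fair comparison when every method is cut to the same k.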
{
"cells": [
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.cluster import AgglomerativeClustering\n",
"import scipy.cluster.hierarchy as shc\n",
"from sklearn.metrics import silhouette_score\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error\n",
"import math\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"from sklearn.pipeline import Pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>CustomerID</th><th>Genre</th><th>Age</th><th>Annual Income (k$)</th><th>Spending Score (1-100)</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>Male</td><td>19</td><td>15</td><td>39</td></tr>\n",
"<tr><th>1</th><td>2</td><td>Male</td><td>21</td><td>15</td><td>81</td></tr>\n",
"<tr><th>2</th><td>3</td><td>Female</td><td>20</td><td>16</td><td>6</td></tr>\n",
"<tr><th>3</th><td>4</td><td>Female</td><td>23</td><td>16</td><td>77</td></tr>\n",
"<tr><th>4</th><td>5</td><td>Female</td><td>31</td><td>17</td><td>40</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)\n",
"0           1    Male   19                  15                      39\n",
"1           2    Male   21                  15                      81\n",
"2           3  Female   20                  16                       6\n",
"3           4  Female   23                  16                      77\n",
"4           5  Female   31                  17                      40"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df=pd.read_csv(\"C:/Users/Hp/Desktop/Mall_Customers.csv\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[base64 PNG data truncated in source: single-linkage dendrogram produced by this cell]",
"text/plain": [
"[figure repr lost in extraction]"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
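"# Features: Annual Income (k$) and Spending Score (1-100), i.e. columns 3 and 4;\n",
"# the slice 3:5 is needed because 3:4 would select Annual Income only.\n",
"x = df.iloc[:, 3:5].values\n",
"\n",
"# Dendrogram for single (min) linkage\n",
"plt.figure(figsize=(8, 8))\n",
"plt.title('Visualising the data')\n",
"dendrogram = shc.dendrogram(shc.linkage(x, method='single'))\n",
"plt.show()"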
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [],
"source": [
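"# Euclidean distance is the default; the `affinity` keyword was renamed to `metric` in newer scikit-learn releases.\n",
"model1 = AgglomerativeClustering(n_clusters=3, linkage=\"single\")\n",
"pred = model1.fit_predict(x)"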
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
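"image/png": "[base64 PNG data missing in source: scatter plot titled 'Using Min method' produced by this cell]",
"text/plain": [
"[figure repr lost in extraction]"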
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
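"plt.figure(figsize=(6, 6))\n",
"# Colour each customer by the cluster label already stored in `pred`\n",
"plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'],\n",
"            c=pred, cmap='rainbow')\n",
"plt.xlabel('Annual Income (k$)')\n",
"plt.ylabel('Spending Score (1-100)')\n",
"plt.title(\"Using Min method\")\n",
"plt.show()"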
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"image/png":...