Question

You and Martha have done your research. You understand what unsupervised learning is used for, how to process data, how to cluster, and how to reduce dimensions to a small number of principal components using PCA. It’s time to put all these skills to use by creating an analysis for your clients who are preparing to get into the cryptocurrency market.

Martha is a senior manager for the Advisory Services Team at Accountability Accounting, one of your most important clients. Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers. The company, however, is lost in the vast universe of cryptocurrencies. So, they’ve asked you to create a report that includes what cryptocurrencies are on the trading market and how they could be grouped to create a classification system for this new investment.

The data Martha will be working with is not ideal, so it will need to be processed to fit the machine learning models. Since there is no known output for what Martha is looking for, she has decided to use unsupervised learning. To group the cryptocurrencies, Martha decided on a clustering algorithm. She’ll use data visualizations to share her findings with the board.

What You're Creating

This new assignment consists of four technical analysis deliverables. You will submit the following:

Deliverable 1: Preprocessing the Data for PCA

Deliverable 2: Reducing Data Dimensions Using PCA

Deliverable 3: Clustering Cryptocurrencies Using K-means

Deliverable 4: Visualizing Cryptocurrencies Results

Files

Use the following links to download the dataset and Challenge starter code.

Download cryptocurrency data (crypto_data.csv)

Download crypto clustering starter code

Deliverable 1: Preprocessing the Data for PCA (30 points)

Deliverable 1 Instructions

Using your knowledge of Pandas, you’ll preprocess the dataset in order to perform PCA in Deliverable 2.

REWIND

For this deliverable, you’ve already done the following in this module:

Lesson 18.2.1:
Review the steps to prepare data

Lesson 18.2.2:
Pandas refresher

Lesson 18.2.3:
Preprocessing data with Pandas

Lesson 18.2.4:
Data selection

Lesson 18.2.5:
Preprocessing data

Lesson 18.5.2:
Use the StandardScaler library to standardize features

Follow the instructions below and use the crypto_clustering_starter_code.ipynb file to complete Deliverable 1.

Open the crypto_clustering_starter_code.ipynb file, rename it crypto_clustering.ipynb, and save it to your Cryptocurrencies GitHub folder.

Read the crypto_data.csv file into a Pandas DataFrame named crypto_df.

NOTE

The crypto_data.csv file was retrieved from CryptoCompare.

Keep all the cryptocurrencies that are being traded.

Drop the IsTrading column.

Remove rows that have at least one null value.

Filter the crypto_df DataFrame so it only has rows where coins have been mined.

Create a new DataFrame that holds only the cryptocurrency names, and use the crypto_df DataFrame index as the index for this new DataFrame.

Remove the CoinName column from the crypto_df DataFrame since it's not going to be used in the clustering algorithm.

Take a moment to check that your crypto_df DataFrame looks like the image below:

The crypto_df DataFrame shows four columns: Algorithm, ProofType, TotalCoinsMined, TotalCoinSupply. It contains ten rows with the following headings: 42, 404, 1337, BTC, ETH, LTC, DASH, XMR, ETC, and ZEC
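The cleaning steps above can be sketched as follows. This is only an illustration, not the starter code's solution: the rows are made-up toy values with hypothetical tickers, and the name coin_names_df is an assumption. With the real file you would start from pd.read_csv("crypto_data.csv", index_col=0), assuming the first column holds the ticker.

```python
import pandas as pd

# Made-up toy rows standing in for crypto_data.csv (all values are hypothetical).
crypto_df = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"],
        "Algorithm": ["Scrypt", "SHA-256", "X11", "Scrypt", "Ethash", "X13"],
        "IsTrading": [True, True, False, True, True, True],
        "ProofType": ["PoW", "PoW", "PoS", "PoW/PoS", "PoW", "PoS"],
        "TotalCoinsMined": [10.0, None, 50.0, 0.0, 500.0, 20.0],
        "TotalCoinSupply": [100.0, 200.0, 300.0, 400.0, 1000.0, 80.0],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)

# Keep only the coins that are being traded, then drop the flag column.
crypto_df = crypto_df[crypto_df["IsTrading"] == True]
crypto_df = crypto_df.drop(columns="IsTrading")

# Remove rows with at least one null value.
crypto_df = crypto_df.dropna()

# Keep only rows where coins have been mined.
crypto_df = crypto_df[crypto_df["TotalCoinsMined"] > 0]

# Store the names separately (same index), then drop them from crypto_df.
coin_names_df = crypto_df[["CoinName"]].copy()
crypto_df = crypto_df.drop(columns="CoinName")

print(crypto_df)
```

In this toy example, BBB is removed for its null value, CCC for not trading, and DDD for having no mined coins.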

Use the get_dummies() method to create variables for the two text features, Algorithm and ProofType, and store the resulting data in a new DataFrame named X.

Use the StandardScaler fit_transform() function to standardize the features from the X DataFrame.

IMPORTANT

Using the StandardScaler() class from sklearn to standardize the features is required before attempting Deliverables 2 and 3.
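A minimal sketch of the encoding and scaling steps, using made-up toy rows in place of the cleaned crypto_df. The variable name X_scaled and the dtype=float argument are assumptions added here so that every dummy column is numeric before scaling.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up toy rows standing in for the cleaned crypto_df (values are hypothetical).
crypto_df = pd.DataFrame(
    {
        "Algorithm": ["Scrypt", "SHA-256", "Scrypt", "Ethash"],
        "ProofType": ["PoW", "PoW", "PoW/PoS", "PoW"],
        "TotalCoinsMined": [10.0, 500.0, 20.0, 75.0],
        "TotalCoinSupply": [100.0, 1000.0, 80.0, 300.0],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# One indicator column per distinct Algorithm and ProofType value.
X = pd.get_dummies(crypto_df, columns=["Algorithm", "ProofType"], dtype=float)

# Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

print(X.columns.tolist())
print(X_scaled.shape)
```

Note that fit_transform() returns a NumPy array, so the index of X is no longer attached to the scaled values.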

Save your crypto_clustering.ipynb file to your Cryptocurrencies folder.

Deliverable 1 Requirements

You will earn a perfect score for Deliverable 1 by completing all requirements below:

The following five preprocessing steps have been performed on the crypto_df DataFrame:
- All cryptocurrencies that are not being traded are removed (3 pt)
- The IsTrading column is dropped (3 pt)
- All the rows that have at least one null value are removed (3 pt)
- All the rows that do not have coins being mined are removed (3 pt)
- The CoinName column is dropped (3 pt)

A new DataFrame is created that stores all cryptocurrency names from the CoinName column and retains the index from the crypto_df DataFrame (5 pt)

The get_dummies() method is used to create variables for the text features, which are then stored in a new DataFrame, X (5 pt)

The features from the X DataFrame have been standardized using the StandardScaler fit_transform() function (5 pt)

Deliverable 2: Reducing Data Dimensions Using PCA (20 points)

Deliverable 2 Instructions

Using your knowledge of how to apply the Principal Component Analysis (PCA) algorithm, you’ll reduce the dimensions of the X DataFrame to three principal components and place these dimensions in a new DataFrame.

REWIND

For this deliverable, you’ve already done the following in this module:

Lesson 18.5.2:
Apply Principal Component Analysis

Follow the instructions below and use the information in the crypto_clustering_starter_code.ipynb file to complete Deliverable 2.

Continue using the crypto_clustering.ipynb file from Deliverable 1 where you’ve already performed the preprocessing steps.

Using the information we’ve provided, apply PCA to reduce the dimensions to three principal components.

If you’d like a hint on how to use the PCA algorithm, that’s totally okay. If not, that’s great too. You can always revisit this later if you change your mind.

HINT

Create a new DataFrame named pcs_df that includes the following columns, PC 1, PC 2, and PC 3, and uses the index of the crypto_df DataFrame as the index.

Your DataFrame should look like the image below:

(Image: data-Module-18-Challenge-1-clustering-cryptocurrencies-using-k-means.png)
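As a rough sketch of this deliverable, the snippet below applies PCA to a random array standing in for the standardized features. The array X_scaled, the tickers in index, and the random values are all assumptions for illustration; in the notebook, the index would come from crypto_df.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Random stand-in for the standardized feature matrix (20 coins, 6 features).
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(20, 6))
index = [f"COIN{i}" for i in range(20)]  # hypothetical tickers

# Reduce the features to three principal components.
pca = PCA(n_components=3)
pcs = pca.fit_transform(X_scaled)

# Put the components in a DataFrame that keeps the coin index.
pcs_df = pd.DataFrame(pcs, columns=["PC 1", "PC 2", "PC 3"], index=index)

print(pcs_df.head())
print(pca.explained_variance_ratio_)
```

The explained_variance_ratio_ attribute shows how much of the original variance each of the three components retains.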

Save your crypto_clustering.ipynb file to your Cryptocurrencies folder.

Deliverable 2 Requirements

You will earn a perfect score for Deliverable 2 by completing all requirements below:

The PCA algorithm reduces the dimensions of the X DataFrame down to three principal components (10 pt)

The pcs_df DataFrame is created and has the following three columns, PC 1, PC 2, and PC 3, and has the index from the crypto_df DataFrame (10 pt)

Deliverable 3: Clustering Cryptocurrencies Using K-means (20 points)

Deliverable 3 Instructions

Using your knowledge of the K-means algorithm, you’ll create an elbow curve using hvPlot to find the best value for K from the pcs_df DataFrame created in Deliverable 2. Then, you’ll run the K-means algorithm to predict the K clusters for the cryptocurrencies’ data.

REWIND

For this deliverable, you’ve already done the following in this module:

Lesson 18.3.2:
Create an instance of the K-means algorithm and make predictions

Lesson 18.4.2:
Create an elbow curve using hvPlot

Lesson 18.5.2:
Apply Principal Component Analysis

Follow the instructions below and use the information in the crypto_clustering_starter_code.ipynb file to complete Deliverable 3.

Continue using the crypto_clustering.ipynb file that you used in Deliverable 2 to reduce the dataset to three dimensions.

Using the pcs_df DataFrame, create an elbow curve using hvPlot to find the best value for K.
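One way to build the elbow curve is to fit K-means for a range of K values and record the inertia of each fit. The sketch below does this on synthetic points (three made-up blobs standing in for pcs_df). The helper plot_elbow, the range of 1 to 10, and the name elbow_df are assumptions; the helper imports hvPlot only when it is called.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for pcs_df: three well-separated blobs in 3D.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [5, 5, 5], [-5, 5, 0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

# Fit K-means for each candidate K and record the inertia.
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(pcs_df)
    inertia.append(km.inertia_)

elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
print(elbow_df)


def plot_elbow(df):
    """Draw the elbow curve with hvPlot (imported lazily)."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.line(x="k", y="inertia", xticks=k_values, title="Elbow Curve")
```

In a notebook, calling plot_elbow(elbow_df) in a cell would render the curve; the best K is usually read off where the curve stops dropping steeply.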

Next, use the pcs_df DataFrame to run the K-means algorithm to make predictions of the K clusters for the cryptocurrencies’ data.

If you’d like a hint on how to use the K-means algorithm, that’s totally okay. If not, that’s great too. You can always revisit this later if you change your mind.

HINT

Create a new DataFrame named clustered_df by concatenating the crypto_df and pcs_df DataFrames along the column axis (axis=1). The index should be the same as the crypto_df DataFrame.

Add the CoinName column that holds the names of the cryptocurrencies, which you created in Step 7 of Deliverable 1, to the clustered_df DataFrame.

Add another new column to the clustered_df DataFrame named Class that holds the predictions, i.e., model.labels_, from Step 3.

Your clustered_df DataFrame should look like the image below:

The DataFrame shows nine columns: Algorithm, ProofType, TotalCoinsMined, TotalCoinSupply, PC 1, PC 2, PC 3, CoinName, and Class. It contains ten rows with the following headings: 42, 404, 1337, BTC, ETH, LTC, DASH, XMR, ETC, and ZEC.
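The assembly of clustered_df can be sketched as below. All rows are made-up toy values, and K=2 is chosen only because the toy points form two obvious groups; in the notebook, K would come from your elbow curve. The name coin_names_df is an assumption for the names DataFrame from Deliverable 1.

```python
import pandas as pd
from sklearn.cluster import KMeans

index = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]  # hypothetical tickers

# Made-up stand-ins for the DataFrames built in Deliverables 1 and 2.
crypto_df = pd.DataFrame(
    {
        "Algorithm": ["Scrypt", "Scrypt", "X11", "SHA-256", "SHA-256", "Ethash"],
        "ProofType": ["PoW", "PoW", "PoS", "PoW", "PoW", "PoW"],
        "TotalCoinsMined": [10.0, 20.0, 30.0, 400.0, 500.0, 600.0],
        "TotalCoinSupply": [100.0, 200.0, 300.0, 4000.0, 5000.0, 6000.0],
    },
    index=index,
)
coin_names_df = pd.DataFrame(
    {"CoinName": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"]}, index=index
)
pcs_df = pd.DataFrame(
    [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5]],
    columns=["PC 1", "PC 2", "PC 3"],
    index=index,
)

# Fit K-means on the principal components.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(pcs_df)

# Join the original features and the components side by side (same index).
clustered_df = pd.concat([crypto_df, pcs_df], axis=1)

# Add the coin names (aligned on the index) and the predicted cluster labels.
clustered_df["CoinName"] = coin_names_df["CoinName"]
clustered_df["Class"] = model.labels_

print(clustered_df)
```

Because model.labels_ is a plain array in the row order of pcs_df, this assignment relies on clustered_df having the same row order, which pd.concat with axis=1 preserves when the indexes match.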

Save your crypto_clustering.ipynb file to your Cryptocurrencies folder.

Deliverable 3 Requirements

You will earn a perfect score for Deliverable 3 by completing all requirements below:

The K-means algorithm is used to cluster the cryptocurrencies using the PCA data, where the following steps have been completed:
- An elbow curve is created using hvPlot to find the best value for K (10 pt)
- Predictions are made on the K clusters of the cryptocurrencies’ data (5 pt)
- A new DataFrame is created with the same index as the crypto_df DataFrame and has the following columns: Algorithm, ProofType, TotalCoinsMined, TotalCoinSupply, PC 1, PC 2, PC 3, CoinName, and Class (5 pt)

Deliverable 4: Visualizing Cryptocurrencies Results (30 points)

Deliverable 4 Instructions

Using your knowledge of creating scatter plots with Plotly Express and hvplot, you’ll visualize the distinct groups that correspond to the three principal components you created in Deliverable 2. Then, you’ll create a table with all the currently tradable cryptocurrencies using the hvplot.table() function.

REWIND

For this deliverable, you’ve already done the following in this module:

Lesson 18.3.2:
Create a scatter plot using hvplot

Lesson 18.4.2:
Create a 3D scatter plot with Plotly Express

Follow the instructions below and use the information in the crypto_clustering_starter_code.ipynb file to complete Deliverable 4.

Continue using the crypto_clustering.ipynb file from Deliverable 3 where you have predicted the K clusters for the cryptocurrencies’ data.

Create a 3D scatter plot using the Plotly Express scatter_3d() function to plot the three clusters from the clustered_df DataFrame.

Add the CoinName and Algorithm columns to the hover_name and hover_data parameters, respectively, so each data point shows the CoinName and Algorithm on hover.
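A possible shape for the 3D plot call is sketched below on made-up toy rows. The helper plot_clusters_3d and the dictionary scatter_kwargs are hypothetical names, and coloring and marking points by Class is a design choice rather than a stated requirement. Plotly Express is imported only inside the helper.

```python
import pandas as pd

# Made-up stand-in for clustered_df (only the columns the plot needs).
clustered_df = pd.DataFrame(
    {
        "Algorithm": ["Scrypt", "SHA-256", "X11", "Ethash"],
        "PC 1": [0.0, 0.1, 5.0, 5.1],
        "PC 2": [0.0, 0.2, 5.0, 4.9],
        "PC 3": [0.0, 0.1, 5.0, 5.2],
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Class": [0, 0, 1, 1],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Arguments for scatter_3d: one axis per principal component,
# CoinName as the hover title and Algorithm as extra hover data.
scatter_kwargs = dict(
    x="PC 1",
    y="PC 2",
    z="PC 3",
    color="Class",
    symbol="Class",
    hover_name="CoinName",
    hover_data=["Algorithm"],
    width=800,
)


def plot_clusters_3d(df, **kwargs):
    """Build the 3D scatter figure with Plotly Express (imported lazily)."""
    import plotly.express as px

    fig = px.scatter_3d(df, **kwargs)
    fig.update_layout(legend=dict(x=0, y=1))
    return fig
```

In a notebook, plot_clusters_3d(clustered_df, **scatter_kwargs).show() would display the figure.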

If you’d like a hint on how to add additional parameters to a Plotly Express 3D scatter plot, that’s totally okay. If not, that’s great too. You can always revisit this later if you change your mind.

HINT

Create a table with tradable cryptocurrencies using the hvplot.table() function.

If you’d like a hint on how to use the hvplot.table() function, that’s totally okay. If not, that’s great too. You can always revisit this later if you change your mind.

HINT

Your table should look like the table in the image below:

An hvplot.table() showing all the tradable cryptocurrencies with six columns: CoinName, Algorithm, ProofType, TotalCoinSupply, TotalCoinsMined, and Class

Print the total number of tradable cryptocurrencies in the clustered_df DataFrame.
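The table and the count can be sketched as follows, again on made-up toy rows. The helper make_table and the names table_columns and n_tradable are assumptions; the column order follows the image description above. hvPlot is imported only when the helper is called.

```python
import pandas as pd

# Made-up stand-in for clustered_df (values are hypothetical).
clustered_df = pd.DataFrame(
    {
        "Algorithm": ["Scrypt", "SHA-256", "X11", "Ethash"],
        "ProofType": ["PoW", "PoW", "PoS", "PoW"],
        "TotalCoinsMined": [10.0, 500.0, 20.0, 75.0],
        "TotalCoinSupply": [100.0, 1000.0, 80.0, 300.0],
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Class": [0, 1, 0, 0],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

table_columns = [
    "CoinName", "Algorithm", "ProofType",
    "TotalCoinSupply", "TotalCoinsMined", "Class",
]


def make_table(df):
    """Build a sortable, selectable table with hvPlot (imported lazily)."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.table(columns=table_columns, sortable=True, selectable=True)


# Every remaining row is a tradable coin, so the row count is the total.
n_tradable = len(clustered_df)
print(f"There are {n_tradable} tradable cryptocurrencies.")
```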

Use the MinMaxScaler().fit_transform method to scale the TotalCoinSupply and TotalCoinsMined columns between the given range of zero and one.

If you’d like a hint on how to use the MinMaxScaler().fit_transform method to scale the "TotalCoinSupply" and "TotalCoinsMined" columns, that’s totally okay. If not, that’s great too. You can always revisit this later if you change your mind.

HINT

Create a new DataFrame using the clustered_df DataFrame index that contains the scaled data you created in Step 5.

Add the CoinName column from the clustered_df DataFrame to the new DataFrame.

Add the Class column from the clustered_df DataFrame to the new DataFrame.

Your new DataFrame should look similar to the image below:

A tradable cryptocurrencies DataFrame showing four columns: TotalCoinSupply, TotalCoinsMined, CoinName, and Class. It contains ten rows with the following headings: 42, 404, 1337, BTC, ETH, LTC, DASH, XMR, ETC, and ZEC.
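A minimal sketch of the scaling steps, using made-up toy values in place of clustered_df. The names scaled and plot_df are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up stand-in for clustered_df (values are hypothetical).
clustered_df = pd.DataFrame(
    {
        "TotalCoinsMined": [10.0, 500.0, 20.0, 255.0],
        "TotalCoinSupply": [100.0, 500.0, 200.0, 300.0],
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Class": [0, 1, 0, 0],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Scale both columns to the range [0, 1] (the MinMaxScaler default).
scaled = MinMaxScaler().fit_transform(
    clustered_df[["TotalCoinSupply", "TotalCoinsMined"]]
)

# Rebuild a DataFrame on the clustered_df index, then add names and classes.
plot_df = pd.DataFrame(
    scaled,
    columns=["TotalCoinSupply", "TotalCoinsMined"],
    index=clustered_df.index,
)
plot_df["CoinName"] = clustered_df["CoinName"]
plot_df["Class"] = clustered_df["Class"]

print(plot_df)
```

Each column is scaled independently, so the smallest value in a column maps to 0 and the largest maps to 1.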

Create an hvplot scatter plot with x="TotalCoinsMined", y="TotalCoinSupply", and by="Class", and have it show the CoinName when you hover over the data point.

If you’d like a hint on how to add the CoinName column data when you hover over a data point, that’s totally okay. If not, that’s great too. You can always revisit this later if you change your mind.

HINT

Your scatter plot should look similar to the image below:

A hvplot scatter plot with X-axis as the
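The scatter plot call might look like the sketch below, shown on made-up scaled values. The helper plot_scatter and the names plot_df and scatter_opts are assumptions; hover_cols is the hvPlot option that adds extra columns to the hover tooltip. hvPlot is imported only when the helper is called.

```python
import pandas as pd

# Made-up stand-in for the scaled DataFrame (values are hypothetical).
plot_df = pd.DataFrame(
    {
        "TotalCoinSupply": [0.0, 1.0, 0.25, 0.5],
        "TotalCoinsMined": [0.0, 1.0, 0.02, 0.5],
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Class": [0, 1, 0, 0],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Options for the scatter plot: one series per Class, CoinName on hover.
scatter_opts = dict(
    x="TotalCoinsMined",
    y="TotalCoinSupply",
    by="Class",
    hover_cols=["CoinName"],
)


def plot_scatter(df, **opts):
    """Build the 2D scatter plot with hvPlot (imported lazily)."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.scatter(**opts)
```

In a notebook, plot_scatter(plot_df, **scatter_opts) as the last expression of a cell would render the plot.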

Save your crypto_clustering.ipynb file to your Cryptocurrencies folder.

Deliverable 4 Requirements

You will earn a perfect score for Deliverable 4 by completing all requirements below:

The clusters are plotted using a 3D scatter plot, and each data point shows the CoinName and Algorithm on hover (10 pt)

A table with tradable cryptocurrencies is created using the hvplot.table() function (3 pt)

The total number of tradable cryptocurrencies is printed (2 pt)

A DataFrame is created that contains the clustered_df DataFrame index, the scaled data, and the CoinName and Class columns (5 pt)

An hvplot scatter plot is created where the X-axis is "TotalCoinsMined", the Y-axis is "TotalCoinSupply", the data is grouped by "Class", and it shows the CoinName when you hover over the data point (10 pt)

Submission

Once you’re ready to submit, check your work against the rubric one final time to ensure you are meeting the requirements for this Challenge. It’s easy to overlook items when you’re in the zone!

As a reminder, the deliverables for this Challenge are as follows:

Deliverable 1: Preprocessing the Data for PCA

Deliverable 2: Reducing Data Dimensions Using PCA

Deliverable 3: Clustering Cryptocurrencies Using K-means

Deliverable 4: Visualizing Cryptocurrencies Results

Upload the following to your Cryptocurrencies GitHub repository:

Your crypto_clustering.ipynb file.

A README.md that includes the purpose of the repository and a short description of what was accomplished. Although there is no graded written analysis for this challenge, adding a brief description of your project is encouraged and is good practice.

To submit your challenge assignment for grading in Bootcamp Spot, click

module-18-challenge-0otnj1cg-4xqmrar1.pdf

Suraj · Accepted Answer

Answer Attached Below:
