We will use a Kaggle dataset that shows different socio-economic & health factors for different countries. The dataset can be downloaded at this link:

https://www.kaggle.com/rohan0301/unsupervised-learning-on-country-data

Download the file, open it and check what the different columns represent.

Lab Instructions

KMEANS

Write a Python script that clusters the countries using KMeans. Your code should:

Load the data file into a data frame.

Separate the first column (which holds the country name) from the rest of the columns.

Run KMeans on the other columns. Set K = 2.

Extract the resulting cluster IDs as a list and append them side by side to the country names. The output from this step should be a dataframe with two columns that looks something like this:

Country      ClusterID
Afghanistan  0
Albania      1
...          0

Sort this dataframe by ClusterID & save it as a CSV file.

Open this CSV file & see which countries are grouped together into the same cluster.

Do you notice anything interesting about how the different countries are grouped into clusters?
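
The KMeans steps above can be sketched as follows. This is one possible layout under stated assumptions, not a reference solution: the function name `cluster_countries_kmeans` is hypothetical, and `n_init=10` and `random_state=0` are choices made here for reproducibility that the lab does not specify.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_countries_kmeans(df, k=2, random_state=0):
    """Return a (Country, ClusterID) frame sorted by ClusterID.

    Hypothetical helper: assumes the first column of `df` holds the
    country names and every other column is numeric.
    """
    countries = df.iloc[:, 0].values   # first column: country names
    features = df.iloc[:, 1:]          # remaining columns: indicators

    # n_init and random_state are assumptions, fixed for repeatable runs.
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    labels = model.fit_predict(features).tolist()  # cluster IDs as a list

    result = pd.DataFrame({"Country": countries, "ClusterID": labels})
    # A stable sort keeps the original country order inside each cluster.
    return result.sort_values("ClusterID", kind="stable").reset_index(drop=True)
```

To use it on the lab data, load the downloaded file with `pd.read_csv(...)`, pass the frame to the function, and write the result with `.to_csv("kmeans_k2.csv", index=False)`. The output file name is only an example.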

Repeat the clustering process for K = 3, 4 & 6.

For each value of K, show the resulting country clusters, sorted by cluster ID.
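
Repeating the run for several values of K can be done with a simple loop. The sketch below is illustrative only: `kmeans_for_each_k` is a hypothetical name, and the `n_init` and `random_state` settings are assumptions rather than lab requirements.

```python
import pandas as pd
from sklearn.cluster import KMeans


def kmeans_for_each_k(df, k_values=(3, 4, 6), random_state=0):
    """Map each K to a (Country, ClusterID) frame sorted by ClusterID.

    Hypothetical helper: assumes the first column of `df` holds the
    country names and the remaining columns are numeric.
    """
    countries = df.iloc[:, 0].values
    features = df.iloc[:, 1:]

    results = {}
    for k in k_values:
        # Fit a fresh model for every K; settings are assumptions.
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        table = pd.DataFrame({"Country": countries, "ClusterID": labels})
        results[k] = table.sort_values("ClusterID", kind="stable").reset_index(drop=True)
    return results
```

Each entry of the returned dictionary can then be printed or saved to its own CSV file, for example one file per value of K.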

Agglomerative

Repeat the activity above using Agglomerative Clustering. You still need to set the number of clusters through the n_clusters parameter.
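
A minimal sketch of the same activity with `AgglomerativeClustering` is shown below. The helper name `cluster_countries_agglomerative` is hypothetical, and the linkage is left at scikit-learn's default (Ward), which the lab does not prescribe.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering


def cluster_countries_agglomerative(df, n_clusters=2):
    """Return a (Country, ClusterID) frame sorted by ClusterID.

    Hypothetical helper: assumes the first column of `df` holds the
    country names and the remaining columns are numeric.
    """
    countries = df.iloc[:, 0].values
    features = df.iloc[:, 1:]

    # The number of clusters is still set explicitly via n_clusters.
    model = AgglomerativeClustering(n_clusters=n_clusters)
    labels = model.fit_predict(features)

    result = pd.DataFrame({"Country": countries, "ClusterID": labels})
    return result.sort_values("ClusterID", kind="stable").reset_index(drop=True)
```

Calling it with `n_clusters` set to 2, 3, 4 and 6 produces tables that can be compared with the KMeans output for the same values.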

DBSCAN

Run the DBSCAN clustering method with epsilon = 800 and min_samples = 3.

Check the output file. Note that a cluster ID of -1 marks a noise point, which is not assigned to any cluster.

What is the number of obtained clusters?

How many data points are considered as noise?
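
One way to answer the two questions above in code is to count the distinct labels other than -1 and the number of -1 labels. The sketch below assumes the same column layout as before; `cluster_countries_dbscan` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.cluster import DBSCAN


def cluster_countries_dbscan(df, eps=800, min_samples=3):
    """Run DBSCAN and return (table, n_clusters, n_noise).

    Hypothetical helper: assumes the first column of `df` holds the
    country names and the remaining columns are numeric.
    """
    countries = df.iloc[:, 0].values
    features = df.iloc[:, 1:]

    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)

    # Label -1 marks noise, so it does not count as a cluster.
    n_clusters = len(set(labels) - {-1})
    n_noise = int((labels == -1).sum())

    table = pd.DataFrame({"Country": countries, "ClusterID": labels})
    table = table.sort_values("ClusterID", kind="stable").reset_index(drop=True)
    return table, n_clusters, n_noise
```

Because -1 is the smallest label, noise points appear at the top of the sorted table.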

How do you assess the obtained clustering results as compared to the output generated by KMEANS & Agglomerative Clustering?

Re-run DBSCAN for epsilon values [700, 800, 900] and min_samples values [2, 3]. You can do that by writing two nested for loops of this form:

for eps in range(700, 1000, 100):
    for min_samples in range(2, 4, 1):
        run DBSCAN for eps & min_samples

Are the obtained results for these values different or similar to each other?
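
To compare the runs side by side, the nested loops can collect one summary row per parameter pair. This is a sketch under assumptions: `dbscan_grid` and the summary column names are hypothetical, and the first column of the frame is taken to hold the country names.

```python
import pandas as pd
from sklearn.cluster import DBSCAN


def dbscan_grid(df, eps_values=range(700, 1000, 100), min_samples_values=range(2, 4)):
    """Summarise DBSCAN results for every (eps, min_samples) pair.

    Hypothetical helper: drops the first column (country names) and
    clusters on the remaining numeric columns.
    """
    features = df.iloc[:, 1:]

    rows = []
    for eps in eps_values:
        for min_samples in min_samples_values:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
            rows.append({
                "eps": eps,
                "min_samples": min_samples,
                "n_clusters": len(set(labels) - {-1}),  # -1 is noise
                "n_noise": int((labels == -1).sum()),
            })
    return pd.DataFrame(rows)
```

The resulting table has one row per combination, which makes it easier to see whether the cluster and noise counts change across the parameter values.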

Resources

SciKit Learn Documentation Pages

KMeans:
- https://scikit-learn.org/stable/modules/clustering.html#k-means
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Hierarchical (Agglomerative):
- https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering

DBSCAN:
- https://scikit-learn.org/stable/modules/clustering.html#dbscan
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html?highlight=clustering%20dbscan

Apr 12, 2022
