Your project will not be graded if
You do not produce outputs for all the test cases by executing your source code
Your source code is not included in the required extensions (*.c, *.cpp, *.java, or *.py)
Your report (PDF file) does not include results for all test cases
Your report is in some other format than PDF
Problem 1 (30 points)
Perform the following by reading in two graphs (networks), G1 (Project3_G1_Data.csv) and G2 (Project3_G2_Data.csv):
- Design and implement an algorithm to merge these two input graphs into one graph (network), called G. Your algorithm must create a new adjacency matrix representing the merged graph (G) in two ways: one by using a two-dimensional array and another by using a linked-list.
- Implement Kruskal’s algorithm in the textbook to find a minimum-spanning tree in graph G.
- Print the computed minimum-spanning tree as you did in Quiz 6 and Quiz 7.
- Report on the memory usage by each of the two data structures.
Submission Items
- Three programs (one for merging graphs and two for finding a minimum-spanning tree, one for each data structure)
- A PDF file containing
- The result of the computed minimum-spanning tree and the total distance on it
- The analysis of the memory usage by each data structure
Problem 2 (70 points)
Design and implement your own algorithm to perform K-Means clustering. Your algorithm must implement methods for handling centroid initialization, calculating distance, assigning centroids, moving centroids, calculating loss, and stopping criteria (see Resources below). You must run your algorithm on two different datasets (see below). The first dataset, test case, is synthetic data designed for the purpose of checking the result of your algorithm before running your algorithm on the second dataset which is real data about electric power consumption.
Requirements for Test Case Dataset (
P
roject3_Test_
Case.csv
)
This dataset contains three variables: X1, X2, and Y. Use X1 and X2 to perform clustering. Use Y as the ground truth in your experiments. Run the algorithm with K=2 and report on the following:
- The coordinates of each centroid
- A count of the number of points assigned to each centroid
- The losses for each iteration of training
Requirements for Electric Power Consumption Dataset (
Project3_Power_Consump
tion.csv
)
This dataset provides samples for individual household electric power consumption. For a description of the dataset checkhere(Links to an external site.). Perform K-Means clustering using all variables in this dataset. Report on the following for the different values of K.
- For K=2
- The coordinates of each centroid
- A count of the number of points assigned to each centroid
- The losses for each iteration of training
- For K from 3 to 20
- A scatter plot containing the loss of the last iteration of training for each value of K
- The coordinates of each centroid for each value of K
- The number of points assigned to each centroid for each value of K
- Using the elbow method (see Resources) find the value of K which best fits the data
Additional Requirements
- Explain the stopping criteria you selected for your algorithm (see Resources) and how they may impact your results.
- Explain the centroid initialization method you selected and how that may impact your results.
- Explain why your algorithm may require different amounts of time to run on the same input and parameters.
- Explain why your algorithm may produce different results on the same input and parameters.
Resources
Project3_Resources.pdf
Submission Items
- Two programs (one for each dataset) that output the centroid coordinates, loss, and number of points assigned to each centroid by your algorithm for K=2
- A PDF file with the required information for each dataset and the responses to the additional requirements
Quiz 8 (March 18, 2021)" wfd-id="224" style="float: left;">Previous