Question 1
Looking for minimum distance between particles
In this question you will look for the minimum distance between two particles in a set of particles scattered randomly in 2D space.
First generate a set of random co-ordinates (x,y) for N particles, where x and y are between 0 and 1. Next:
Part 1: write a CPU serial function which tries all the possible distances between pairs of particles and finds the minimum distance.
Part 2: similarly write a GPU CUDA function which finds the minimum distance, using one thread per particle. Choose an efficient reduction method for finding the minimum.
Part 3: similarly write a GPU CUDA function which finds the minimum distance, using one thread per pair of particles. Choose an efficient reduction method for finding the minimum.
Compare and comment on the efficiency of the approaches in Parts 1, 2, and 3.
Part 4: Discuss what improvements could be made to the simple algorithm to improve the speed of the calculation.
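As a starting point for Part 1, and as a way to check the indexing needed in Part 3, the brute-force search can be sketched on the CPU as below. This is only a minimal sketch under stated assumptions, not a reference solution: the names Particle, Pair, make_particles, min_distance_serial, pair_from_index and min_distance_by_pair_index are hypothetical, the seed is fixed purely for reproducibility, and the pair numbering k = j(j-1)/2 + i (with i < j) is one possible choice among several. The loop over k stands in for what each GPU thread would do with its own index; no reduction method is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

struct Particle { double x, y; };
struct Pair { std::uint64_t i, j; };  // always i < j

// Generate n particles with x and y drawn uniformly from [0, 1).
// The seed is an assumption made here so that runs are reproducible.
std::vector<Particle> make_particles(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<Particle> p(n);
    for (auto& q : p) {
        q.x = dist(gen);
        q.y = dist(gen);
    }
    return p;
}

// Squared distance; the square root is deferred until the very end.
double dist_sq(const Particle& a, const Particle& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Part 1 style: visit every unordered pair (i, j) with i < j exactly once.
double min_distance_serial(const std::vector<Particle>& p) {
    assert(p.size() >= 2);
    double best_sq = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i + 1 < p.size(); ++i)
        for (std::size_t j = i + 1; j < p.size(); ++j)
            best_sq = std::min(best_sq, dist_sq(p[i], p[j]));
    return std::sqrt(best_sq);
}

// Map a linear pair index k (0 <= k < N(N-1)/2) to a pair (i, j), i < j,
// using the numbering k = j(j-1)/2 + i. The two while loops correct any
// floating-point rounding in the initial square-root estimate of j.
Pair pair_from_index(std::uint64_t k) {
    auto j = static_cast<std::uint64_t>(
        (1.0 + std::sqrt(1.0 + 8.0 * static_cast<double>(k))) / 2.0);
    while (j * (j - 1) / 2 > k) --j;
    while ((j + 1) * j / 2 <= k) ++j;
    return Pair{k - j * (j - 1) / 2, j};
}

// Part 3 style indexing, run serially: each value of k plays the role
// of one thread handling one pair.
double min_distance_by_pair_index(const std::vector<Particle>& p) {
    assert(p.size() >= 2);
    const std::uint64_t n = p.size();
    const std::uint64_t n_pairs = n * (n - 1) / 2;
    double best_sq = std::numeric_limits<double>::infinity();
    for (std::uint64_t k = 0; k < n_pairs; ++k) {
        const Pair pr = pair_from_index(k);
        best_sq = std::min(best_sq, dist_sq(p[pr.i], p[pr.j]));
    }
    return std::sqrt(best_sq);
}
```

Calling min_distance_serial(make_particles(1000, 42u)) and min_distance_by_pair_index on the same particles should give the same value, which makes the serial version a convenient correctness check for the GPU parts.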
Note: for each of Parts 1, 2, and 3, increase N to the largest value for which your code for that part still runs in at most 30 seconds. Because the code in each part will run at a different speed, the value of N will be different for each part.
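One possible way to find N for the note above is to double N until a single run exceeds the time budget. The helper below is a hedged sketch: largest_n_within and its parameters are hypothetical names, the doubling schedule is an assumption (a finer search between the last two values may be wanted), and for the GPU parts the timed callable would need to include device synchronisation so that the kernel has actually finished.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Doubles n from start_n while one call of work(n) finishes within
// budget_s seconds and n <= max_n. Returns the largest n that stayed
// within the budget, or 0 if even start_n was too slow.
std::size_t largest_n_within(const std::function<void(std::size_t)>& work,
                             double budget_s,
                             std::size_t start_n,
                             std::size_t max_n) {
    assert(budget_s > 0.0);
    std::size_t best = 0;
    std::size_t n = start_n;
    while (n > 0 && n <= max_n) {
        const auto t0 = std::chrono::steady_clock::now();
        work(n);  // e.g. generate n particles and find the minimum distance
        const std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - t0;
        if (dt.count() > budget_s) break;  // too slow: keep the previous n
        best = n;
        if (n > max_n / 2) break;          // avoid overflow when doubling
        n *= 2;
    }
    return best;
}
```

For this assignment budget_s would be 30.0, and work would wrap the function from the part being timed.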
Question 2
Working with Spark
This is the last assignment question of the course and it is more loosely formulated. If you need more guidance about what to do here, please use the discussion forum to ask the instructor.
Find some text data online that is of interest to you and download it. This can be a ready-made dataset, or you can use your own scraper program to collect the data. Try to keep the total dataset size below 0.5 GB by truncating as needed.
Once you have the data, use Spark to read it in as an RDD and perform some analysis similar to what you have seen in Lessons 11 and 12.
Submit question 2 as a Jupyter Notebook file that includes your source code, any figures you might have, plus a text description of what you are doing.