CptS 315: Introduction to Data Mining Programming Assignment 2 (PA2)

Instructions

• Supported programming languages: Python, Java, C++
• Store all the relevant files in a folder and submit the corresponding zip file (.zip).
• This folder should have a script file named run_code.sh. Executing this script should do all the necessary steps required for executing the code, including compiling, linking, and execution.
• Assume relative file paths in your code. Some examples: “./filename.txt” or “../hw1/filename.txt”
• You should submit your zip file to Blackboard by the stated due date.

Programming Assignment Explanation

Movie Recommendations via Item-Item Collaborative Filtering. You are provided with real data (the MovieLens dataset) of user ratings for different movies. There is a readme file that describes the data format. In this project, you will implement the item-item collaborative filtering algorithm that we discussed in class. The high-level steps are as follows:

a) Construct the profile of each item (i.e., movie). At a minimum, you should use the ratings given by each user for a given item (i.e., movie). Optionally, you can use other information (e.g., genre information for each movie and tag information given by users for each movie) creatively. If you use this additional information, you should explain your methodology in the submitted report.

b) Compute similarity score for all item-item (i.e., movie-movie) pairs. You will employ the centered cosine similarity metric that we discussed in class.

c) Compute the neighborhood set N for each item (i.e., movie). You will select the movies that have the highest similarity scores for the given movie. Please employ a neighborhood of size 5. Break ties using lexicographic ordering over movie-ids.

d) Estimate the ratings of other users who didn’t rate this item (i.e., movie) using the neighborhood set. Repeat for each item (i.e., movie).

e) Compute the recommended items (movies) for each user. Pick the top-5 movies with highest estimated ratings. Break ties using lexicographic ordering over movie-ids.
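Steps (a) through (c) can be sketched in Python roughly as follows. This is a minimal illustration, not the reference solution. It assumes the ratings have already been parsed into (user-id, movie-id, rating) triples, that a missing rating counts as 0 after centering, that an undefined similarity (an item whose ratings are all identical) is treated as 0, and that "lexicographic ordering" means comparing movie-ids as strings. The function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_item_profiles(ratings):
    """Step (a): map each movie-id to a {user-id: rating} dict."""
    profiles = defaultdict(dict)
    for user, movie, rating in ratings:
        profiles[movie][user] = rating
    return dict(profiles)


def centered_cosine(p, q):
    """Step (b): centered cosine similarity between two item profiles.

    Each profile is centered by subtracting its own mean rating.
    Assumption: users who did not rate an item contribute 0 after centering.
    """
    mean_p = sum(p.values()) / len(p)
    mean_q = sum(q.values()) / len(q)
    cp = {u: r - mean_p for u, r in p.items()}
    cq = {u: r - mean_q for u, r in q.items()}
    dot = sum(cp[u] * cq[u] for u in cp.keys() & cq.keys())
    norm_p = sqrt(sum(v * v for v in cp.values()))
    norm_q = sqrt(sum(v * v for v in cq.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # assumption: undefined similarity is treated as 0
    return dot / (norm_p * norm_q)


def neighborhoods(profiles, k=5):
    """Step (c): the k most similar movies for each movie.

    Returns {movie-id: [(similarity, neighbor-id), ...]}.
    Ties are broken by movie-id compared as a string (assumption).
    """
    result = {}
    for m in profiles:
        scored = [
            (centered_cosine(profiles[m], profiles[n]), n)
            for n in profiles
            if n != m
        ]
        scored.sort(key=lambda t: (-t[0], str(t[1])))
        result[m] = scored[:k]
    return result
```

This all-pairs loop is quadratic in the number of movies, which is acceptable for a sketch but may need optimization on the full dataset.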

Your program should output top-5 recommendations for each user.
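Steps (d) and (e) might look like the sketch below. It assumes item profiles of the form {movie-id: {user-id: rating}} and neighborhood sets of the form {movie-id: [(similarity, neighbor-id), ...]} are already available, and it estimates a rating as the similarity-weighted average of the user's ratings on neighboring movies. Using only neighbors with positive similarity, and skipping movies for which no estimate can be formed, are assumptions of this sketch; the handout does not specify either. The function names are hypothetical.

```python
def estimate_rating(user, movie, profiles, neighbors):
    """Step (d): similarity-weighted average over the movie's neighborhood.

    Only neighbors the user has rated, with positive similarity
    (assumption), contribute. Returns None if no neighbor qualifies.
    """
    num = 0.0
    den = 0.0
    for sim, other in neighbors[movie]:
        rating = profiles[other].get(user)
        if rating is not None and sim > 0:
            num += sim * rating
            den += sim
    return num / den if den > 0 else None


def recommend(profiles, neighbors, k=5):
    """Step (e): top-k unrated movies per user by estimated rating.

    Ties are broken by movie-id compared as a string (assumption).
    """
    users = sorted({u for p in profiles.values() for u in p})
    recs = {}
    for user in users:
        scored = []
        for movie, profile in profiles.items():
            if user in profile:
                continue  # the user already rated this movie
            est = estimate_rating(user, movie, profiles, neighbors)
            if est is not None:
                scored.append((est, movie))
        scored.sort(key=lambda t: (-t[0], str(t[1])))
        recs[user] = [m for _, m in scored[:k]]
    return recs
```

With this design a user can end up with fewer than five recommendations when too few estimates are available; how to handle that case is left open here.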

Output Format:

• The output of your program should be dumped in a file named “output.txt” in the following format. One line for each user.

User-id1 movie-id1 movie-id2 movie-id3 movie-id4 movie-id5
User-id2 movie-id1 movie-id2 movie-id3 movie-id4 movie-id5
···

Explanation.

– Line 1 should have the first user-id followed by the movie-ids of recommended movies.
– Line 2 should have the second user-id followed by the movie-ids of recommended movies.

• Make sure the output.txt file is dumped when you execute the script run_code.sh
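Writing the file in this format could be done along these lines. The sketch assumes the recommendations are held in a dict mapping each user-id to a list of movie-ids, that fields are separated by single spaces, and that users are written in ascending user-id order; the handout only says "first" and "second" user, so the ordering is a guess. Both function names are hypothetical.

```python
def format_lines(recs):
    """Return one 'user-id movie-id ...' string per user, sorted by user-id."""
    return [
        " ".join(str(x) for x in [user, *recs[user]])
        for user in sorted(recs)
    ]


def write_recommendations(recs, path="output.txt"):
    """Write the formatted lines to a relative path, one user per line."""
    with open(path, "w") as f:
        for line in format_lines(recs):
            f.write(line + "\n")
```

Calling `write_recommendations(recs)` from the program that run_code.sh launches would leave output.txt in the working directory.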

May 18, 2022
