Apache Spark: Movie Recommendation Engine

Dataset: MovieLens

Output: Jupyter Notebook (please display the output results)

I think I have most of the programming completed, but Spark crashes and I am trying to fix it.



Requirements for exercise:


In this exercise, you will create a movie recommendation engine from the MovieLens data. You can download that data here: MovieLens.




1. Movie Recommendation Engine




a. Prepare Data


Load the data from the ratings.csv and movies.csv files and combine them on movieId. The resultant data set should contain all of the user ratings and include movie titles. The schema should look something like this.




b. Train Recommender



Using the data you prepared in the last step, create a movie recommendation model using collaborative filtering. Spark’s collaborative filtering documentation provides a template for building and testing this model.



Before you train the recommendation model, split the data into a training dataset and a testing dataset using the randomSplit dataframe method. Use 80% of your data for training and 20% for testing.



After fitting your model using the training dataset, calculate the predictions on the test dataset and use the RegressionEvaluator to calculate the root-mean-square error of the model.



As a reminder, Spark’s collaborative filtering documentation will be helpful in completing this task.




c. Generate top 10 movie recommendations





Using the recommendation model, generate the top ten recommendations for each user. Using the show method, print the recommendations for user IDs 127, 151, and 300. Do not truncate the results; call the show method like this: recommendations_127.show(truncate=False).


Answered Same Day: Nov 08, 2021

Answer To: Apache Spark: Movie Recommendation Engine

Neha answered on Nov 11 2021
156 Votes
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
user_idmovieIdratingtimestamp
0114.0964982703
1134.0964981247
2</td>\n",
"
164.0964982224
31475.0964983815
41505.0964982931
\n",
"
"
],
"text/plain": [
" user_id movieId rating timestamp\n",
"0 1 1 4.0 964982703\n",
"1 1 3 4.0 964981247\n",
"2 1 6 4.0 964982224\n",
"3 1 47 5.0 964983815\n",
"4 1 50 5.0 964982931"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
" \n",
"# Get the data \n",
"column_names = ['user_id', 'movieId', 'rating', 'timestamp'] \n",
" \n",
"path = 'ratings.csv'\n",
" \n",
"df = pd.read_csv(path, sep=',', names=column_names) \n",
" \n",
"# Check the head of the data \n",
"df.head() \n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
titlegenres
movieId
1Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
2Jumanji (1995)Adventure|Children|Fantasy
3Grumpier Old Men (1995)Comedy|Romance
4Waiting to Exhale (1995)Comedy|Drama|Romance
5Father of the Bride Part II (1995)Comedy
\n",
"
"
],
"text/plain": [
" title \\\n",
"movieId \n",
"1 Toy Story (1995) \n",
"2 Jumanji (1995) \n",
"3 Grumpier Old Men (1995) \n",
"4 Waiting to Exhale (1995) \n",
"5 Father of the Bride Part II (1995) \n",
"\n",
" genres \n",
"movieId \n",
"1 Adventure|Animation|Children|Comedy|Fantasy \n",
"2 Adventure|Children|Fantasy \n",
"3 Comedy|Romance \n",
"4 Comedy|Drama|Romance \n",
"5 Comedy "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movie_titles = pd.read_csv('movies.csv',index_col=0) \n",
"movie_titles.head() "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
user_idmovieIdratingtimestamptitlegenres
0114.0964982703Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
1514.0847434962Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
2714.51106635946Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
31512.51510577970Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
41714.51305696483Toy Story (1995)Adventure|Animation|Children|Comedy|Fantasy
\n",
"
"
],
"text/plain": [
" user_id movieId rating timestamp title \\\n",
"0 1 1 4.0 964982703 Toy Story (1995) \n",
"1 5 1 4.0 847434962 Toy Story (1995) \n",
"2 7 1 4.5 1106635946 Toy Story (1995) \n",
"3 15 1 2.5 1510577970 Toy Story (1995) \n",
"4 17 1 4.5 1305696483 Toy Story (1995) \n",
"\n",
" genres \n",
"0 Adventure|Animation|Children|Comedy|Fantasy \n",
"1 Adventure|Animation|Children|Comedy|Fantasy \n",
"2 Adventure|Animation|Children|Comedy|Fantasy \n",
"3 Adventure|Animation|Children|Comedy|Fantasy \n",
"4 Adventure|Animation|Children|Comedy|Fantasy "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.merge(df, movie_titles, on='movieId') \n",
"data.head() "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"title\n",
"Karlson Returns (1970) 5.0\n",
"Winter in Prostokvashino (1984) 5.0\n",
"My Love (2006) 5.0\n",
"Sorority House Massacre II (1990) 5.0\n",
"Winnie the Pooh and the Day of Concern (1972) 5.0\n",
"Sorority House Massacre (1986) 5.0\n",
"Bill Hicks: Revelations (1993) 5.0\n",
"My Man Godfrey (1957) 5.0\n",
"Hellbenders (2012) 5.0\n",
"In the blue sea, in the white foam. (1984) 5.0\n",
"Name: rating, dtype: float64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('title')['rating'].mean().sort_values(ascending=False).head(10) "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"title\n",
"Forrest Gump (1994) 329\n",
"Shawshank Redemption, The (1994) 317\n",
"Pulp Fiction (1994) 307\n",
"Silence of the Lambs, The (1991) 279\n",
"Matrix, The (1999) 278\n",
"Star Wars: Episode IV - A New Hope (1977) 251\n",
"Jurassic Park (1993) 238\n",
"Braveheart (1995) 237\n",
"Terminator 2: Judgment Day (1991) 224\n",
"Schindler's List (1993) 220\n",
"Name: rating, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('title')['rating'].count().sort_values(ascending=False).head(10) "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" ...
title                                    rating    num of ratings
'71 (2014)                               4.000000  1
'Hellboy': The Seeds of Creation (2004)  4.000000  1
'Round Midnight (1986)                   3.500000  2
'Salem's Lot (2004)                      5.000000  1
'Til There Was You (1997)                4.000000  2
'Tis the Season for Love (2015)          1.500000  1
'burbs, The (1989)