In this tutorial, we will build a movie recommendation system in Python. We will implement two techniques, content-based filtering and collaborative filtering, to suggest movies based on movie attributes and user preferences. We will use pandas to manipulate the data and scikit-learn to calculate similarity scores. Let’s get started!
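First, let’s import the required libraries and load the MovieLens dataset. It provides the movie ratings and movie metadata we will use to build our recommendation system. In its CSV release, movies.csv lists each movie’s movieId, title, and pipe-separated genres. Each row of ratings.csv holds one rating: a userId, a movieId, the rating, and a timestamp.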
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
# Load MovieLens dataset
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')
# Merge ratings and movies dataframes
movie_ratings = pd.merge(ratings, movies, on='movieId')
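If you don’t have the CSV files at hand, the sketch below shows what the merge produces. It uses two tiny DataFrames with the same column names as MovieLens. The movie rows are modeled on MovieLens entries, and the ratings are invented for illustration. The merged frame has one row per rating, not one row per movie, with the movie’s title and genres attached to each row. This distinction matters when we build the genre matrix.

```python
import pandas as pd

# Hand-written stand-ins for movies.csv and ratings.csv (ratings are invented)
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy",
               "Action|Crime|Thriller"],
})
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "movieId": [1, 3, 1, 2],
    "rating": [4.0, 5.0, 3.5, 2.0],
})

# Inner join on movieId: every rating row gains its movie's title and genres
movie_ratings = pd.merge(ratings, movies, on="movieId")
print(movie_ratings[["userId", "title", "rating"]])
print(len(movie_ratings), "rating rows vs", len(movies), "movies")  # 4 rating rows vs 3 movies
```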
Content-based filtering recommends items similar to the ones the user has interacted with in the past. In our case, we will use movie genres as the content attributes for recommending similar movies.
Creating a Genre Matrix
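First, we’ll create a genre matrix with one row per movie and one binary column per genre. A cell is 1 if the movie belongs to that genre and 0 otherwise. Because a movie can list several genres, a row may contain several 1s, which makes this a multi-hot variant of one-hot encoding.
# Build the genre matrix from `movies` (one row per movie), not from
# `movie_ratings` (one row per rating), so its rows line up with movies['title']
genre_matrix = movies['genres'].str.get_dummies(sep='|')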
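To see what `str.get_dummies` produces, the sketch below applies it to three sample genre strings in the MovieLens pipe-separated format. Each distinct genre becomes its own column. A movie gets a 1 in every genre column it lists, so the first row (five genres) contains five 1s.

```python
import pandas as pd

# Three genre strings in MovieLens' pipe-separated format
genres = pd.Series(["Adventure|Animation|Children|Comedy|Fantasy",
                    "Adventure|Children|Fantasy",
                    "Action|Crime|Thriller"])

# Split on '|' and turn each distinct genre into a 0/1 indicator column
genre_matrix = genres.str.get_dummies(sep="|")
print(genre_matrix)
```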
Calculating Cosine Similarity
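Next, we will calculate the cosine similarity between movies based on their genres. Cosine similarity is the cosine of the angle between two non-zero vectors, computed as their dot product divided by the product of their lengths. For our 0/1 genre vectors, it ranges from 0 (no genres in common) to 1 (identical genre sets).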
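The sketch below computes the formula by hand for two genre vectors and checks the result against scikit-learn’s `cosine_similarity`. The vectors encode Toy Story and Jumanji over eight genre columns, taken from the sample genre strings used earlier. The two movies share three genres, so the similarity is 3 / sqrt(5 × 3) ≈ 0.7746.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Columns: Action, Adventure, Animation, Children, Comedy, Crime, Fantasy, Thriller
toy_story = np.array([0, 1, 1, 1, 1, 0, 1, 0])  # 5 genres
jumanji = np.array([0, 1, 0, 1, 0, 0, 1, 0])    # 3 genres, all shared with toy_story

def cosine(a, b):
    # Dot product divided by the product of the vectors' Euclidean lengths
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

manual = cosine(toy_story, jumanji)
# scikit-learn expects 2-D inputs (one row per sample), hence the reshape
library = cosine_similarity(toy_story.reshape(1, -1), jumanji.reshape(1, -1))[0, 0]
print(round(manual, 4), round(library, 4))  # 0.7746 0.7746
```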
# Calculate the cosine similarity between movies based on genres
cosine_sim = cosine_similarity(genre_matrix, genre_matrix)
# Wrap the similarity scores in a DataFrame labeled by movie title on both axes
movie_similarity = pd.DataFrame(cosine_sim, index=movies['title'], columns=movies['title'])
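With the title-by-title similarity matrix in place, one plausible next step is to look up a movie’s column and sort it. The sketch below does this using a hypothetical helper, `recommend_similar`, which is not part of the original tutorial. It rebuilds the same pipeline on four sample movies so that it runs on its own. It assumes each title appears only once. If your copy of the dataset has repeated titles, `movie_similarity[title]` returns a DataFrame instead of a Series, and keying the matrix by movieId would be safer.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_similar(title, movie_similarity, n=5):
    """Hypothetical helper: return the n titles most similar to `title` by genre."""
    scores = movie_similarity[title]       # similarity of every movie to `title`
    scores = scores.drop(labels=title)     # don't recommend the movie itself
    return scores.sort_values(ascending=False).head(n)

# Same pipeline as above, applied to four sample movies
movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)", "Grumpier Old Men (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy",
               "Action|Crime|Thriller",
               "Comedy|Romance"],
})
genre_matrix = movies["genres"].str.get_dummies(sep="|")
cosine_sim = cosine_similarity(genre_matrix)
movie_similarity = pd.DataFrame(cosine_sim, index=movies["title"], columns=movies["title"])

top = recommend_similar("Toy Story (1995)", movie_similarity, n=2)
print(top)  # Jumanji (3 shared genres) ranks above Grumpier Old Men (1 shared genre)
```

Because the score depends only on genres, any two movies with identical genre sets score exactly 1.0. On the full dataset, this produces many ties at the top of the list, which is a limitation of using genres alone as the content signal.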