Netflix because you watched feature
Where are we at?
This is what we did so far
- In part 0, we downloaded our data from MovieLens, did some EDA and created our user item matrix. The matrix has 671 unique users, 9066 unique movies and is 98.35% sparse
- In part 1, we described 3 of the most common recommendation methods: User Based Collaborative Filtering, Item Based Collaborative Filtering and Matrix Factorization
- In part 2, we implemented Matrix Factorization through ALS and found similar movies
- In part 3, this part, we recommend movies to users based on the movies they've rated. We also attempt to clone Netflix's "because you watched X" feature and build a complete recommendation page that includes trending movies
Recommending Movies to users
We pick up our code where we trained the ALS model from the implicit library. The code that loads and processes the data can be found in the previous posts in this series or on my GitHub.
model = implicit.als.AlternatingLeastSquares(factors=10,iterations=20,regularization=0.1,num_threads=4)
model.fit(user_item.T)
First, let's write a function that returns the movies that a particular user has rated.
def get_rated_movies_ids(user_id, user_item, users, movies):
    """
    Input
    user_id: int
        User ID
    user_item: scipy sparse matrix
        User item interaction matrix
    users: list or np.array
        Mapping between user ID and row index in the user item matrix
    movies: list or np.array
        Mapping between movie ID and column index in the user item matrix

    Output
    movieTableIDs: python list
        List of movie IDs that the user has rated
    """
    # Convert the user ID to its row index (list() so that .index works for arrays too)
    user_idx = list(users).index(user_id)
    # Get matrix ids of the movies rated by the selected user
    ids = user_item[user_idx].nonzero()[1]
    # Convert matrix ids to movie IDs
    movieTableIDs = [movies[item] for item in ids]
    return movieTableIDs

movieTableIDs = get_rated_movies_ids(1, user_item, users, movies)
rated_movies = pd.DataFrame(movieTableIDs, columns=['movieId'])
rated_movies
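To make the index mapping concrete, here is a minimal, self-contained sketch on toy data. The matrix, `toy_users` and `toy_movies` are made-up values for illustration only, not the MovieLens data, and `rated_movie_ids` is a hypothetical stand-in for the function above.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy data: 3 users (rows) x 4 movies (columns), 0 = not rated
ratings = csr_matrix(np.array([
    [5.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 2.0],
]))
toy_users = [7, 42, 99]        # row index -> user ID
toy_movies = [10, 20, 30, 40]  # column index -> movie ID


def rated_movie_ids(user_id, ratings, users, movies):
    """Return the movie IDs rated by user_id (illustrative helper)."""
    row = list(users).index(user_id)
    # A single sparse row is still 2-D, so nonzero() returns (rows, cols)
    cols = ratings[row].nonzero()[1]
    return [movies[c] for c in cols]


print(rated_movie_ids(7, ratings, toy_users, toy_movies))
print(rated_movie_ids(99, ratings, toy_users, toy_movies))
```

The point to notice is that user IDs and movie IDs never index the matrix directly; they always go through the mapping first.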
def get_movies(movieTableIDs, movies_table):
    """
    Input
    movieTableIDs: python list
        List of movie IDs that the user has rated
    movies_table: pd.DataFrame
        DataFrame of movies info

    Output
    rated_movies: pd.DataFrame
        DataFrame of rated movies
    """
    rated_movies = pd.DataFrame(movieTableIDs, columns=['movieId'])
    rated_movies = pd.merge(rated_movies, movies_table, on='movieId', how='left')
    return rated_movies

movieTableIDs = get_rated_movies_ids(1, user_item, users, movies)
df = get_movies(movieTableIDs, movies_table)
df
Now, let's recommend movie IDs for a particular user ID based on the movies they have rated.
def recommend_movie_ids(user_id, model, user_item, users, movies, N=5):
    """
    Input
    user_id: int
        User ID
    model: ALS model
        Trained ALS model
    user_item: scipy sparse matrix
        User item interaction matrix, so that we do not recommend already rated movies
    users: list or np.array
        Mapping between user ID and user item index
    movies: list or np.array
        Mapping between movie ID and user item index
    N: int (default = 5)
        Number of recommendations

    Output
    movies_ids: python list
        List of movie IDs
    """
    # Convert the user ID to its row index in the user item matrix
    user_idx = list(users).index(user_id)
    # This code assumes recommend returns a list of (item index, score) tuples,
    # which is how older implicit releases (before 0.5) behave
    recommendations = model.recommend(user_idx, user_item, N=N)
    recommendations = [item[0] for item in recommendations]
    # Convert matrix ids to movie IDs
    movies_ids = [movies[ids] for ids in recommendations]
    return movies_ids

movies_ids = recommend_movie_ids(1, model, user_item, users, movies, N=5)
movies_ids
> [1374, 1127, 1214, 1356, 1376]
movies_rec = get_movies(movies_ids, movies_table)
movies_rec
display_posters(movies_rec)
movies_ids = recommend_movie_ids(100, model, user_item, users, movies, N=7)
movies_rec = get_movies(movies_ids, movies_table)
display_posters(movies_rec)
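For intuition about what a matrix factorization model does when it recommends, here is a rough sketch that scores items with the dot product of latent factors, masks the ones already rated and keeps the top N. This is a conceptual illustration on made-up factors, not the implicit library's actual implementation; `recommend_from_factors` and both factor arrays are hypothetical.

```python
import numpy as np


def recommend_from_factors(user_idx, user_factors, item_factors, rated_idx, N=5):
    """Return the top-N unrated item indices for one user, best first."""
    # Predicted preference for every item: dot product of latent vectors
    scores = (item_factors @ user_factors[user_idx]).astype(float)
    # Exclude items the user has already rated
    rated = np.asarray(list(rated_idx), dtype=int)
    scores[rated] = -np.inf
    top = np.argsort(-scores)[:N]
    # Drop masked items in case N exceeds the number of unrated items
    return [int(i) for i in top if np.isfinite(scores[i])]


# Hypothetical 2-factor embeddings for 2 users and 4 items
user_factors = np.array([[1.0, 0.0], [0.0, 1.0]])
item_factors = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]])

print(recommend_from_factors(0, user_factors, item_factors, rated_idx=[0], N=2))
```

The returned values are matrix indices, so they would still need to go through the `movies` mapping to become movie IDs, exactly as `recommend_movie_ids` does.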
Because You Watched
Let's implement Netflix's "Because You Watched" feature, which recommends movies based on what you've watched. This is similar to what we already did, but more selective. Here's how we will do it: we randomly choose 5 movies that a user has watched and, for each one, recommend movies similar to it. Finally, we display all of them in a one-page layout.
def similar_items(item_id, movies_table, movies, N=5):
    """
    Input
    -----
    item_id: int
        MovieID in the movies table
    movies_table: DataFrame
        DataFrame with movie ids, movie title and genre
    movies: list or np.array
        Mapping between movieID in the movies_table and id in the item user matrix
    N: int
        Number of similar movies to return

    Output
    -----
    recommendation: DataFrame
        DataFrame with the N movies most similar to the selected movie
    """
    # Get movie user index from the mapping array
    user_item_id = list(movies).index(item_id)
    # Get similar movies from the ALS model (the trained model is read from the global scope)
    similars = model.similar_items(user_item_id, N=N+1)
    # ALS similar_items provides (id, score); we extract a list of ids,
    # skipping the first result, which is the selected movie itself
    similar_ids = [item[0] for item in similars[1:]]
    # Convert those ids to movieID from the mapping array
    ids = [movies[idx] for idx in similar_ids]
    # Make a DataFrame of the movieIds
    ids = pd.DataFrame(ids, columns=['movieId'])
    # Add movie title and genres by joining with the movies table
    recommendation = pd.merge(ids, movies_table, on='movieId', how='left')
    return recommendation

def similar_and_display(item_id, movies_table, movies, N=5):
    df = similar_items(item_id, movies_table, movies, N=N)
    df.dropna(inplace=True)
    display_posters(df)
import random

def because_you_watched(user, user_item, users, movies, k=5, N=5):
    """
    Input
    -----
    user: int
        User ID
    user_item: scipy sparse matrix
        User item interaction matrix
    users: list or np.array
        Mapping between user ID and user item index
    movies: list or np.array
        Mapping between movie ID and user item index
    k: int
        Number of recommendations per movie
    N: int
        Number of already watched movies to pick at random
    """
    movieTableIDs = get_rated_movies_ids(user, user_item, users, movies)
    # movies_table is read from the global scope
    df = get_movies(movieTableIDs, movies_table)
    # random.sample needs a sequence and cannot draw more items than exist
    watched = list(df.movieId)
    movieIDs = random.sample(watched, min(N, len(watched)))
    for movieID in movieIDs:
        title = df[df.movieId == movieID].iloc[0].title
        print("Because you watched ", title)
        similar_and_display(movieID, movies_table, movies, k)
because_you_watched(500, user_item, users, movies, k=5, N=5)
'Because you watched ', 'Definitely, Maybe (2008)'
'Because you watched ', 'Pocahontas (1995)'
'Because you watched ', 'Simpsons Movie, The (2007)'
'Because you watched ', 'Catch Me If You Can (2002)'
'Because you watched ', 'Risky Business (1983)'
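As a rough illustration of what "similar movies" means for a factorization model, the sketch below ranks items by the cosine similarity of their latent factor vectors. It is a simplified stand-in rather than the library's own code: `most_similar` and the toy `item_factors` are assumptions made for this example, and every item is assumed to have a nonzero vector.

```python
import numpy as np


def most_similar(item_idx, item_factors, N=5):
    """Return indices of the N items closest to item_idx by cosine similarity."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = (item_factors @ item_factors[item_idx]) / (norms * norms[item_idx])
    # Sort best first; the item itself is dropped explicitly
    order = np.argsort(-sims)
    return [int(i) for i in order if i != item_idx][:N]


# Hypothetical 2-factor item embeddings
item_factors = np.array([
    [1.0, 0.0],  # item 0
    [0.9, 0.1],  # item 1: nearly parallel to item 0
    [0.0, 1.0],  # item 2: orthogonal to item 0
    [0.5, 0.5],  # item 3
])

print(most_similar(0, item_factors, N=2))
```

Dropping the queried item by index is slightly more defensive than slicing off the first result, since it does not rely on the item always ranking first.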
Trending movies
Let's also implement trending movies. In our context, trending movies are the movies that have been rated by the largest number of users.
def get_trending(user_item, movies, movies_table, N=5):
    """
    Input
    user_item: scipy sparse matrix
        User item interaction matrix to use to extract popular movies
    movies: list or np.array
        Mapping between movieId and ID in the user_item matrix
    movies_table: pd.DataFrame
        DataFrame for movies information
    N: int
        Top N most popular movies to return
    """
    # Binarize the matrix so that each rating counts once
    binary = user_item.copy()
    binary[binary != 0] = 1
    # Number of ratings per movie (column)
    populars = np.array(binary.sum(axis=0)).reshape(-1)
    # Matrix ids of the N most rated movies, most rated first
    top_idx = populars.argsort()[::-1][:N]
    # Convert matrix ids to movie IDs before joining with the movies table
    movieIDs = [movies[idx] for idx in top_idx]
    movies_rec = get_movies(movieIDs, movies_table)
    movies_rec.dropna(inplace=True)
    print("Trending Now")
    display_posters(movies_rec)
get_trending(user_item, movies, movies_table, N=6)
Trending Now
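The counting step can be checked on a small dense example. This is only a sketch under made-up data: `trending_movie_ids`, `toy_ratings` and `toy_movies` are hypothetical names, and a dense NumPy array stands in for the sparse matrix.

```python
import numpy as np


def trending_movie_ids(ratings, movies, N=5):
    """Return the IDs of the N movies rated by the most users."""
    # Count how many users rated each movie (any nonzero entry counts once)
    counts = (np.asarray(ratings) != 0).sum(axis=0)
    # Stable sort so that ties keep their column order, most rated first
    top_cols = np.argsort(-counts, kind="stable")[:N]
    # Column indices are not movie IDs: go through the mapping
    return [movies[c] for c in top_cols]


# Hypothetical toy data: 3 users (rows) x 4 movies (columns)
toy_ratings = np.array([
    [5, 0, 3, 0],
    [4, 0, 1, 0],
    [1, 2, 5, 0],
])
toy_movies = [10, 20, 30, 40]  # column index -> movie ID

print(trending_movie_ids(toy_ratings, toy_movies, N=2))
```

Counting nonzero entries rather than summing ratings means a movie with many low ratings still counts as trending, which matches the definition used here.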
Page recommendation
Let's put everything in a timeline method. The timeline method takes a user ID and displays trending movies, followed by recommendations based on movies similar to those the user has watched.
def my_timeline(user, user_item, users, movies, movies_table, k=5, N=5):
    get_trending(user_item, movies, movies_table, N=N)
    because_you_watched(user, user_item, users, movies, k=k, N=N)
my_timeline(500, user_item, users, movies, movies_table, k=5, N=5)
Trending Now
'Because you watched ', 'Definitely, Maybe (2008)'
'Because you watched ', 'Pocahontas (1995)'
'Because you watched ', 'Simpsons Movie, The (2007)'
'Because you watched ', 'Catch Me If You Can (2002)'
'Because you watched ', 'Risky Business (1983)'
Export trained models to be used in production
At this point, we want to get our model into production. We want to create a web service where a user provides a user ID and the service returns all of the recommendations, including the trending movies and the "because you watched" rows.
To do that, we first export the trained model and the data it depends on for use in the web service.
import joblib
import scipy.sparse

scipy.sparse.save_npz('model/user_item.npz', user_item)
np.save('model/movies.npy', movies)
np.save('model/users.npy', users)
movies_table.to_csv('model/movies_table.csv', index=False)
# sklearn.externals.joblib is gone from recent scikit-learn releases; the standalone joblib package is used instead
joblib.dump(model, 'model/model.pkl')
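Before building the web service, it can be worth checking that the exported files load back unchanged. The sketch below does a round trip for the sparse matrix and one mapping array in a temporary directory, using small made-up stand-ins (`toy_user_item`, `toy_movies`) rather than the real objects.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Hypothetical stand-ins for the exported objects
toy_user_item = scipy.sparse.csr_matrix(np.array([[5.0, 0.0], [0.0, 3.0]]))
toy_movies = np.array([10, 20])

with tempfile.TemporaryDirectory() as model_dir:
    matrix_path = os.path.join(model_dir, "user_item.npz")
    movies_path = os.path.join(model_dir, "movies.npy")

    # Save, then load back from disk
    scipy.sparse.save_npz(matrix_path, toy_user_item)
    np.save(movies_path, toy_movies)
    loaded_user_item = scipy.sparse.load_npz(matrix_path)
    loaded_movies = np.load(movies_path)

# The reloaded objects should match what was saved
same_matrix = (loaded_user_item != toy_user_item).nnz == 0
same_movies = np.array_equal(loaded_movies, toy_movies)
print(same_matrix, same_movies)
```

On the service side, the model and the movies table would presumably be read back the same way, with `joblib.load` and `pd.read_csv` respectively.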
Conclusion
In this post, we recommended movies to users based on their rating history. From there, we tried to clone Netflix's "because you watched" feature and displayed trending movies, defined as the movies rated by the most users. In the next post, we will put this work into a web service, where a user requests movie recommendations by providing their user ID.
Stay tuned!