Metflix: Because You Watched X

by Mohtadi Ben Fraj (@mohtedibf), March 31st, 2018

Netflix because you watched feature

Where are we at?

Here is what we have done so far:

  • In part 0, we downloaded our data from MovieLens, did some EDA, and created our user-item matrix. The matrix has 671 unique users and 9,066 unique movies, and is 98.35% sparse.
  • In part 1, we described 3 of the most common recommendation methods: User Based Collaborative Filtering, Item Based Collaborative Filtering, and Matrix Factorization.
  • In part 2, we implemented Matrix Factorization through ALS and found similar movies.
  • In part 3, this part, we recommend movies to users based on the movies they’ve rated. We also attempt to clone Netflix’s “because you watched X” feature and build a complete recommendation page that includes trending movies.

Recommending Movies to users

We pick up where we left off, right after training the ALS model from the implicit library. The code that loads and processes the data can be found in the previous posts in this series or on my GitHub.




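import implicit

# Train the ALS model. We pass the transpose (an item-user matrix),
# which is what the implicit version used in this series expects.
model = implicit.als.AlternatingLeastSquares(factors=10, iterations=20, regularization=0.1, num_threads=4)
model.fit(user_item.T)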

First, let’s write a function that returns the movies a particular user has rated.

def get_rated_movies_ids(user_id, user_item, users, movies):
    """
    Input
    -----
    user_id: int
        User ID

    user_item: scipy sparse matrix
        User item interaction matrix

    users: list
        Mapping between user ID and index in the user item matrix

    movies: list
        Mapping between movie ID and index in the user item matrix

    Output
    -----
    movieTableIDs: python list
        List of movie IDs that the user had rated
    """
    # Convert the user ID to its row index in the matrix
    user_id = users.index(user_id)
    # Get matrix ids of rated movies by selected user
    ids = user_item[user_id].nonzero()[1]
    # Convert matrix ids to movies IDs
    movieTableIDs = [movies[item] for item in ids]

    return movieTableIDs



movieTableIDs = get_rated_movies_ids(1, user_item, users, movies)
rated_movies = pd.DataFrame(movieTableIDs, columns=['movieId'])
rated_movies
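To make the index bookkeeping concrete, here is a minimal sketch on toy data. The names toy_users, toy_movies and toy_user_item are made up for illustration and are not part of the series' code; the sketch only assumes that the mappings are Python lists, as the use of .index above suggests.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy data (hypothetical): 2 users x 4 movies, 0 means "not rated".
toy_users = [10, 20]                 # row index -> user ID
toy_movies = [101, 102, 103, 104]    # column index -> movie ID
toy_user_item = csr_matrix(np.array([
    [5.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
]))

def toy_rated_ids(user_id):
    row = toy_users.index(user_id)            # user ID -> row index
    cols = toy_user_item[row].nonzero()[1]    # column indices of stored ratings
    return [toy_movies[c] for c in cols]      # column index -> movie ID

rated_by_10 = toy_rated_ids(10)
rated_by_20 = toy_rated_ids(20)
print(rated_by_10, rated_by_20)
```

The key point is that the matrix only knows row and column positions, so every lookup goes through the two mapping lists, once on the way in and once on the way out.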



def get_movies(movieTableIDs, movies_table):
    """
    Input
    -----
    movieTableIDs: python list
        List of movie IDs that the user had rated

    movies_table: pd.DataFrame
        DataFrame of movies info

    Output
    -----
    rated_movies: pd.DataFrame
        DataFrame of rated movies
    """
    rated_movies = pd.DataFrame(movieTableIDs, columns=['movieId'])
    # Add title and genres by joining with the movies table
    rated_movies = pd.merge(rated_movies, movies_table, on='movieId', how='left')

    return rated_movies



movieTableIDs = get_rated_movies_ids(1, user_item, users, movies)
df = get_movies(movieTableIDs, movies_table)
df
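A small sketch of what the left join in get_movies does, using a made-up three-row table (toy_movies_table is a hypothetical stand-in for movies_table). It shows that the order of the requested IDs is kept and that an ID missing from the table comes back with NaN in the other columns, which is why some later steps call dropna.

```python
import pandas as pd

# Hypothetical stand-in for movies_table
toy_movies_table = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
})

# Requested IDs, including one (99) that is not in the table
ids = pd.DataFrame([3, 1, 99], columns=["movieId"])

# how="left" keeps every requested ID, in the order it was requested
merged = pd.merge(ids, toy_movies_table, on="movieId", how="left")
print(merged)

# Rows without a match have NaN titles and can be dropped
clean = merged.dropna()
```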

Now, let’s recommend movie IDs for a particular user ID based on the movies that they rated.

def recommend_movie_ids(user_id, model, user_item, users, movies, N=5):
    """
    Input
    -----
    user_id: int
        User ID

    model: ALS model
        Trained ALS model

    user_item: scipy sparse matrix
        User item interaction matrix so that we do not recommend already rated movies

    users: list
        Mapping between User ID and user item index

    movies: list
        Mapping between Movie ID and user item index

    N: int (default = 5)
        Number of recommendations

    Output
    -----
    movies_ids: python list
        List of movie IDs
    """
    # Convert the user ID to its row index in the matrix
    user_id = users.index(user_id)
    # Get (matrix id, score) pairs from the ALS model
    recommendations = model.recommend(user_id, user_item, N=N)
    # Keep only the matrix ids
    recommendations = [item[0] for item in recommendations]
    # Convert matrix ids to movie IDs
    movies_ids = [movies[ids] for ids in recommendations]

    return movies_ids


movies_ids = recommend_movie_ids(1, model, user_item, users, movies, N=5)
movies_ids

> [1374, 1127, 1214, 1356, 1376]


movies_rec = get_movies(movies_ids, movies_table)
movies_rec

display_posters(movies_rec)

movies_ids = recommend_movie_ids(100, model, user_item, users, movies, N=7)
movies_rec = get_movies(movies_ids, movies_table)
display_posters(movies_rec)
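For intuition, the usual way a matrix factorization model ranks movies for a user is to take the dot product between the user's latent vector and every item's latent vector, skip what the user already rated, and keep the top N. The sketch below does this with plain NumPy on made-up factors; top_n_unrated and the toy arrays are hypothetical, and the implicit library's own implementation may differ in its details.

```python
import numpy as np

def top_n_unrated(user_vec, item_factors, rated_idx, n=5):
    """Rank items by dot-product score, excluding already rated ones."""
    scores = (item_factors @ user_vec).astype(float)
    rated_idx = np.asarray(rated_idx, dtype=int)
    scores[rated_idx] = -np.inf                 # never recommend rated items
    n = min(n, len(scores) - len(rated_idx))    # cannot return more than what is left
    order = np.argsort(-scores)                 # highest score first
    return order[:n].tolist()

# Toy latent factors (hypothetical): 4 items, 2 factors
toy_item_factors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.5, 0.5],
])
toy_user_vec = np.array([1.0, 0.0])

# The user has already rated item 0
top = top_n_unrated(toy_user_vec, toy_item_factors, rated_idx=[0], n=2)
print(top)
```

The returned values are matrix indices, so they would still need to go through the movies mapping to become movie IDs, exactly as recommend_movie_ids does.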

Because You Watched

Let’s implement Netflix’s “Because You Watched” feature, which recommends movies based on what you’ve watched. This is similar to what we already did, but more selective. Here’s how we will do it: we pick 5 random movies that a user has watched and, for each of them, recommend similar movies. Finally, we display all of them in a one-page layout.






def similar_items(item_id, movies_table, movies, N=5):
    """
    Input
    -----
    item_id: int
        MovieID in the movies table

    movies_table: DataFrame
        DataFrame with movie ids, movie title and genre

    movies: list
        Mapping between movieID in the movies_table and id in the item user matrix

    N: int
        Number of similar movies to return

    Output
    -----
    recommendation: DataFrame
        DataFrame with the N movies most similar to the selected movie
    """
    # Get movie user index from the mapping array
    user_item_id = movies.index(item_id)
    # Get similar movies from the ALS model (the trained model is read from the global scope)
    similars = model.similar_items(user_item_id, N=N+1)
    # ALS similar_items provides (id, score), we extract a list of ids
    # and skip the first result, which is the selected movie itself
    l = [item[0] for item in similars[1:]]
    # Convert those ids to movieID from the mapping array
    ids = [movies[ids] for ids in l]
    # Make a dataFrame of the movieIds
    ids = pd.DataFrame(ids, columns=['movieId'])
    # Add movie title and genres by joining with the movies table
    recommendation = pd.merge(ids, movies_table, on='movieId', how='left')

    return recommendation

def similar_and_display(item_id, movies_table, movies, N=5):
    df = similar_items(item_id, movies_table, movies, N=N)
    # Drop movies that have no entry in the movies table
    df.dropna(inplace=True)
    display_posters(df)
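One common way to compare items in a factorization model is cosine similarity between their latent vectors. The sketch below illustrates that idea with NumPy on made-up factors; most_similar and toy_factors are hypothetical names, the vectors are assumed to be non-zero, and this is not meant as a description of what the implicit library does internally.

```python
import numpy as np

def most_similar(item_idx, item_factors, n=2):
    """Return the indices of the n items closest to item_idx by cosine similarity."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = (item_factors @ item_factors[item_idx]) / (norms * norms[item_idx])
    order = np.argsort(-sims)             # most similar first
    order = order[order != item_idx]      # an item is always closest to itself
    return order[:n].tolist()

# Toy latent factors (hypothetical): 4 items, 2 factors
toy_factors = np.array([
    [1.0, 0.0],
    [2.0, 0.1],
    [0.0, 1.0],
    [1.0, 1.0],
])

neighbours = most_similar(0, toy_factors, n=2)
print(neighbours)
```

Removing the query item by index plays the same role as asking for N+1 results and skipping the first one in similar_items above.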






import random

def because_you_watched(user, user_item, users, movies, k=5, N=5):
    """
    Input
    -----
    user: int
        User ID

    user_item: scipy sparse matrix
        User item interaction matrix

    users: list
        Mapping between User ID and user item index

    movies: list
        Mapping between Movie ID and user item index

    k: int
        Number of recommendations per movie

    N: int
        Number of already watched movies to pick
    """
    movieTableIDs = get_rated_movies_ids(user, user_item, users, movies)
    # movies_table is read from the global scope
    df = get_movies(movieTableIDs, movies_table)
    # random.sample needs a sequence, so convert the pandas Series to a list
    movieIDs = random.sample(list(df.movieId), N)

    for movieID in movieIDs:
        title = df[df.movieId == movieID].iloc[0].title
        print("Because you watched ", title)
        similar_and_display(movieID, movies_table, movies, k)

because_you_watched(500, user_item, users, movies, k=5, N=5)

“Because you watched “, ‘Definitely, Maybe (2008)’

“Because you watched “, ‘Pocahontas (1995)’

“Because you watched “, ‘Simpsons Movie, The (2007)’

“Because you watched “, ‘Catch Me If You Can (2002)’

“Because you watched “, ‘Risky Business (1983)’
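One practical detail of the random pick: random.sample raises a ValueError when asked for more items than are available, so a user with fewer than N rated movies would break the page. A minimal sketch of a guarded pick follows; pick_watched and its seed parameter are hypothetical helpers, not part of the code above.

```python
import random

def pick_watched(movie_ids, n=5, seed=None):
    """Pick up to n distinct movie IDs at random, without failing on short histories."""
    rng = random.Random(seed)          # a seed makes the pick reproducible
    movie_ids = list(movie_ids)        # random.sample needs a sequence
    return rng.sample(movie_ids, min(n, len(movie_ids)))

short_history = pick_watched([11, 22, 33], n=5, seed=0)
long_history = pick_watched(range(100), n=5, seed=0)
print(short_history, long_history)
```

Passing a fixed seed can be handy while debugging the layout, since the same movies are picked on every run.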

Trending movies

Let’s also implement trending movies. In our context, trending movies are the movies that have been rated by the largest number of users.



def get_trending(user_item, movies, movies_table, N=5):
    """
    Input
    -----
    user_item: scipy sparse matrix
        User item interaction matrix to use to extract popular movies

    movies: list
        Mapping between movieId and ID in the user_item matrix

    movies_table: pd.DataFrame
        DataFrame for movies information

    N: int
        Top N most popular movies to return
    """
    # Binarize the matrix so that each rating counts once
    binary = user_item.copy()
    binary[binary != 0] = 1
    # Number of ratings per movie (one column per movie)
    populars = np.array(binary.sum(axis=0)).reshape(-1)
    # Matrix ids of the N most rated movies
    top_ids = populars.argsort()[::-1][:N]
    # Convert matrix ids to movie IDs with the mapping array
    movieIDs = [movies[item] for item in top_ids]

    movies_rec = get_movies(movieIDs, movies_table)
    movies_rec.dropna(inplace=True)

    print("Trending Now")
    display_posters(movies_rec)

get_trending(user_item, movies, movies_table, N=6)

Trending Now
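As a sanity check of the "most rated" definition, here is a minimal sketch on toy data. The toy_movies list and toy_user_item matrix are made up; the sketch counts the stored ratings in each column and then maps the winning column indices back to movie IDs.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy data (hypothetical): 3 users x 3 movies, 0 means "not rated".
toy_movies = [101, 102, 103]          # column index -> movie ID
toy_user_item = csr_matrix(np.array([
    [5.0, 0.0, 1.0],
    [4.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]))

# Number of stored (non-zero) ratings per movie, regardless of their value
counts = toy_user_item.getnnz(axis=0)

# Column indices of the 2 most rated movies, most rated first
top_cols = np.argsort(-counts, kind="stable")[:2]

# Column index -> movie ID
trending_ids = [toy_movies[c] for c in top_cols]
print(counts.tolist(), trending_ids)
```

Note that getnnz counts stored entries, so it matches the number of ratings only as long as the matrix does not hold explicitly stored zeros.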

Page recommendation

Let’s put everything in a timeline method. The timeline method takes a user ID and displays the trending movies, followed by recommendations based on movies similar to the ones that user has watched.

def my_timeline(user, user_item, users, movies, movies_table, k=5, N=5):
    # Trending movies first, then one "because you watched" row per picked movie
    get_trending(user_item, movies, movies_table, N=N)
    because_you_watched(user, user_item, users, movies, k=k, N=N)

my_timeline(500, user_item, users, movies, movies_table, k=5, N=5)

Trending Now

“Because you watched “, ‘Definitely, Maybe (2008)’

“Because you watched “, ‘Pocahontas (1995)’

“Because you watched “, ‘Simpsons Movie, The (2007)’

“Because you watched “, ‘Catch Me If You Can (2002)’

“Because you watched “, ‘Risky Business (1983)’

Export trained models to be used in production

At this point, we want to get our model into production. We want to create a web service where a user will provide a userid to the service and the service will return all of the recommendations including the trending and the “because you’ve watched”.

To do that, we first export the trained model and the data it relies on, so that the web service can load them.

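import scipy.sparse
# joblib is imported as a standalone package;
# sklearn.externals.joblib has been removed from recent scikit-learn releases
import joblib

# User item interaction matrix
scipy.sparse.save_npz('model/user_item.npz', user_item)

# Mappings and movies information
np.save('model/movies.npy', movies)
np.save('model/users.npy', users)
movies_table.to_csv('model/movies_table.csv', index=False)

# Trained ALS model
joblib.dump(model, 'model/model.pkl')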
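On the loading side, one detail is worth keeping in mind: np.save turns a Python list into an array, and arrays have no .index method, so the mappings need to be converted back to lists before the functions above can use them. The sketch below round-trips toy data through a temporary directory to show this; the toy names and the temporary paths are made up, and the model itself would be restored in the same spirit with joblib.load.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Toy data (hypothetical)
toy_user_item = scipy.sparse.csr_matrix(np.array([[5.0, 0.0], [0.0, 3.0]]))
toy_users = [10, 20]

with tempfile.TemporaryDirectory() as tmp:
    matrix_path = os.path.join(tmp, "user_item.npz")
    users_path = os.path.join(tmp, "users.npy")

    # Export, as in the post
    scipy.sparse.save_npz(matrix_path, toy_user_item)
    np.save(users_path, toy_users)

    # Import, as the web service would do
    loaded_user_item = scipy.sparse.load_npz(matrix_path)
    # np.load returns an ndarray; convert it back to a list so .index works
    loaded_users = np.load(users_path).tolist()

row = loaded_users.index(20)
print(row, loaded_user_item.shape)
```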

Conclusion

In this post, we recommended movies to users based on their rating history. From there, we tried to clone Netflix’s “because you watched” feature and displayed trending movies, defined as the movies that were rated the most times. In the next post, we will put our work in a web service, where a user requests movie recommendations by providing their user ID.

Stay tuned!