"I said I wanted a B-movie, dammit!"
Tired of endlessly scrolling through Netflix, unsure what to watch next? What if you could build your own AI-powered recommendation system that accurately predicts the movies you'll love?
In this tutorial, we'll walk you through building a movie recommendation system using vector databases (VectorDBs). You'll learn how modern AI recommendation engines work and get hands-on experience building your own system with Superlinked.
(Want to jump straight to the code? Check out our repo on GitHub here. Ready to try recommender systems for your own use case? Get a demo here.)
We'll follow this notebook throughout the article. You can also run the code directly in your browser via Colab.
Netflix's recommendation algorithm does a pretty good job of suggesting relevant content, given the sheer volume of options (~16k movies and TV shows in 2023) and how quickly it has to suggest shows to users. How does Netflix do it? In a word, semantic search.
Semantic search understands the meaning and context (both attributes and consumption patterns) behind user queries and movie/TV show descriptions, and can therefore personalize queries and recommendations better than traditional keyword-based approaches.
But semantic search poses certain challenges - foremost among them: 1) ensuring accurate search results, 2) interpretability, and 3) scalability - challenges that any successful content recommendation strategy has to address. Using the Superlinked library, you can overcome these difficulties.
In this article, we'll show you how to use the Superlinked library to set up your own semantic search and generate a list of relevant movies based on your preferences.
Semantic search adds a lot of value to vector search, but it poses three significant vector embedding challenges for developers: the accuracy of search results, interpretability, and scalability.
The Superlinked library lets you address these challenges. Below, we'll build a content recommender (specifically for movies): starting from the information we have about a given movie, we'll embed this information as a multimodal vector, build a searchable vector index for all our movies, and then use query weights to tweak our results and arrive at good movie recommendations. Let's get into it.
Below, you'll perform a semantic search on the Netflix movie dataset using the following elements of the Superlinked library: the TextSimilaritySpace and RecencySpace embedding spaces, and query-time weights.
Recommending movies successfully is difficult mainly because there are so many options (>9000 titles in 2023), and users want recommendations on demand, immediately. Let's take a data-driven approach to finding something we want to watch. In our movie dataset, we know each movie's description, title, genres, and release date.
We can embed these inputs and put together a vector index on top of our embeddings, creating a space that we can search semantically.
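Once we have our indexed vector space, we will: search for movies with a text query, tune the results by weighting individual fields, use different search text for the description, title, and genre, and finally search around a reference movie that is close to what we want.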
Your first step is to install the library and import the required classes.
(Note: Below, change alt.renderers.enable("mimetype") to alt.renderers.enable("colab") if you are running this in Google Colab. Keep "mimetype" if you are running it on GitHub.)
```python
%pip install superlinked==5.3.0

from datetime import timedelta, datetime

import altair as alt
import os
import pandas as pd

from superlinked.evaluation.charts.recency_plotter import RecencyPlotter
from superlinked.framework.common.dag.context import CONTEXT_COMMON, CONTEXT_COMMON_NOW
from superlinked.framework.common.dag.period_time import PeriodTime
from superlinked.framework.common.schema.schema import schema
from superlinked.framework.common.schema.schema_object import String, Timestamp
from superlinked.framework.common.schema.id_schema_object import IdField
from superlinked.framework.common.parser.dataframe_parser import DataFrameParser
from superlinked.framework.dsl.executor.in_memory.in_memory_executor import (
    InMemoryExecutor,
    InMemoryApp,
)
from superlinked.framework.dsl.index.index import Index
from superlinked.framework.dsl.query.param import Param
from superlinked.framework.dsl.query.query import Query
from superlinked.framework.dsl.query.result import Result
from superlinked.framework.dsl.source.in_memory_source import InMemorySource
from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace
from superlinked.framework.dsl.space.recency_space import RecencySpace

alt.renderers.enable("mimetype")  # NOTE: to render altair plots in colab, change 'mimetype' to 'colab'
alt.data_transformers.disable_max_rows()
pd.set_option("display.max_colwidth", 190)
```
We also need to prepare the dataset: define time constants, set the URL location of the data, create a data store dictionary, read the CSV into a pandas DataFrame, clean the dataframe and data so that it can be searched properly, and do a quick verification and overview. (See cells 3 and 4 for details.)
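As a rough illustration of those preparation steps, here is a minimal standard-library sketch. The inline CSV sample, its column names, the cleaning rules, and the values given to YEAR_IN_DAYS and TOP_N are all assumptions made for this example; the notebook itself loads and cleans the real dataset with pandas (cells 3 and 4).

```python
import csv
import io
from datetime import datetime, timezone

# Assumed values: the article's code uses these names, but sets them in the notebook.
YEAR_IN_DAYS = 365
TOP_N = 10

# Hypothetical sample standing in for the real CSV file.
SAMPLE_CSV = """id,title,description,genres,release_year
tm1,Example One,A warm story about family.,"['comedy', 'romance']",2019
tm2,Example Two,,"['drama']",2005
tm3,Example Three,Two friends open a bakery.,"['comedy']",1998
"""


def prepare_rows(csv_text: str) -> list[dict]:
    """Drop rows without a description and add a release timestamp."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["description"].strip():
            continue  # nothing to embed for this title
        year = int(row["release_year"])
        # Represent the release year as a Unix timestamp (1 January, UTC).
        row["release_timestamp"] = int(
            datetime(year, 1, 1, tzinfo=timezone.utc).timestamp()
        )
        # Turn "['comedy', 'romance']" into "comedy romance" for text embedding.
        cleaned = row["genres"].strip("[]").replace("'", "").replace(",", " ")
        row["genres"] = " ".join(cleaned.split())
        rows.append(row)
    return rows


movies = prepare_rows(SAMPLE_CSV)
print(len(movies), movies[0]["genres"], movies[0]["release_timestamp"])
```

The point of the sketch is only that every row ends up with clean text fields and a numeric release timestamp, which is the shape the schema below expects.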
Now that the dataset is prepared, you can optimize your retrieval using the Superlinked library.
The Superlinked library contains a set of core building blocks that we use to construct the index and manage retrieval. You can read about these building blocks in more detail here.
First, you need to define your Schema to tell the system about your data.
```python
# accommodate our inputs in a typed schema
@schema
class MovieSchema:
    description: String
    title: String
    release_timestamp: Timestamp
    genres: String
    id: IdField


movie = MovieSchema()
```
Next, you use Spaces to say how you want to treat each part of the data when embedding. Which Spaces you use depends on your data type. Each Space is optimized to embed its data so that retrieval returns results of the highest possible quality.
In the Space definitions, we describe how the inputs should be embedded to reflect the semantic relationships in our data.
```python
# textual fields are embedded using a sentence-transformers model
description_space = TextSimilaritySpace(
    text=movie.description, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
)
title_space = TextSimilaritySpace(
    text=movie.title, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
)
genre_space = TextSimilaritySpace(
    text=movie.genres, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
)

# release date are encoded using our recency space
# periodtimes aim to reflect notable breaks in our scores
recency_space = RecencySpace(
    timestamp=movie.release_timestamp,
    period_time_list=[
        PeriodTime(timedelta(days=4 * YEAR_IN_DAYS)),
        PeriodTime(timedelta(days=10 * YEAR_IN_DAYS)),
        PeriodTime(timedelta(days=40 * YEAR_IN_DAYS)),
    ],
    negative_filter=-0.25,
)

movie_index = Index(spaces=[description_space, title_space, genre_space, recency_space])
```
Once you've set up your spaces and created your index, you use the source and executor parts of the library to set up your queries. See cells 10-13 in the notebook.
Now that the queries are prepared, let's move on to running them and optimizing retrieval by adjusting weights.
The recency space lets you alter the results of your query by preferentially pulling older or newer releases from your dataset. We use 4, 10, and 40 years as our period times so that we can give the years with more titles more focus (see cell 5).
Notice the breaks in the score at 4, 10, and 40 years. Titles older than 40 years get the negative_filter score.
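To build some intuition for those breaks, here is a toy recency score. It is not Superlinked's actual encoding: the linear decay within each period and the averaging across periods are assumptions chosen only to reproduce the qualitative shape described above (kinks at 4, 10, and 40 years, and a flat negative_filter value beyond the longest period).

```python
PERIODS_YEARS = [4, 10, 40]  # period times used in the article
NEGATIVE_FILTER = -0.25      # score for titles older than every period


def toy_recency_score(age_years: float) -> float:
    """Toy score: average of linear decays, one per period (an assumption)."""
    age_years = max(0.0, age_years)  # treat future dates as brand new
    if age_years > max(PERIODS_YEARS):
        return NEGATIVE_FILTER
    contributions = [max(0.0, 1.0 - age_years / p) for p in PERIODS_YEARS]
    return sum(contributions) / len(contributions)


for age in (0, 4, 10, 40, 41):
    print(age, round(toy_recency_score(age), 3))
```

Each period stops contributing once a title is older than that period, which is what produces a change of slope at each period boundary in this toy model.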
Let's define a quick utility function to present our results in the notebook.
```python
def present_result(
    result: Result,
    cols_to_keep: list[str] = ["description", "title", "genres", "release_year", "id"],
) -> pd.DataFrame:
    # parse result to dataframe
    df: pd.DataFrame = result.to_pandas()
    # transform timestamp back to release year
    df["release_year"] = [
        datetime.fromtimestamp(timestamp).year for timestamp in df["release_timestamp"]
    ]
    return df[cols_to_keep]
```
The Superlinked library lets you run various kinds of queries; here we define two. Both of our query types (simple and advanced) let me weight the individual spaces (description, title, genre, and recency) according to my preferences. The difference between them is that with a simple query, I set a single query text and then surface similar results in the description, title, and genre spaces.
With an advanced query, I have more fine-grained control. If I want, I can enter a different query text for each of the description, title, and genre spaces. Here's the query code:
```python
query_text_param = Param("query_text")

simple_query = (
    Query(
        movie_index,
        weights={
            description_space: Param("description_weight"),
            title_space: Param("title_weight"),
            genre_space: Param("genre_weight"),
            recency_space: Param("recency_weight"),
        },
    )
    .find(movie)
    .similar(description_space.text, query_text_param)
    .similar(title_space.text, query_text_param)
    .similar(genre_space.text, query_text_param)
    .limit(Param("limit"))
)

advanced_query = (
    Query(
        movie_index,
        weights={
            description_space: Param("description_weight"),
            title_space: Param("title_weight"),
            genre_space: Param("genre_weight"),
            recency_space: Param("recency_weight"),
        },
    )
    .find(movie)
    .similar(description_space.text, Param("description_query_text"))
    .similar(title_space.text, Param("title_query_text"))
    .similar(genre_space.text, Param("genre_query_text"))
    .limit(Param("limit"))
)
```
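One way to picture what the weights in these queries do: each space yields its own similarity between the query and a movie, and the weights decide how much each space counts towards the final ranking. The sketch below is a toy model with made-up movie names and similarity values, and a plain weighted sum as the scoring rule; it is an illustration of the idea, not a description of Superlinked's internals.

```python
def combined_score(similarities: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-space similarities (toy scoring rule)."""
    return sum(weights.get(space, 0.0) * sim for space, sim in similarities.items())


# Hypothetical per-space similarities between one query and two movies.
candidates = {
    "Movie A": {"description": 0.9, "title": 0.2, "genre": 0.6, "recency": 0.1},
    "Movie B": {"description": 0.5, "title": 0.3, "genre": 0.8, "recency": 0.9},
}


def rank(weights: dict[str, float]) -> list[str]:
    """Return movie names ordered from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda name: combined_score(candidates[name], weights),
        reverse=True,
    )


text_only = {"description": 1, "title": 1, "genre": 1, "recency": 0}
recency_heavy = {"description": 1, "title": 1, "genre": 1, "recency": 3}

print(rank(text_only))      # the older but textually closer movie comes first
print(rank(recency_heavy))  # the newer movie overtakes it
```

Changing only the weights reorders the same candidates, which is why the queries in this article expose the weights as parameters rather than fixing them in the index.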
With the simple query, I set my query text and apply different weights according to how important each space is to me.
```python
result: Result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=0,
    limit=TOP_N,
)

present_result(result)
```
Our results contain some titles I've already seen. I can deal with this by weighting recency to bias my results towards recent titles. Weights are normalized to have a unit sum (i.e., all weights are adjusted so that they always sum to a total of 1), so you don't have to worry about the scale you set them on.
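As a small illustration of what unit-sum normalization means, here is a sketch that assumes plain division by the total, as the description above suggests. The helper name normalize_weights is hypothetical, and the library's exact normalization may differ.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so they sum to 1 (assumed normalization, for illustration)."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: value / total for name, value in weights.items()}


small = normalize_weights({"description": 1, "title": 1, "genre": 1, "recency": 3})
large = normalize_weights({"description": 2, "title": 2, "genre": 2, "recency": 6})

print(small)
print(large)
```

Under this assumption, only the ratios between weights matter: doubling every weight leaves the normalized values unchanged.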
```python
result: Result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=3,
    limit=TOP_N,
)

present_result(result)
```
My results (above) are now all post-2021.
Using the simple query, I can weight any specific space (description, title, genre, or recency) to make it count more when returning results. Let's experiment with this. Below, we'll give more weight to genre and down-weight title, since my query text is basically just a genre with some additional context. I keep a recency weight because I'd still like my results to be biased towards recent movies.
```python
result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=0.1,
    genre_weight=2,
    recency_weight=1,
    limit=TOP_N,
)

present_result(result)
```
This query pushes the release year back a bit to give me more genre-weighted results (below).
The advanced query gives me even more fine-grained control. I retain control over recency, but I can also specify the search text for description, title, and genre, and give each a specific weight according to my preferences, as shown below (and in cells 19-21).
```python
result = app.query(
    advanced_query,
    description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
    title_query_text="love",
    genre_query_text="drama comedy romantic",
    description_weight=0.2,
    title_weight=3,
    genre_weight=1,
    recency_weight=5,
    limit=TOP_N,
)

present_result(result)
```
Say that in my latest movie results I found a movie I've already seen and would like to see something similar. Let's suppose I like White Christmas, a 1954 romantic comedy (id = tm16479) about singer-dancers coming together for a stage show to attract guests to a struggling Vermont inn. By adding an extra with_vector clause (with a movie_id parameter) to advanced_query, with_movie_query lets me search using this movie (or any movie I like), and gives me all the fine-grained control of separate subsearch query text and weighting.
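Conceptually, searching "with" a movie means using that movie's own stored vector as (part of) the query. The toy sketch below uses made-up ids and 3-dimensional vectors with plain cosine similarity, and it excludes the reference movie from its own results; these are choices made for the illustration, and the sketch does not reflect how with_vector is implemented.

```python
import math

# Hypothetical ids and vectors standing in for an embedded movie index.
index = {
    "ref": [0.9, 0.1, 0.0],
    "a": [0.8, 0.2, 0.1],
    "b": [0.0, 1.0, 0.0],
    "c": [0.1, 0.1, 0.9],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def similar_to(movie_id: str, limit: int) -> list[str]:
    """Rank the other movies by similarity to the reference movie's vector."""
    query_vector = index[movie_id]
    others = [other for other in index if other != movie_id]
    others.sort(key=lambda other: cosine(query_vector, index[other]), reverse=True)
    return others[:limit]


print(similar_to("ref", 2))
```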
First, we add our movie_id parameter:
```python
with_movie_query = advanced_query.with_vector(movie, Param("movie_id"))
```
Then I can set my other subsearch queries either to empty or to whatever is most relevant, along with any weights that make sense. Let's say my first query returns results that reflect the stage performance/band aspect of White Christmas (see cell 24), but I want to watch a movie that's more family-oriented. I can enter a description_query_text to skew my results in the direction I want.
```python
result = app.query(
    with_movie_query,
    description_query_text="family",
    title_query_text="",
    genre_query_text="",
    description_weight=1,
    title_weight=0,
    genre_weight=0,
    recency_weight=0,
    description_query_weight=1,
    movie_id="tm16479",
    limit=TOP_N,
)

present_result(result)
```
But now that I've seen my results, I realize I'm actually more in the mood for something light-hearted and funny. Let's adjust my query accordingly:
```python
# assign to `result` (lowercase) so the imported Result class is not overwritten
result = app.query(
    with_movie_query,
    description_query_text="",
    title_query_text="",
    genre_query_text="comedy",
    description_weight=1,
    title_weight=0,
    genre_weight=2,
    recency_weight=0,
    description_query_weight=1,
    movie_id="tm16479",
    limit=TOP_N,
)

present_result(result)
```
Okay, those results are better. I'll pick one of these. Put the popcorn on!
Superlinked makes it easy to test, iterate on, and improve your retrieval quality. Above, we've walked you through how to use the Superlinked library to run a semantic search on a vector space, the way Netflix does, and return accurate, relevant movie results. We've also seen how to fine-tune our results, tweaking weights and search terms until we arrive at just the right result.
Now, try out the notebook yourself, and see what you can achieve!
Recommendation engines are shaping the way we discover content. Whether it's movies, music, or products, vector search is the future, and now you have the tools to build your own.
Author: Mór Kapronczay