"I said I want a B-movie, dammit!"
Tired of scrolling endlessly through Netflix, unsure what to watch next? What if you could build your own AI-powered recommendation system that accurately predicts your next favorite movie?
In this tutorial, we'll walk you through building a movie recommendation system using vector databases (VectorDBs). You'll learn how modern AI recommendation engines work and get hands-on experience building your own system with Superlinked.
(Want to jump straight to the code? Check out our repo on GitHub here. Ready to try the recommender system on your own queries? Get the demo here.)
We'll follow this notebook throughout the article. You can also run the code directly in your browser using Colab.
Netflix's recommendation algorithm does a pretty good job of suggesting relevant content, considering the sheer volume of options (~16k movies and TV shows in 2023) and how quickly it has to serve suggestions to users. How does Netflix do it? In a word, semantic search.
Semantic search understands the meaning and context (both attributes and usage patterns) behind user queries and movie/TV show descriptions, so it can personalize its recommendations better than traditional keyword-based approaches.
But semantic search poses some challenges, chief among them: 1) ensuring accurate search results, 2) interpretability, and 3) scalability. Any successful recommendation strategy has to address these. With the Superlinked library, you can overcome them.
In this article, we'll show you how to use the Superlinked library to set up your own semantic search and generate a list of relevant movies based on your preferences.
Semantic search brings a lot of value to vector search, but, as noted above, it poses three significant challenges for developers: accuracy, interpretability, and scalability.
The Superlinked library lets you address these challenges. Below, we'll build a recommender (specifically for movies): starting from the information we have about each movie, we'll embed that information as multimodal vectors, build a searchable index over all our movies, and then use query weights to tweak our results until we arrive at good movie recommendations. Let's dive in.
Below, you'll perform semantic search on a Netflix dataset using the following elements of the Superlinked library: text similarity spaces for the text fields, a recency space for release dates, and query-time weights.
Getting movie recommendations right is hard because there are so many options (>9,000 titles in 2023), and users want recommendations on demand, instantly. Let's take a data-driven approach to finding something we want to watch. In our movie dataset, we know each movie's description, title, genres, and release date.
We can embed these inputs, then build a vector index on top of our embeddings, creating a space we can search semantically.
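Conceptually, "building a vector index and searching it semantically" boils down to storing one vector per movie and returning the items whose vectors are closest to the query vector. The sketch below is a brute-force, pure-Python stand-in under stated assumptions: the 3-dimensional vectors are made up, and `build_index`/`search` are hypothetical names, not Superlinked APIs. Real systems use an embedding model and usually an approximate nearest-neighbour index.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_index(embeddings: dict[str, list[float]]) -> dict[str, list[float]]:
    """A toy 'vector index': one unit-normalized vector per item id."""
    return {item_id: normalize(vec) for item_id, vec in embeddings.items()}


def search(index: dict[str, list[float]], query: list[float], k: int) -> list[tuple[str, float]]:
    """Brute-force k-nearest-neighbour search by cosine similarity."""
    q = normalize(query)
    scored = [(item_id, sum(a * b for a, b in zip(q, vec))) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-d "embeddings"; in the tutorial, these come from a sentence-transformers model.
toy_index = build_index({
    "romcom": [0.9, 0.1, 0.0],
    "horror": [0.0, 0.2, 0.9],
    "drama": [0.5, 0.8, 0.1],
})
print(search(toy_index, query=[1.0, 0.2, 0.0], k=2))
```

Brute force compares the query against every item, which is fine for a toy catalog but is exactly the scalability problem that dedicated vector indexes address.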
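Once we have our indexed vector space, we will run simple and advanced queries against it, tune query-time weights to shape the results, and search using a movie we already like.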
Your first step is to install the library and import the required classes.
Note: if you're running the notebook in Google Colab, change alt.renderers.enable("mimetype") to alt.renderers.enable('colab') so that the Altair plots render.
```python
%pip install superlinked==5.3.0

from datetime import timedelta, datetime

import altair as alt
import os
import pandas as pd

from superlinked.evaluation.charts.recency_plotter import RecencyPlotter
from superlinked.framework.common.dag.context import CONTEXT_COMMON, CONTEXT_COMMON_NOW
from superlinked.framework.common.dag.period_time import PeriodTime
from superlinked.framework.common.schema.schema import schema
from superlinked.framework.common.schema.schema_object import String, Timestamp
from superlinked.framework.common.schema.id_schema_object import IdField
from superlinked.framework.common.parser.dataframe_parser import DataFrameParser
from superlinked.framework.dsl.executor.in_memory.in_memory_executor import (
    InMemoryExecutor,
    InMemoryApp,
)
from superlinked.framework.dsl.index.index import Index
from superlinked.framework.dsl.query.param import Param
from superlinked.framework.dsl.query.query import Query
from superlinked.framework.dsl.query.result import Result
from superlinked.framework.dsl.source.in_memory_source import InMemorySource
from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace
from superlinked.framework.dsl.space.recency_space import RecencySpace

alt.renderers.enable("mimetype")  # NOTE: to render altair plots in colab, change 'mimetype' to 'colab'
alt.data_transformers.disable_max_rows()
pd.set_option("display.max_colwidth", 190)
```
We also need to prepare the dataset: define time constants, set the URL of the data, create a data store dictionary, read the CSV into a pandas DataFrame, clean the DataFrame and its data so it can be searched properly, and run a quick verification and overview. (See cells 3 and 4 for details.)
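The notebook's actual preparation code lives in cells 3 and 4. As a rough, hypothetical sketch of the same kind of steps, the block below reads a tiny made-up in-memory CSV (standing in for the real data URL), drops rows without a description, flattens a list-like genres string into plain text, and converts a release year into the Unix timestamp that the schema's `release_timestamp` field expects. The column names, the sample rows, the values of `YEAR_IN_DAYS` and `TOP_N`, and the helper `prepare_movies` are all assumptions for illustration.

```python
import io
from datetime import datetime, timezone

import pandas as pd

# Assumed constants; the notebook defines its own values during preparation.
YEAR_IN_DAYS = 365
TOP_N = 10

# Made-up sample rows standing in for the real CSV file.
RAW_CSV = """id,title,description,genres,release_year
tm16479,White Christmas,Two song-and-dance men team up with a sister act at a Vermont inn.,"['comedy', 'romance', 'music']",1954
tm00001,Example Movie,,"['drama']",2021
tm00002,Another Example,A heartfelt story about family.,"['drama', 'family']",2022
"""


def prepare_movies(csv_text: str) -> pd.DataFrame:
    df = pd.read_csv(io.StringIO(csv_text))
    # Drop rows with no description: there is nothing meaningful to embed.
    df = df.dropna(subset=["description"]).copy()
    # Turn "['comedy', 'romance']" into "comedy, romance" for text embedding.
    df["genres"] = df["genres"].str.strip("[]").str.replace("'", "", regex=False)
    # Release year -> UTC Unix timestamp of January 1 of that year.
    df["release_timestamp"] = [
        int(datetime(int(year), 1, 1, tzinfo=timezone.utc).timestamp())
        for year in df["release_year"]
    ]
    return df.drop(columns=["release_year"]).reset_index(drop=True)


movies_df = prepare_movies(RAW_CSV)
print(movies_df[["id", "title", "genres", "release_timestamp"]])
```

Timezone-aware datetimes are used so that pre-1970 releases (negative timestamps) convert the same way on every platform.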
Now that the dataset is prepared, you can optimize retrieval using the Superlinked library.
The Superlinked library contains a set of core building blocks that we use to construct the index and manage retrieval. You can read about these building blocks in more detail here.
First, you need to define your Schema to tell the system about your data.
```python
# accommodate our inputs in a typed schema
@schema
class MovieSchema:
    description: String
    title: String
    release_timestamp: Timestamp
    genres: String
    id: IdField


movie = MovieSchema()
```
Next, you use Spaces to specify how each part of your data should be treated when it is embedded. Which Spaces you use depends on your data types. Each Space is optimized to embed its data so as to return the highest possible quality of retrieval results.
In the space definitions, we describe how the inputs should be embedded so that they reflect the semantic relationships in our data.
```python
# textual fields are embedded using a sentence-transformers model
description_space = TextSimilaritySpace(
    text=movie.description, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
)
title_space = TextSimilaritySpace(
    text=movie.title, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
)
genre_space = TextSimilaritySpace(
    text=movie.genres, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
)

# release dates are encoded using our recency space
# period times aim to reflect notable breaks in our scores
# (YEAR_IN_DAYS is one of the time constants defined while preparing the dataset)
recency_space = RecencySpace(
    timestamp=movie.release_timestamp,
    period_time_list=[
        PeriodTime(timedelta(days=4 * YEAR_IN_DAYS)),
        PeriodTime(timedelta(days=10 * YEAR_IN_DAYS)),
        PeriodTime(timedelta(days=40 * YEAR_IN_DAYS)),
    ],
    negative_filter=-0.25,
)

movie_index = Index(spaces=[description_space, title_space, genre_space, recency_space])
```
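One way to picture an index built from several spaces is to place each space's unit-length vector side by side in one long vector. If the query's parts are then scaled by per-space weights, a single dot product equals the weighted sum of per-space cosine similarities. The sketch below checks that identity on hypothetical 2-d vectors. It is a conceptual illustration of weighted concatenation, not a claim about Superlinked's internal vector layout, and the helper names are made up.

```python
import math


def unit(v: list[float]) -> list[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def concat(parts: list[list[float]]) -> list[float]:
    """Place per-space vectors side by side in one long vector."""
    return [x for part in parts for x in part]


# Hypothetical embeddings for one movie: a 2-d "description" part and a 2-d "genre" part.
movie_parts = [[1.0, 0.0], [0.6, 0.8]]
# Hypothetical query embeddings for the same two spaces, plus query-time weights.
query_parts = [[1.0, 1.0], [0.0, 1.0]]
weights = [1.0, 2.0]

# Stored vector: unit-length parts, unweighted.
stored = concat([unit(p) for p in movie_parts])
# Query vector: each unit-length part scaled by its space's weight.
query = concat([[w * x for x in unit(p)] for p, w in zip(query_parts, weights)])

combined = dot(query, stored)
per_space = sum(w * dot(unit(q), unit(m)) for q, m, w in zip(query_parts, movie_parts, weights))
print(round(combined, 4), round(per_space, 4))
```

Because the weights live on the query side, they can change at query time without re-embedding the stored movies.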
Once you've set up your spaces and created your index, you use the source and executor parts of the library to set up your queries. See cells 10-13 in the notebook.
Now that the queries are prepared, let's move on to running them and optimizing retrieval by adjusting weights.
The recency space lets you shift your query results toward older or newer releases in your dataset. We use 4, 10, and 40 years as our period times so that we can give more focus to the years with many titles (see cell 5).
Notice the breaks in the score at 4, 10, and 40 years. Titles older than 40 years receive the negative_filter score (-0.25).
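To get an intuition for "breaks in the score", here is a deliberately simplified, hypothetical recency function. It is NOT Superlinked's formula. The assumption is that each period between consecutive break points (4, 10, and 40 years) gets an equal share of the [0, 1] range, so the score falls linearly from 1.0 for a brand-new title to 0.0 at 40 years, and anything older gets the negative filter value of -0.25 used in the space definition. The constants mirror the tutorial's names, with `YEAR_IN_DAYS = 365` assumed.

```python
YEAR_IN_DAYS = 365  # assumed value, mirroring the constant name used in the tutorial
PERIODS_DAYS = [4 * YEAR_IN_DAYS, 10 * YEAR_IN_DAYS, 40 * YEAR_IN_DAYS]
NEGATIVE_FILTER = -0.25


def recency_score(age_days: float) -> float:
    """Illustrative piecewise-linear recency score for age_days >= 0.

    Each period gets an equal share of [0, 1]; the score hits 0.0 at the
    last break point, and older items get NEGATIVE_FILTER.
    """
    if age_days > PERIODS_DAYS[-1]:
        return NEGATIVE_FILTER
    share = 1.0 / len(PERIODS_DAYS)
    start = 0
    for i, end in enumerate(PERIODS_DAYS):
        if age_days <= end:
            fraction = (age_days - start) / (end - start)
            return 1.0 - share * (i + fraction)
        start = end
    return NEGATIVE_FILTER  # not reached: the first check covers ages past the last break


for years in (0, 2, 4, 10, 25, 40, 50):
    print(years, round(recency_score(years * YEAR_IN_DAYS), 3))
```

The slope changes at each break point: the first four years lose a third of the score quickly, while the 10-to-40-year period spreads its third over three decades.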
Let's define a quick helper function to present our results in the notebook.
```python
def present_result(
    result: Result,
    cols_to_keep: list[str] = ["description", "title", "genres", "release_year", "id"],
) -> pd.DataFrame:
    # parse result to dataframe
    df: pd.DataFrame = result.to_pandas()
    # transform timestamp back to release year
    df["release_year"] = [
        datetime.fromtimestamp(timestamp).year for timestamp in df["release_timestamp"]
    ]
    return df[cols_to_keep]
```
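A possible variation on the timestamp-to-year step in `present_result`: `pd.to_datetime(..., unit="s")` converts the whole column at once, interprets the timestamps as UTC, and handles pre-1970 (negative) timestamps, which `datetime.fromtimestamp` may reject on some platforms (notably Windows). The helper name `add_release_year` and the sample rows are hypothetical stand-ins for the output of `result.to_pandas()`.

```python
import pandas as pd


def add_release_year(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized timestamp -> release year conversion (UTC), without mutating the input."""
    out = df.copy()
    out["release_year"] = pd.to_datetime(out["release_timestamp"], unit="s").dt.year
    return out


# Hypothetical rows standing in for result.to_pandas() output.
sample = pd.DataFrame({
    "title": ["White Christmas", "Recent Movie"],
    "release_timestamp": [-504921600, 1640995200],  # 1954-01-01 and 2022-01-01, UTC
})
print(add_release_year(sample))
```

One difference to keep in mind: `fromtimestamp` uses local time, so for timestamps right at a year boundary the two approaches can disagree by one year.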
The Superlinked library lets you run various kinds of queries; here we define two. Both of our query types (simple and advanced) let me weight individual spaces (description, title, genre, and, of course, recency) according to my preferences. The difference is that with the simple query, I set one query text and then surface similar results across the description, title, and genre spaces.
With the advanced query, I have more fine-grained control: if I want, I can enter different query texts for the description, title, and genre. Here's the query code:
```python
query_text_param = Param("query_text")

simple_query = (
    Query(
        movie_index,
        weights={
            description_space: Param("description_weight"),
            title_space: Param("title_weight"),
            genre_space: Param("genre_weight"),
            recency_space: Param("recency_weight"),
        },
    )
    .find(movie)
    .similar(description_space.text, query_text_param)
    .similar(title_space.text, query_text_param)
    .similar(genre_space.text, query_text_param)
    .limit(Param("limit"))
)

advanced_query = (
    Query(
        movie_index,
        weights={
            description_space: Param("description_weight"),
            title_space: Param("title_weight"),
            genre_space: Param("genre_weight"),
            recency_space: Param("recency_weight"),
        },
    )
    .find(movie)
    .similar(description_space.text, Param("description_query_text"))
    .similar(title_space.text, Param("title_query_text"))
    .similar(genre_space.text, Param("genre_query_text"))
    .limit(Param("limit"))
)
```
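Since `simple_query` feeds the same `Param("query_text")` into all three `.similar()` clauses, a simple query should describe the same search as `advanced_query` with one text copied into each text parameter. The hypothetical helper below builds those keyword arguments in plain Python; the parameter names come from the query definitions above, while the helper name and `limit=10` (standing in for `TOP_N`) are assumptions.

```python
def simple_as_advanced_params(query_text: str, **other_params: object) -> dict[str, object]:
    """Hypothetical helper: express simple-query arguments as advanced-query arguments.

    The same text goes to the description, title, and genre query parameters;
    weights and limit are passed through unchanged.
    """
    params: dict[str, object] = {
        "description_query_text": query_text,
        "title_query_text": query_text,
        "genre_query_text": query_text,
    }
    params.update(other_params)
    return params


params = simple_as_advanced_params(
    "Heartfelt romantic comedy",
    description_weight=1, title_weight=1, genre_weight=1, recency_weight=0, limit=10,
)
# With the app from the notebook, this would be used as: app.query(advanced_query, **params)
print(params)
```

Seen this way, the simple query is a special case of the advanced one; the advanced query only earns its extra parameters when the texts differ per space.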
For the simple query, I set my query text and assign different weights to each space according to how important it is to me.
```python
result: Result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=0,
    limit=TOP_N,
)
present_result(result)
```
Our results include some titles I've already seen. I can deal with this by upweighting recency to bias my results toward recent titles. Weights are normalized to sum to one (i.e., all weights are rescaled so that they always add up to 1), so you don't have to worry about how you set them.
```python
result: Result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=1,
    genre_weight=1,
    recency_weight=3,
    limit=TOP_N,
)
present_result(result)
```
My results (above) are now all from after 2021.
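To see why normalization means "you don't have to worry about how you set them", and why a recency weight of 3 can push newer titles to the top, here is a hypothetical scoring sketch. It combines made-up per-space similarity scores for two imaginary titles using weights rescaled to sum to 1 (non-negative weights assumed). It follows the article's description of normalized weights but is not Superlinked's scoring code.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def weighted_score(similarities: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-space similarity scores using normalized weights."""
    norm = normalize_weights(weights)
    return sum(norm[name] * similarities[name] for name in norm)


# Made-up per-space similarities for two hypothetical titles.
candidates = {
    "older_close_match": {"description": 0.9, "title": 0.9, "genre": 0.9, "recency": 0.0},
    "new_release": {"description": 0.6, "title": 0.6, "genre": 0.6, "recency": 1.0},
}


def top_title(weights: dict[str, float]) -> str:
    return max(candidates, key=lambda title: weighted_score(candidates[title], weights))


no_recency = {"description": 1, "title": 1, "genre": 1, "recency": 0}
with_recency = {"description": 1, "title": 1, "genre": 1, "recency": 3}
doubled = {name: 2 * w for name, w in with_recency.items()}

print(top_title(no_recency), top_title(with_recency))
# Scaling every weight by the same factor changes nothing after normalization.
print(normalize_weights(with_recency) == normalize_weights(doubled))
```

Only the ratios between weights matter: with recency at 3 out of a total of 6, half of each score comes from recency, which is enough to lift the newer title.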
Using the simple query, I can weight any space (description, title, genre, or recency) more heavily so that it counts more when results are returned. Let's try this out. Below, we give more weight to genre and less to title: my query text is basically a genre with some additional context. I keep some weight on recency because I still want my results biased toward recent movies.
```python
result = app.query(
    simple_query,
    query_text="Heartfelt romantic comedy",
    description_weight=1,
    title_weight=0.1,
    genre_weight=2,
    recency_weight=1,
    limit=TOP_N,
)
present_result(result)
```
This query pushes the release years back a little to give me more genre-weighted results (below).
The advanced query gives me even more fine-grained control. I keep control over recency, but I can also specify search text for description, title, and genre, and assign each its own weight according to my preferences, as below (and in cells 19-21):
```python
result = app.query(
    advanced_query,
    description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
    title_query_text="love",
    genre_query_text="drama comedy romantic",
    description_weight=0.2,
    title_weight=3,
    genre_weight=1,
    recency_weight=5,
    limit=TOP_N,
)
present_result(result)
```
Say that in my latest movie results I found a movie I've already seen, and I'd like to see something similar. Let's assume I like White Christmas, a 1954 romantic comedy (id = tm16479) about singer-dancers who team up to put on a stage show at a Vermont inn. By adding an extra with_vector clause (with a movie_id parameter) to the advanced query, with_movie_query lets me search using this movie (or any movie I like) while giving me fine-grained control over separate search texts and weights.
First, we add our movie_id parameter:
```python
with_movie_query = advanced_query.with_vector(movie, Param("movie_id"))
```
Then I can set my other search parameters either to empty or to whatever is relevant, along with any weights that make sense. Let's say my first query returns results that reflect the stage/band performance aspect of White Christmas (see cell 24), but I'd like to watch a more family-oriented movie. I can enter a description_query_text to skew my results in the desired direction.
```python
result = app.query(
    with_movie_query,
    description_query_text="family",
    title_query_text="",
    genre_query_text="",
    description_weight=1,
    title_weight=0,
    genre_weight=0,
    recency_weight=0,
    description_query_weight=1,
    movie_id="tm16479",
    limit=TOP_N,
)
present_result(result)
```
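As an intuition pump for "search by a movie I like, then nudge it with text", the sketch below blends a liked movie's vector with a text-query vector and looks up the nearest other movie. This is an assumption-laden illustration: the 2-d vectors are made up (axis 0 loosely "stage/musical", axis 1 loosely "family"), `blend` and `nearest` are hypothetical helpers, and we do not claim this is how `with_vector` combines its inputs internally.

```python
import math


def unit(v: list[float]) -> list[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def blend(movie_vec: list[float], text_vec: list[float],
          movie_weight: float, text_weight: float) -> list[float]:
    """Hypothetical: weighted sum of a liked movie's vector and a text-query vector."""
    return [movie_weight * m + text_weight * t for m, t in zip(unit(movie_vec), unit(text_vec))]


def nearest(catalog: dict[str, list[float]], query: list[float], exclude: str) -> str:
    """Return the id most similar to `query`, skipping the movie we started from."""
    scores = {mid: cosine(query, vec) for mid, vec in catalog.items() if mid != exclude}
    return max(scores, key=scores.get)


catalog = {
    "tm16479": [1.0, 0.1],         # the liked movie (White Christmas in the article)
    "stage_musical": [0.95, 0.05],  # made-up neighbour close to the liked movie
    "family_musical": [0.7, 0.7],   # made-up neighbour with a stronger "family" component
}
family_text = [0.0, 1.0]  # made-up embedding of the query text "family"

print(nearest(catalog, catalog["tm16479"], exclude="tm16479"))
print(nearest(catalog, blend(catalog["tm16479"], family_text, 1.0, 1.0), exclude="tm16479"))
```

With the movie vector alone, the closest neighbour is the other stage musical; adding the "family" text vector tilts the query enough that the family-oriented title wins.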
But now that I see my results, I realize I'm actually more in the mood for something light-hearted and funny. Let's adjust my query accordingly:
```python
# lowercase `result`: assigning to `Result` would shadow the imported Result class
result = app.query(
    with_movie_query,
    description_query_text="",
    title_query_text="",
    genre_query_text="comedy",
    description_weight=1,
    title_weight=0,
    genre_weight=2,
    recency_weight=0,
    description_query_weight=1,
    movie_id="tm16479",
    limit=TOP_N,
)
present_result(result)
```
Okay, those results are better. I'll pick one of these. Put the popcorn on!
Superlinked makes it much easier to test, iterate on, and improve your retrieval quality. Above, we walked you through using the Superlinked library to perform semantic search over a vector space, the way Netflix does, and return accurate, relevant movie results. We also saw how to fine-tune our results, adjusting weights and search terms until we arrived at the right outcome.
Now, try out the notebook yourself and see what you can achieve!
Recommendation engines are shaping the way we discover content. Whether it's movies, music, or products, vector search is the future, and now you have the tools to build your own.
Author: Mór Kapronczay