Stop Scrolling, Start Building: Make Your Own AI Movie Recommender

by Superlinked · 11m · 2025/03/06

Too Long; Didn't Read

Learn how to build a custom, AI-powered recommendation system that accurately predicts your next favorite movie. In this tutorial, we'll walk you through building a movie recommendation system using vector databases. You'll learn how modern AI recommendation engines work and get hands-on experience building your own system with Superlinked.
“I said I want a B-movie, dammit!”

The End of Endless Scrolling (and Arguments Over What to Watch…)

Tired of endlessly scrolling through Netflix, unsure what to watch next? What if you could build your own AI-powered recommendation system that accurately predicts your next favorite movie?


In this tutorial, we'll walk you through building a movie recommendation system using vector databases (VectorDBs). You'll learn how modern AI recommendation engines work and get hands-on experience building your own system with Superlinked.


(Want to jump straight to the code? Check out our repo on GitHub here. Ready to try recommender systems for your own use case? Get a demo here.)

Let's get recommending!

We'll follow this notebook throughout the article. You can also run the code straight from your browser using Colab.


Netflix's recommendation algorithm does a pretty good job of suggesting relevant content, given the sheer volume of options (~16k movies and TV programs in 2023) and how quickly it has to serve suggestions to users. How does Netflix do it? In a word, semantic search.


Semantic search understands the meaning and context (both attributes and consumption patterns) behind user queries and movie/TV show descriptions, so it can personalize its queries and recommendations better than traditional keyword-based approaches.


But semantic search poses certain challenges, foremost among them: 1) ensuring accurate search results, 2) interpretability, and 3) scalability. Any successful content recommendation strategy has to handle these challenges. Using Superlinked's library, you can overcome them.


In this article, we'll show you how to use the Superlinked library to set up a semantic search of your own and generate a list of relevant movies based on your preferences.

Semantic Search: Challenges

Semantic search adds a lot of value to vector search, but it poses three significant vector embedding challenges for developers:

  • Quality and relevance: Ensuring that your embeddings accurately capture the semantic meaning of your data requires careful selection of embedding techniques, training data, and hyperparameters. Poor-quality embeddings can lead to inaccurate search results and irrelevant recommendations.


  • Interpretability: High-dimensional vector spaces are too complex to be understood easily. To gain insight into the relationships and similarities encoded within them, data scientists have to develop methods to visualize and analyze them.


  • Scalability: Managing and processing high-dimensional embeddings, especially in large datasets, can strain computational resources and increase latency. Efficient methods for indexing, retrieval, and similarity computation are essential to ensure scalability and real-time performance in production environments.


The Superlinked library enables you to address these challenges. Below, we'll build a content recommender (specifically for movies). We start with the information we have about a given movie, embed this information as a multimodal vector, build a searchable vector index for all our movies, and then use query weights to tweak our results and arrive at good movie recommendations. Let's get into it.

Creating a Fast and Reliable Search Experience with Superlinked

Below, you'll perform a semantic search on the Netflix movie dataset using the following elements of the Superlinked library:

  • Recency space - to understand the freshness (currency and relevancy) of your data, identifying newer movies.
  • TextSimilarity space - to interpret the various pieces of metadata you have about a movie, such as description, title, and genre.
  • Query time weights - to let you choose what matters most in your data when you run the query, so you can optimize without needing to re-embed the whole dataset, do postprocessing, or employ a custom reranking model (i.e., reducing latency).

The Netflix Dataset, and What We'll Do with It

Successfully recommending movies is difficult, mostly because there are so many options (>9000 titles in 2023) and users want recommendations on demand, immediately. Let's take a data-driven approach to finding something we want to watch. In our dataset of movies, we know the:

  • description
  • genre
  • title
  • release_year


We can embed these inputs and put together a vector index on top of our embeddings, creating a space we can search semantically.
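To make the idea of a searchable vector space concrete, here is a minimal sketch of a brute-force vector index. It is only an illustration: the three-dimensional vectors and the movie keys are hand-made, hypothetical stand-ins for real embeddings (in this article, the embeddings come from a sentence-transformers model), and `cosine_similarity` and `search` are helper names invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical, hand-made embeddings; dimensions loosely mean (romance, comedy, action).
index = {
    "romcom": [0.9, 0.8, 0.0],
    "action_thriller": [0.0, 0.1, 0.9],
    "buddy_comedy": [0.1, 0.9, 0.3],
}


def search(query_vector, index, top_n=2):
    """Rank every indexed item by similarity to the query and keep the best top_n."""
    scored = sorted(
        ((cosine_similarity(query_vector, vector), key) for key, vector in index.items()),
        reverse=True,
    )
    return [key for _, key in scored[:top_n]]


# Stand-in for the embedding of a query such as "heartfelt romantic comedy".
query = [0.8, 0.7, 0.0]
results = search(query, index)
print(results)
```

A real vector index avoids comparing the query against every item, but the ranking principle is the same: items whose vectors point in a similar direction to the query come first.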


Once we have our indexed vector space, we will:

  • first, browse the movies, filtered by an idea (heartfelt romantic comedy)
  • next, tweak the results, giving more importance to matches in certain input fields (i.e., weighting)
  • then, search in description, genre, and title with different search terms for each
  • and, after finding a movie that's a close but not exact match, search around using that movie as a reference

Installation and Dataset Preparation

Your first step is to install the library and import the required classes.


(Note: To render Altair plots in Colab, change alt.renderers.enable("mimetype") to alt.renderers.enable('colab').)


    %pip install superlinked==5.3.0

    from datetime import timedelta, datetime

    import altair as alt
    import os
    import pandas as pd
    from superlinked.evaluation.charts.recency_plotter import RecencyPlotter
    from superlinked.framework.common.dag.context import CONTEXT_COMMON, CONTEXT_COMMON_NOW
    from superlinked.framework.common.dag.period_time import PeriodTime
    from superlinked.framework.common.schema.schema import schema
    from superlinked.framework.common.schema.schema_object import String, Timestamp
    from superlinked.framework.common.schema.id_schema_object import IdField
    from superlinked.framework.common.parser.dataframe_parser import DataFrameParser
    from superlinked.framework.dsl.executor.in_memory.in_memory_executor import (
        InMemoryExecutor,
        InMemoryApp,
    )
    from superlinked.framework.dsl.index.index import Index
    from superlinked.framework.dsl.query.param import Param
    from superlinked.framework.dsl.query.query import Query
    from superlinked.framework.dsl.query.result import Result
    from superlinked.framework.dsl.source.in_memory_source import InMemorySource
    from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace
    from superlinked.framework.dsl.space.recency_space import RecencySpace

    alt.renderers.enable("mimetype")  # NOTE: to render altair plots in colab, change 'mimetype' to 'colab'
    alt.data_transformers.disable_max_rows()
    pd.set_option("display.max_colwidth", 190)


We also need to prepare the dataset: define time constants, set the URL location of the data, create a data store dictionary, read the CSV into a pandas DataFrame, clean the dataframe and data so it can be searched properly, and do a quick verification and overview. (See cells 3 and 4 for details.)
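The notebook cells are not reproduced here, so the following is only a rough, standard-library sketch of what such preparation might involve; it is not the notebook's actual code, which works with a pandas DataFrame. The values of `YEAR_IN_DAYS` and `TOP_N`, the helper names, and the second sample row are assumptions made for illustration. The key step it shows is turning `release_year` into the `release_timestamp` that the schema below expects.

```python
from datetime import datetime, timedelta, timezone

# Assumed values; the notebook defines its own constants in cells 3 and 4.
YEAR_IN_DAYS = 365
TOP_N = 10

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def year_to_timestamp(year):
    """Unix timestamp (seconds, UTC) for January 1st of the given year."""
    return int((datetime(year, 1, 1, tzinfo=timezone.utc) - EPOCH).total_seconds())


def timestamp_to_year(timestamp):
    """Inverse of year_to_timestamp; also works for years before 1970."""
    return (EPOCH + timedelta(seconds=timestamp)).year


def clean_rows(rows):
    """Drop rows without a description or release year, trim text, add release_timestamp."""
    cleaned = []
    for row in rows:
        description = (row.get("description") or "").strip()
        if not description or row.get("release_year") is None:
            continue
        cleaned.append(
            {
                "id": row["id"],
                "title": (row.get("title") or "").strip(),
                "description": description,
                "genres": (row.get("genres") or "").strip(),
                "release_timestamp": year_to_timestamp(int(row["release_year"])),
            }
        )
    return cleaned


raw_rows = [
    {
        "id": "tm16479",
        "title": "White Christmas",
        "description": " Singer-dancers put on a show to draw guests to a Vermont inn. ",
        "genres": "romance comedy",
        "release_year": 1954,
    },
    # Hypothetical row with a missing description; it is dropped by clean_rows.
    {"id": "hypothetical-1", "title": "Untitled", "description": None, "genres": "drama", "release_year": 2020},
]

movies = clean_rows(raw_rows)
print(len(movies), timestamp_to_year(movies[0]["release_timestamp"]))
```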


Now that the dataset is prepared, you can optimize your retrieval using the Superlinked library.

Building the Index for Vector Search

Superlinked's library contains a set of core building blocks that we use to construct the index and manage retrieval. You can read about these building blocks in more detail here.


First, you need to define your Schema to tell the system about your data.

    # accommodate our inputs in a typed schema
    @schema
    class MovieSchema:
        description: String
        title: String
        release_timestamp: Timestamp
        genres: String
        id: IdField


    movie = MovieSchema()


Next, you use Spaces to say how you want to treat each part of the data when embedding. Which Spaces you use depends on your datatype. Each Space is optimized to embed its data so that retrieval returns the highest-quality results possible.


In the Space definitions, we describe how the inputs should be embedded in order to reflect the semantic relationships in our data.


    # textual fields are embedded using a sentence-transformers model
    description_space = TextSimilaritySpace(
        text=movie.description, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
    )
    title_space = TextSimilaritySpace(
        text=movie.title, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
    )
    genre_space = TextSimilaritySpace(
        text=movie.genres, model="sentence-transformers/paraphrase-MiniLM-L3-v2"
    )
    # release date are encoded using our recency space
    # periodtimes aim to reflect notable breaks in our scores
    recency_space = RecencySpace(
        timestamp=movie.release_timestamp,
        period_time_list=[
            PeriodTime(timedelta(days=4 * YEAR_IN_DAYS)),
            PeriodTime(timedelta(days=10 * YEAR_IN_DAYS)),
            PeriodTime(timedelta(days=40 * YEAR_IN_DAYS)),
        ],
        negative_filter=-0.25,
    )
    movie_index = Index(spaces=[description_space, title_space, genre_space, recency_space])


Once you've set up your spaces and created your index, you use the source and executor parts of the library to set up your queries. See cells 10-13 in the notebook.


Now that the queries are prepared, let's move on to running them and optimizing retrieval by adjusting weights.

Understanding Recency, and How to Use It in Superlinked

The Recency space lets you alter the results of your query by preferentially pulling older or newer releases from your dataset. We use 4, 10, and 40 years as our period times so that we can give more focus to the years that have more titles (see cell 5).


Notice the breaks in the score at 4, 10, and 40 years. Titles older than 40 years get the negative_filter score.

Recency scores by period
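To build intuition for how period times and negative_filter shape a score curve like the one above, here is a toy scoring function. It is an assumption-laden illustration and not Superlinked's actual formula: each period contributes a value that decays from 1 (brand new) to 0 (as old as the period), the contributions are averaged, and anything older than the longest period receives the negative_filter value. `toy_recency_score` is a hypothetical helper name; only the period times (4, 10, 40 years) and the -0.25 filter come from the article.

```python
import math

# Period times (in years) and negative_filter taken from the RecencySpace definition above.
PERIODS_IN_YEARS = (4, 10, 40)
NEGATIVE_FILTER = -0.25


def toy_recency_score(age_years, periods=PERIODS_IN_YEARS, negative_filter=NEGATIVE_FILTER):
    """Illustrative recency score; NOT Superlinked's actual formula."""
    age_years = max(age_years, 0.0)  # treat future dates as brand new
    if age_years > max(periods):
        return negative_filter
    # A period stops contributing once the item is at least as old as that period,
    # which produces a visible change in the curve at each period boundary.
    contributions = [
        math.cos(math.pi / 2 * age_years / period) if age_years < period else 0.0
        for period in periods
    ]
    return sum(contributions) / len(periods)


for age in (0, 2, 4, 7, 10, 25, 40, 41):
    print(f"{age:>2} years old -> {toy_recency_score(age):+.3f}")
```

In this toy version, the score changes slope at 4 and 10 years and drops to the negative value past 40 years, which is the qualitative behavior the plot describes.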

Reviewing and Optimizing Search Results Using Different Query Time Weights

Let's define a quick utility function to present our results in the notebook.


    def present_result(
        result: Result,
        cols_to_keep: list[str] = ["description", "title", "genres", "release_year", "id"],
    ) -> pd.DataFrame:
        # parse result to dataframe
        df: pd.DataFrame = result.to_pandas()
        # transform timestamp back to release year
        df["release_year"] = [
            datetime.fromtimestamp(timestamp).year for timestamp in df["release_timestamp"]
        ]
        return df[cols_to_keep]


Simple and Advanced Queries

The Superlinked library lets you perform various kinds of queries; here we define two. Both of our query types (simple and advanced) let me weight individual spaces (description, title, genre, and recency) according to my preferences. The difference between them is that with a simple query, I set one query text and then surface similar results in the description, title, and genre spaces.


With an advanced query, I have more fine-grained control. If I want, I can enter different query texts in each of the description, title, and genre spaces. Here's the query code:


    query_text_param = Param("query_text")

    simple_query = (
        Query(
            movie_index,
            weights={
                description_space: Param("description_weight"),
                title_space: Param("title_weight"),
                genre_space: Param("genre_weight"),
                recency_space: Param("recency_weight"),
            },
        )
        .find(movie)
        .similar(description_space.text, query_text_param)
        .similar(title_space.text, query_text_param)
        .similar(genre_space.text, query_text_param)
        .limit(Param("limit"))
    )

    advanced_query = (
        Query(
            movie_index,
            weights={
                description_space: Param("description_weight"),
                title_space: Param("title_weight"),
                genre_space: Param("genre_weight"),
                recency_space: Param("recency_weight"),
            },
        )
        .find(movie)
        .similar(description_space.text, Param("description_query_text"))
        .similar(title_space.text, Param("title_query_text"))
        .similar(genre_space.text, Param("genre_query_text"))
        .limit(Param("limit"))
    )


Simple Query

In a simple query, I set my query text and apply different weights to the spaces according to how important each is to me.


    result: Result = app.query(
        simple_query,
        query_text="Heartfelt romantic comedy",
        description_weight=1,
        title_weight=1,
        genre_weight=1,
        recency_weight=0,
        limit=TOP_N,
    )

    present_result(result)


Simple query results 1

Our results contain some titles I've already seen. I can deal with this by weighting recency to bias my results towards recent titles. Weights are normalized to have a unit sum (i.e., all weights are adjusted so that they always add up to 1), so you don't have to worry about how you set them.


    result: Result = app.query(
        simple_query,
        query_text="Heartfelt romantic comedy",
        description_weight=1,
        title_weight=1,
        genre_weight=1,
        recency_weight=3,
        limit=TOP_N,
    )

    present_result(result)


Simple query results 2

My results (above) are now all post-2021.
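The effect of unit-sum weight normalization can be illustrated with a small sketch. This is a simplified model and not Superlinked's internals: it assumes the final score is a weighted sum of per-space similarity scores, and the similarity values for the two titles are hypothetical. `normalize_weights` and `combined_score` are helper names invented for this example.

```python
def normalize_weights(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {space: weight / total for space, weight in weights.items()}


def combined_score(similarities, weights):
    """Weighted sum of per-space similarities, using normalized weights (assumed model)."""
    normalized = normalize_weights(weights)
    return sum(normalized[space] * similarities[space] for space in normalized)


# Hypothetical per-space similarities for an older and a newer title.
older_title = {"description": 0.8, "title": 0.7, "genre": 0.9, "recency": -0.25}
newer_title = {"description": 0.6, "title": 0.5, "genre": 0.7, "recency": 0.9}

no_recency = {"description": 1, "title": 1, "genre": 1, "recency": 0}
with_recency = {"description": 1, "title": 1, "genre": 1, "recency": 3}
doubled = {"description": 2, "title": 2, "genre": 2, "recency": 6}

# Only the ratios between weights matter: doubling every weight changes nothing.
print(normalize_weights(with_recency))
print(normalize_weights(doubled))

# Raising the recency weight flips which title ranks first.
for name, weights in (("no_recency", no_recency), ("with_recency", with_recency)):
    print(name, combined_score(older_title, weights), combined_score(newer_title, weights))
```

Because only the ratios matter, `recency_weight=3` next to three weights of 1 simply means that recency accounts for half of the total weight.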


Using the simple query, I can weight any specific space (description, title, genre, or recency) to make it count more when returning results. Let's experiment with this. Below, we'll give more weight to genre and downweight the title, since my query text is basically just a genre with some additional context. I keep my recency as is because I'd still like my results to be biased towards recent movies.


    result = app.query(
        simple_query,
        query_text="Heartfelt romantic comedy",
        description_weight=1,
        title_weight=0.1,
        genre_weight=2,
        recency_weight=1,
        limit=TOP_N,
    )

    present_result(result)


This query pushes the release year back a little to give me more genre-weighted results (below).


Simple query results 3

Advanced Query

The advanced query gives me even more fine-grained control. I retain control over recency, but I can also specify separate search text for description, title, and genre, and assign each a specific weight according to my preferences, as below (and in cells 19-21):

    result = app.query(
        advanced_query,
        description_query_text="Heartfelt lovely romantic comedy for a cold autumn evening.",
        title_query_text="love",
        genre_query_text="drama comedy romantic",
        description_weight=0.2,
        title_weight=3,
        genre_weight=1,
        recency_weight=5,
        limit=TOP_N,
    )

    present_result(result)


Search Using a Specific Movie

Say that in the results of my last query, I found a movie I've already seen and would like to see something similar. Let's assume I like White Christmas, a 1954 romantic comedy (id = tm16479) about singer-dancers coming together for a stage show to draw guests to a Vermont inn. By adding a with_vector clause (with a movie_id parameter) to advanced_query, with_movie_query lets me search using this movie (or any movie I like), and gives me fine-grained control over separate search texts and weights.


First, we add the movie_id parameter:

 with_movie_query = advanced_query.with_vector(movie, Param("movie_id"))
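One way to picture searching "around" a reference movie is to blend the stored vector of that movie with the vector of an optional text query, and then rank all other movies against the blend. The sketch below illustrates that general idea only; it is an assumption about the technique, not a description of how Superlinked implements with_vector. The three-dimensional vectors, every id except tm16479, and the helper names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine_similarity(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def search_around(reference_id, query_vector, index, query_weight=1.0, top_n=1):
    """Rank all other items against a blend of the reference item and a text query."""
    reference = normalize(index[reference_id])
    query = normalize(query_vector)
    target = [r + query_weight * q for r, q in zip(reference, query)]
    scored = sorted(
        (
            (cosine_similarity(target, vector), movie_id)
            for movie_id, vector in index.items()
            if movie_id != reference_id  # never recommend the reference movie itself
        ),
        reverse=True,
    )
    return [movie_id for _, movie_id in scored[:top_n]]


# Hypothetical embeddings; dimensions loosely mean (musical, family, comedy).
index = {
    "tm16479": [0.9, 0.3, 0.4],  # stand-in vector for White Christmas
    "stage_musical": [1.0, 0.0, 0.2],
    "family_holiday": [0.4, 0.9, 0.3],
    "slapstick": [0.1, 0.2, 1.0],
}

print(search_around("tm16479", [0.0, 0.0, 0.0], index))  # reference movie only
print(search_around("tm16479", [0.0, 1.0, 0.0], index))  # nudged towards "family"
```

With an empty query, the nearest neighbor is simply the most similar movie; adding a "family" query vector shifts the target so that a family-oriented title ranks first instead.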


Then, I can set my other search query texts either to empty or to whatever is most relevant, along with any weights that make sense. Let's say my first query returns results that reflect the stage performance / band aspect of White Christmas (see cell 24), but I want to watch a movie that is more family-oriented. I can enter a description_query_text to skew my results in the desired direction.

    result = app.query(
        with_movie_query,
        description_query_text="family",
        title_query_text="",
        genre_query_text="",
        description_weight=1,
        title_weight=0,
        genre_weight=0,
        recency_weight=0,
        description_query_weight=1,
        movie_id="tm16479",
        limit=TOP_N,
    )

    present_result(result)


Advanced query results 1

But now that I've seen my results, I realize I'm actually more in the mood for something light-hearted and funny. Let's adjust my query accordingly:


    # assign to `result` (lowercase) so that present_result shows this query's output
    # and the imported Result class is not overwritten
    result = app.query(
        with_movie_query,
        description_query_text="",
        title_query_text="",
        genre_query_text="comedy",
        description_weight=1,
        title_weight=0,
        genre_weight=2,
        recency_weight=0,
        description_query_weight=1,
        movie_id="tm16479",
        limit=TOP_N,
    )

    present_result(result)


Advanced query results 2

Yes, the results are better. I'll pick one of these. Put the popcorn on!

Conclusion

Superlinked makes it easy to test, iterate, and improve your retrieval quality. Above, we've walked you through how to use the Superlinked library to do a semantic search on a vector space, the way Netflix does, and return accurate, relevant movie results. We've also seen how to fine-tune our results, tweaking weights and search terms until we arrive at just the right result.


Now, try out the notebook yourself, and see what you can achieve!

Try It Yourself: Get the Code & Demo!

  • Grab the code: Check out the full implementation in our GitHub repo here. Fork it, tweak it, and make it your own!


  • See it in action: Want to see this working in a real-world setup? Book a quick demo, and explore how Superlinked can supercharge your recommendations. Get a demo now!


Recommendation engines are shaping the way we discover content. Whether it's movies, music, or products, vector search is the future, and now you have the tools to build your own.


Author: Mór Kapronczay