Authors:
(1) Hanoona Rasheed, Mohamed bin Zayed University of AI (equally contributing first author);
(2) Muhammad Maaz, Mohamed bin Zayed University of AI (equally contributing first author);
(3) Sahal Shaji, Mohamed bin Zayed University of AI;
(4) Abdelrahman Shaker, Mohamed bin Zayed University of AI;
(5) Salman Khan, Mohamed bin Zayed University of AI and Australian National University;
(6) Hisham Cholakkal, Mohamed bin Zayed University of AI;
(7) Rao M. Anwer, Mohamed bin Zayed University of AI and Aalto University;
(8) Eric Xing, Mohamed bin Zayed University of AI and Carnegie Mellon University;
(9) Ming-Hsuan Yang, University of California, Merced and Google Research;
(10) Fahad S. Khan, Mohamed bin Zayed University of AI and Linköping University.
Editor's Note: This is Part 1 of 10 of a study detailing the development of an AI model designed to describe images to users. Read the rest below.
Supplementary Material (Part 1)
Supplementary Material (Part 2)
Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses. Recently, region-level LMMs have been used to generate visually grounded responses. However, they are limited to referring to a single object category at a time, require users to specify the regions, or cannot offer dense pixel-wise object grounding. In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks. GLaMM not only grounds objects appearing in the conversations but is also flexible enough to accept both textual and optional visual prompts (region of interest) as input. This empowers users to interact with the model at various levels of granularity, in both the textual and visual domains. Due to the lack of standard benchmarks for the novel setting of visually Grounded Conversation Generation (GCG), we introduce a comprehensive evaluation protocol with our curated grounded conversations. Our proposed GCG task requires densely grounded concepts in natural scenes at a large scale. To this end, we propose a densely annotated Grounding-anything Dataset (GranD), built with our proposed automated annotation pipeline, that encompasses 7.5M unique concepts grounded in a total of 810M regions available with segmentation masks. Besides GCG, GLaMM also performs effectively on several downstream tasks, e.g., referring expression segmentation, image and region-level captioning, and vision-language conversations.
Fueled by the generative AI wave, Large Multimodal Models (LMMs) have emerged as a pivotal advancement, bridging the gap between vision and language tasks [2]. Initial efforts like [6, 8, 22, 29, 52, 61] demonstrate effective textual responses based on input images. Although these models are sophisticated, they still cannot ground their responses in the visual context. Such grounding is crucial for advanced applications like detailed visual understanding, interactive embodied agents, and localized content manipulation. Recent efforts have started to address this limitation by enabling models to process user-defined regions specified via bounding boxes [5, 31, 35, 36, 57].
A few recent works have explored grounded text response generation [5, 21, 35, 59] but do not provide detailed pixel-level groundings. In parallel, efforts have been made in the referring segmentation literature to ground textual descriptions in natural images [21]. However, these methods are limited to grounding a single object and cannot engage in natural, coherent conversations, which restricts their practical applicability in interactive tasks that demand a deep understanding of both visual and textual content. To address these limitations of existing works, we introduce Grounding LMM (GLaMM), which simultaneously provides in-depth region understanding, pixel-level groundings, and conversational abilities through an end-to-end training approach (see Fig. 1 and Tab. 1).
To address the lack of benchmarks for visually grounded conversations, we introduce the novel task of Grounded Conversation Generation (GCG). The GCG task aims to produce natural language responses interleaved with object segmentation masks. This challenging task unifies several existing tasks in computer vision that are typically treated in isolation, i.e., referring expression segmentation, image and region-level captioning, phrase grounding, and vision-language conversations. As a result, our unified model and proposed pretraining dataset can transfer effectively to several downstream tasks (referring expression segmentation, region-level captioning, image captioning, and conversational-style QA). We present GLaMM as the first model specifically designed for this challenging task. Unlike previous works, GLaMM can work with both textual and visual prompts and can generate visually grounded outputs, thus offering a versatile user experience.
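To make the idea of a response "interleaved with object segmentation masks" concrete, the following is a minimal sketch of how such an output could be represented. The class names, fields, and toy 3x3 masks are hypothetical and chosen only for illustration; they are not GLaMM's actual data format or API.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration only; not GLaMM's actual data format.
Mask = list[list[int]]  # binary mask, 1 = pixel belongs to the object


@dataclass
class GroundedPhrase:
    start: int  # character offset where the phrase begins in the response
    end: int    # character offset one past the end of the phrase
    mask: Mask  # segmentation mask for the object the phrase names


@dataclass
class GroundedResponse:
    text: str
    phrases: list[GroundedPhrase] = field(default_factory=list)

    def ground(self, phrase: str, mask: Mask) -> None:
        """Attach a mask to the first occurrence of `phrase` in the text."""
        start = self.text.find(phrase)
        if start < 0:
            raise ValueError(f"phrase not in response: {phrase!r}")
        self.phrases.append(GroundedPhrase(start, start + len(phrase), mask))

    def grounded_spans(self) -> list[tuple[str, int]]:
        """Return (phrase text, mask area in pixels) for each grounded phrase."""
        return [
            (self.text[p.start:p.end], sum(map(sum, p.mask)))
            for p in self.phrases
        ]


# Toy example: one caption with two phrases, each tied to its own mask.
response = GroundedResponse("A dog sits on the grass.")
response.ground("A dog", [[0, 1, 1], [0, 1, 1], [0, 0, 0]])
response.ground("the grass", [[0, 0, 0], [0, 0, 0], [1, 1, 1]])
print(response.grounded_spans())
```

Storing character offsets rather than copies of the phrase keeps the response text as the single source of truth, so several phrases in one sentence can each carry their own mask.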
Detailed region-level understanding requires the laborious process of collecting large-scale annotations for image regions. We propose an automated pipeline to annotate the large-scale Grounding-anything Dataset (GranD) to alleviate the manual labeling effort. Leveraging the automated pipeline with dedicated verification steps, GranD comprises 7.5M unique concepts anchored in 810M regions, each with a segmentation mask. Using state-of-the-art vision and language models, the dataset annotates SAM [18] images through a multi-level hierarchical scheme that enhances annotation quality. With 11M images, 84M referring expressions, and 33M grounded captions, GranD sets a new benchmark in comprehensiveness. In addition to the automatically generated dataset for GCG, we provide the first high-quality dataset for grounded conversations, obtained by revamping existing manually annotated datasets [16, 37, 49] for GCG using GPT-4 [34] in-context learning. We refer to this high-quality dataset as GranDf, denoting its suitability for fine-tuning.
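As a rough sense of scale, the headline figures above can be turned into per-image averages. The short sketch below only divides the rounded counts quoted in the text; the resulting ratios are derived here for illustration and are not numbers reported by the authors.

```python
# Rounded counts as quoted in the text above.
IMAGES = 11_000_000
REGIONS = 810_000_000
CONCEPTS = 7_500_000
REFERRING_EXPRESSIONS = 84_000_000
GROUNDED_CAPTIONS = 33_000_000


def per_image(count: int, images: int = IMAGES) -> float:
    """Average number of annotations of one kind per image."""
    return count / images


# Derived averages (approximate, since the inputs are rounded).
regions_per_image = per_image(REGIONS)
expressions_per_image = per_image(REFERRING_EXPRESSIONS)
captions_per_image = per_image(GROUNDED_CAPTIONS)
regions_per_concept = REGIONS / CONCEPTS

print(f"regions per image:               {regions_per_image:.1f}")
print(f"referring expressions per image: {expressions_per_image:.1f}")
print(f"grounded captions per image:     {captions_per_image:.1f}")
print(f"regions per unique concept:      {regions_per_concept:.1f}")
```

On these rounded inputs, each image carries on the order of seventy masked regions on average, which is what "densely annotated" amounts to in practice.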
Our work has three main contributions:
• We present GLaMM, the first model capable of generating natural language responses seamlessly integrated with object segmentation masks. Unlike existing models, GLaMM accommodates both textual and visual prompts, facilitating enhanced multimodal user interaction.
• Recognizing the lack of standardized benchmarks for visually grounded conversations, we propose the new Grounded Conversation Generation (GCG) task. We also introduce a comprehensive evaluation protocol to measure the efficacy of models on GCG, which unifies multiple isolated tasks, filling a significant gap in the literature.
• To facilitate model training and evaluation, we create the Grounding-anything Dataset (GranD), a large-scale densely annotated dataset. Developed using an automated annotation pipeline and verification criteria, it encompasses 7.5M unique concepts grounded in 810M regions. Additionally, we propose GranDf, a high-quality dataset explicitly designed for fine-tuning on the GCG task, built by repurposing existing open-source datasets.
This paper is available on arxiv under CC BY 4.0 DEED license.