Google Researchers Develop New AI Technique That Doesn't Waste Brainpower on Useless Words
by @textmodels

Too Long; Didn't Read

A smarter way of allocating compute resources in AI transformers makes them faster and more efficient.

Authors:

(1) David Raposo, Google DeepMind, equal contribution;

(2) Sam Ritter, Google DeepMind;

(3) Blake Richards, Google DeepMind and McGill University & Mila;

(4) Timothy Lillicrap, Google DeepMind;

(5) Peter Conway Humphreys, Google DeepMind;

(6) Adam Santoro, Google DeepMind, equal contribution.

Editor's note: this is part 1 of 5 of a study detailing a way to make transformer-based language models more efficient by dynamically allocating computational resources. Read the rest below.

Table of Links

  1. Introduction
  2. Background
  3. Implementing Mixture-of-Depths Transformers
    • 3.1. Defining a compute budget
    • 3.2. Routing around transformer blocks
    • 3.3. Routing schemes
    • 3.4. Routing implementation
    • 3.5. Sampling and 3.6. Training methods
  4. Results
    • 4.1. Training, isoFLOP comparisons
    • 4.2. Auto-regressive Evaluation and 4.3. Mixture-of-Depths-and-Experts (MoDE)
  5. Discussion and References


Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens (𝑘) that can participate in the self-attention and MLP computations at a given layer. The tokens to be processed are determined by the network using a top-𝑘 routing mechanism. Since 𝑘 is defined a priori, this simple procedure uses a static computation graph with known tensor sizes, unlike other conditional computation techniques. Nevertheless, since the identities of the 𝑘 tokens are fluid, this method can expend FLOPs non-uniformly across the time and model depth dimensions. Thus, compute expenditure is entirely predictable in sum total, but dynamic and context-sensitive at the token level. Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPs and wall-clock training time, but require a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling.
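To make the "static size, fluid identities" point concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `top_k_positions`, the capacity value, and the router scores are all made up for illustration. It only shows that a fixed 𝑘 always yields exactly 𝑘 selected positions, while which positions are selected depends on the context.

```python
from typing import List, Sequence


def top_k_positions(scores: Sequence[float], k: int) -> List[int]:
    """Return the positions of the k highest router scores, in sequence order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


K = 3  # capacity, fixed before training (value chosen only for illustration)

# Made-up router scores for two different 8-token contexts.
scores_a = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.05, 0.4]
scores_b = [0.6, 0.1, 0.95, 0.2, 0.3, 0.15, 0.85, 0.0]

picked_a = top_k_positions(scores_a, K)
picked_b = top_k_positions(scores_b, K)

# Both selections have length K, so downstream tensor sizes are known ahead
# of time, even though the selected positions differ between contexts.
print(picked_a, picked_b)
```

In a real model the scores would come from a learned router rather than hard-coded lists; the fixed-length result is what keeps the computation graph static.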

1. Introduction

Not all problems require the same amount of time or effort to solve. Analogously, in language modeling not all tokens and sequences require the same time or effort to make an accurate prediction. And yet, transformer models expend the same amount of compute per token in a forward pass. Ideally, transformers would use smaller total compute budgets by not spending compute unnecessarily.


Conditional computation is a technique that tries to reduce total compute by expending it only when needed (Bengio et al., 2016; Bengio, 2013; Bengio et al., 2013). Various algorithms offer solutions to when and how much compute should be used (Ainslie et al., 2023; Bapna et al., 2020; Fedus et al., 2022). However, general formulations of this challenging problem may not work well with existing hardware constraints, since they tend to introduce dynamic computation graphs (Dehghani et al., 2018; Graves, 2016). The most promising conditional computation methods may instead be those that are harmonious with our current hardware stack, which prioritizes static computation graphs and known tensor sizes that are selected to maximize hardware utilization.


Here we consider the problem of language modeling using a static compute budget that can be made smaller than that used by a vanilla transformer. The network must learn how to dynamically allocate the available compute by making decisions per token, in each layer, about where to spend compute from the available budget. In our implementation, total compute is user-defined and fixed prior to training, rather than being a function of the network's on-the-fly decisions. Thus, hardware efficiency gains (such as a reduced memory footprint, or reduced FLOPs per forward pass) can be anticipated and exploited ahead of time. As we will show, these gains can be had without sacrificing overall performance.
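Because the budget is fixed up front, the savings can be estimated before any training run. The sketch below is a rough back-of-the-envelope cost model, not a formula from the paper. It assumes a standard block with four attention projections and an MLP with 4x expansion, counts two FLOPs per multiply-add, and ignores softmax, normalization, and the router itself. All function names are hypothetical.

```python
def linear_flops(n_tokens: int, d_model: int) -> int:
    """Projections plus MLP: cost grows linearly with the number of tokens.

    Assumes 4*d^2 attention projection weights and 8*d^2 MLP weights
    (4x expansion), with 2 FLOPs per multiply-add.
    """
    return 24 * n_tokens * d_model * d_model


def attention_flops(n_tokens: int, d_model: int) -> int:
    """Score matrix (QK^T) plus weighted sum over values: quadratic in tokens."""
    return 4 * n_tokens * n_tokens * d_model


def block_flops(n_tokens: int, d_model: int) -> int:
    """Approximate FLOPs for one block processing n_tokens tokens."""
    return linear_flops(n_tokens, d_model) + attention_flops(n_tokens, d_model)


def relative_cost(seq_len: int, k: int, d_model: int) -> float:
    """Cost of a block that processes only k tokens, relative to all seq_len."""
    return block_flops(k, d_model) / block_flops(seq_len, d_model)


# Illustrative numbers only: a 2048-token sequence with capacity k = 1024.
print(relative_cost(seq_len=2048, k=1024, d_model=1024))
```

Under these assumptions, halving the number of processed tokens halves the linear term and quarters the attention term, so the relative cost of the block lands somewhere between 25% and 50%.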


We leverage an approach akin to Mixture of Experts (MoE) transformers, in which dynamic token-level routing decisions are made across the network depth. Departing from MoE, we choose to either apply a computation to a token (as would be the case for a standard transformer), or pass it through a residual connection (leaving it unchanged and saving compute). Also in contrast to MoE, we apply this routing to both the feed-forward MLPs and multi-head attention. Since this also affects the keys and queries we process, the routing makes decisions not only about which tokens to update, but also about which tokens are made available to attend to. We refer to this strategy as Mixture-of-Depths (MoD) to emphasize how individual tokens pass through different numbers of layers, or blocks, through the depth of the transformer (see figure 1).
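The "compute or skip" choice described above can be sketched in a few lines of plain Python. This is a simplified illustration rather than the authors' implementation: `mod_layer` and `toy_block` are hypothetical names, the router scores are made up, and `toy_block` merely stands in for self-attention followed by an MLP. Details that the paper covers in later sections are left out.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mod_layer(
    tokens: Sequence[Vector],
    scores: Sequence[float],
    k: int,
    block: Callable[[List[Vector]], List[Vector]],
) -> List[Vector]:
    """Apply `block` to the k highest-scoring tokens; pass the rest through."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])

    # The block only ever sees the k chosen tokens, so skipped tokens are
    # neither updated nor available to interact with inside the block.
    updates = block([list(tokens[i]) for i in chosen])

    # Residual path: every token starts as an unchanged copy of its input.
    out = [list(t) for t in tokens]
    for i, delta in zip(chosen, updates):
        out[i] = [a + b for a, b in zip(tokens[i], delta)]
    return out


def toy_block(xs: List[Vector]) -> List[Vector]:
    """Stand-in for attention + MLP: doubles each vector (illustration only)."""
    return [[2.0 * v for v in x] for x in xs]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
scores = [0.2, 0.9, 0.1, 0.8]  # made-up router scores

result = mod_layer(tokens, scores, k=2, block=toy_block)
print(result)
```

Tokens at positions 1 and 3 receive the block's update added to their input, while positions 0 and 2 come out identical to what went in. Stacking several such layers lets different tokens pass through different numbers of blocks.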


The MoD technique also allows one to trade off performance against speed. On the one hand, one can train an MoD transformer that improves upon vanilla transformers by as much as 1.5% on the final log probability training objective for equivalent training FLOPs (isoFLOP), while taking an equivalent amount of wall-clock time to train. On the other hand, one can train an MoD transformer that achieves training loss parity with an isoFLOP-optimal vanilla transformer, but which uses a fraction of the FLOPs (upwards of 50%) per forward pass, and hence is faster to step. Together, these results imply that MoD transformers learn to route intelligently (i.e., skipping computations that are unnecessary), since they can achieve equal or better sequence log probabilities despite a smaller FLOP footprint per forward pass.


This paper is available on arXiv under the CC BY 4.0 DEED license.