How to Fit an Elephant in a Spreadsheet

by Scripting · 6m · 2025/02/20
Too Long; Didn't Read

Clustering large datasets is slow, but compressing the data first can help. We propose a near-optimal method for k-means and k-median clustering that runs almost as fast as reading the data. Simpler sampling methods exist, but they risk losing accuracy, so our approach offers the best balance between speed and reliability.

Authors:

(1) Andrew Draganov, Aarhus University (all authors contributed equally to this research);

(2) David Saulpic, Université Paris Cité & CNRS;

(3) Chris Schwiegelshohn, Aarhus University.

Table of Links

Abstract and 1 Introduction

2 Preliminaries and Related Work

2.1 On Sampling Strategies

2.2 Other Coreset Strategies

2.3 Coresets for Database Applications

2.4 Quadtree Embeddings

3 Fast-Coresets

4 Reducing the Impact of the Spread

4.1 Computing a Crude Upper Bound

4.2 From Approximate Solution to Reduced Spread

5 Fast Compression in Practice

5.1 Goal and Scope of the Empirical Analysis

5.2 Experimental Setup

5.3 Evaluating Sampling Strategies

5.4 Streaming Setting and 5.5

6 Conclusion

7 Acknowledgements

8 Proofs, Pseudo-Code, and Extensions and 8.1 Proof of Corollary 3.2

8.2 Reduction of k-means to k-median

8.3 Estimation of the Optimal Cost in a Tree

8.4 Extensions to Algorithm 1

References

Abstract

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress the data and perform the clustering on the compressed representation. Unfortunately, there is no universal best choice for compressing the number of points: while random sampling runs in sublinear time and coresets provide theoretical guarantees, the former does not enforce accuracy and the latter is too slow as the numbers of points and clusters grow. Indeed, it has been conjectured that any sensitivity-based coreset construction requires super-linear time in the dataset size.

We examine this relationship by first showing that there does exist an algorithm that obtains coresets via sensitivity sampling in effectively linear time, that is, within log-factors of the time it takes to read the data. Any approach that significantly improves on this must then resort to practical heuristics, leading us to consider the spectrum of sampling strategies across both real and artificial datasets in the static and streaming settings. Through this, we show the conditions in which coresets are necessary for preserving cluster validity as well as the settings in which faster, cruder sampling strategies are sufficient. As a result, we provide a comprehensive theoretical and practical blueprint for effective clustering regardless of data size. Our code is publicly available and has scripts to recreate the experiments.

1 Introduction

The modern data analyst has no shortage of clustering algorithms to choose from but, given the ever-increasing size of relevant datasets, many are too slow to be practically useful. This is particularly relevant for big-data pipelines, where clustering algorithms are commonly used for compression. The goal is to replace a very large dataset with a smaller, more manageable one for downstream tasks, in the hope that it represents the original input well. Lloyd's algorithm [49] was introduced for precisely this reason and minimizes the quantization error: the sum of squared distances from each input point to its representative in the compressed dataset. Arguably the most popular clustering algorithm, Lloyd's runs for multiple iterations until convergence, with every iteration requiring O(ndk) time, where n is the number of points, d is the number of features and k is the number of clusters (that is, the size of the targeted compression). For such applications, the number of points can easily be in the hundreds of millions and, since the quality of compression increases with k, standard objectives can have k in the thousands [41, 4]. In such settings, any O(ndk) algorithm is prohibitively slow.
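To make the O(ndk) cost per iteration concrete, here is a minimal pure-Python sketch of a single Lloyd iteration together with the quantization error it reduces. The function names and the toy data are illustrative assumptions, not taken from the paper's code.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two d-dimensional points: O(d)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def quantization_error(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

def lloyd_iteration(points, centers):
    """One Lloyd step: assign every point to its nearest center, then
    move each center to the mean of its assigned points.
    The assignment loop touches n points x k centers x d features."""
    k, d = len(centers), len(points[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
        counts[nearest] += 1
        for i in range(d):
            sums[nearest][i] += p[i]
    new_centers = []
    for j in range(k):
        if counts[j] == 0:
            new_centers.append(list(centers[j]))  # keep an empty cluster's center
        else:
            new_centers.append([s / counts[j] for s in sums[j]])
    return new_centers

# Toy data (hypothetical): two well-separated pairs of points.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
centers = [[0.0, 0.0], [1.0, 0.0]]
before = quantization_error(points, centers)
new_centers = lloyd_iteration(points, centers)
after = quantization_error(points, new_centers)
print(before, after, new_centers)
```

On this toy input a single step moves the centers to the two pair midpoints and the error drops from 164.0 to 1.0. With n in the hundreds of millions and k in the thousands, the triple loop is what makes each iteration prohibitively expensive.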


Examples like these have prompted the rise of big-data algorithms that provide both theoretical and practical runtime improvements. The perspectives of theoretical soundness and practical efficacy are, however, often at odds with one another. On the one hand, theoretical guarantees provide assurance that the algorithm will perform well regardless of whatever unlucky inputs it receives. On the other hand, it may be difficult to convince oneself to implement the theoretically optimal algorithm when there are cruder methods that are faster to get running and perform well in practice.


Since datasets can be large in the number of points n and/or the number of features d, big-data methods must mitigate the effects of both. With respect to the feature space, the question is effectively closed, as random projections are fast (running in effectively linear time), practical to implement [50], and provide tight guarantees on the embedding's size and quality. The outlook is less clear when reducing the number of points n, and there are two separate paradigms that each provide distinct advantages. On the one hand, we have uniform sampling, which runs in sublinear time but may miss important subsets of the data and can therefore only guarantee accuracy under certain strong assumptions on the data [45]. On the other hand, the most accurate sampling strategies provide the strong coreset guarantee, wherein the cost of any solution on the compressed data is within an ε-factor of that solution's cost on the original dataset [25].
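In symbols, the strong coreset guarantee described above can be written as follows. The notation is assumed here for illustration: P is the original dataset, Ω is the compressed set with weights w, C is any candidate set of k centers, and z = 1 for k-median or z = 2 for k-means.

```latex
\[
\mathrm{cost}(P, C) = \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert^{z},
\qquad
\mathrm{cost}(\Omega, C) = \sum_{q \in \Omega} w(q) \min_{c \in C} \lVert q - c \rVert^{z}
\]
\[
(1 - \varepsilon)\,\mathrm{cost}(P, C)
\;\le\; \mathrm{cost}(\Omega, C)
\;\le\; (1 + \varepsilon)\,\mathrm{cost}(P, C)
\qquad \text{for every } C \text{ with } \lvert C \rvert = k .
\]
```

The key point is the quantifier: the bound has to hold for every candidate solution C simultaneously, not only for the optimal one, which is what makes the guarantee "strong".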


Our contributions. We study both paradigms (uniform sampling and strong coresets) with respect to a classic problem: compression for the k-means and k-median objectives. Whereas uniform sampling provides optimal speed but no worst-case accuracy guarantee, all available coreset constructions have a running time of at least Ω̃(nd + nk) when yielding tight bounds on the minimum number of samples required for accurate compression.


It is easy to show that any algorithm that achieves a compression guarantee must read the entire dataset [1]. Thus, a clear open question is what guarantees are achievable in linear or nearly-linear time. Indeed, currently available fast sampling algorithms for clustering [6, 5] cannot achieve strong coreset guarantees. Recently, [31] proposed a method for strong coresets that runs in time Õ(nd + nk) and conjectured this to be optimal for k-median and k-means.


While this bound is effectively optimal for small values of k, there are many applications, such as computer vision [34] or algorithmic fairness [18], where the number of clusters can be larger than the number of features by several orders of magnitude. In such settings, the question of time-optimal coresets remains open. Since the issue of determining a coreset of optimal size has recently been closed [25, 28, 44], this is arguably the main open problem in coreset research for center-based clustering. We resolve this by showing that there exists an easy-to-implement algorithm that constructs coresets in Õ(nd) time, only logarithmic factors away from the time it takes to read in the dataset.
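As a rough illustration of what sensitivity sampling means, the sketch below draws a weighted sample given some approximate solution. It is not the algorithm proposed in this paper: the importance score used (a point's share of the total cost plus one over its cluster size) is one common textbook form and is assumed here, the function names are hypothetical, and the naive assignment step costs O(ndk) rather than Õ(nd).

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def weighted_cost(points, weights, centers):
    """Weighted k-means cost of a candidate solution."""
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def sensitivity_sample(points, centers, m, rng):
    """Draw m weighted points with probability proportional to an
    assumed textbook score: cost share + 1 / (size of own cluster)."""
    n, k = len(points), len(centers)
    assign, costs = [], []
    for p in points:  # naive nearest-center assignment, O(ndk)
        dists = [sq_dist(p, c) for c in centers]
        j = min(range(k), key=dists.__getitem__)
        assign.append(j)
        costs.append(dists[j])
    total = sum(costs)
    sizes = [0] * k
    for j in assign:
        sizes[j] += 1
    scores = [(c / total if total > 0 else 0.0) + 1.0 / sizes[j]
              for c, j in zip(costs, assign)]
    norm = sum(scores)
    probs = [s / norm for s in scores]
    picks = rng.choices(range(n), weights=probs, k=m)  # with replacement
    sample = [points[i] for i in picks]
    # Inverse-probability weights keep the cost estimate unbiased.
    weights = [1.0 / (m * probs[i]) for i in picks]
    return sample, weights, probs

# Hypothetical input: a tight cluster at the origin plus one far-away point.
n, m = 10_000, 100
points = [(0.0, 0.0)] * (n - 1) + [(1000.0, 0.0)]
approx_solution = [(0.0, 0.0)]
sample, weights, probs = sensitivity_sample(points, approx_solution, m, random.Random(0))
true_cost = weighted_cost(points, [1.0] * n, approx_solution)
estimated_cost = weighted_cost(sample, weights, approx_solution)
print(f"P(far point per draw)={probs[-1]:.3f} true={true_cost:.3g} estimate={estimated_cost:.3g}")
```

Because the far-away point carries the whole cost, it is drawn with probability of about one half per draw, so it is all but certain to appear in the sample with a correspondingly small weight.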


Nonetheless, this does not fully illuminate the landscape among sampling algorithms for clustering in practice. Although our algorithm achieves both an optimal runtime and an optimal compression, it is certainly possible that other, cruder methods may be just as viable for all practical purposes. We state this formally in the following question: When are optimal k-means and k-median coresets necessary, and what is the practical tradeoff between coreset speed and accuracy?


To answer this, we perform a thorough comparison across the full span of sampling algorithms that are faster than our proposed method. Through this we verify that, while these faster methods are sufficiently accurate on many real-world datasets, there exist data distributions that cause catastrophic failure for each of them. Indeed, these cases can only be avoided with a strong-coreset method. Thus, while many practical settings do not require the full coreset guarantee, one cannot cut corners if one wants to be confident in the compression. We verify that this extends to the streaming paradigm and applies to downstream clustering approaches.
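A small, hypothetical example of the kind of input that breaks uniform sampling: one tight cluster plus a single far-away point. The dataset, sample size, and function names below are assumptions chosen for illustration and are not taken from the paper's experiments.

```python
import random

def kmeans_cost(points, weights, centers):
    """Weighted sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p, w in zip(points, weights):
        total += w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total

def uniform_sample(points, m, rng):
    """Pick m points uniformly without replacement; each stands in for n/m originals."""
    sample = rng.sample(points, m)
    weight = len(points) / m
    return sample, [weight] * m

# Hypothetical adversarial input: n - 1 points at the origin, one point far away.
n, m = 100_000, 100
points = [(0.0, 0.0)] * (n - 1) + [(1000.0, 0.0)]
centers = [(0.0, 0.0)]  # a candidate solution that ignores the far point

true_cost = kmeans_cost(points, [1.0] * n, centers)
sample, weights = uniform_sample(points, m, random.Random(0))
estimated_cost = kmeans_cost(sample, weights, centers)
relative_error = abs(estimated_cost - true_cost) / true_cost
print(f"true={true_cost:.3g} estimate={estimated_cost:.3g} rel_error={relative_error:.3g}")
```

The sample contains the far point with probability m/n = 0.1%. If it is missed, the estimated cost is 0; if it is hit, the estimate is inflated by a factor of n/m. Either way the relative error is at least 1, which no strong coreset with ε < 1 would allow.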


In summary, our contributions are as follows:


• We show that one can obtain strong coresets for k-means and k-median in Õ(nd) time. This resolves a conjecture on the necessary runtime for k-means coresets [31] and is theoretically optimal up to log-factors.


• Through a comprehensive analysis across datasets, tasks, and streaming/non-streaming paradigms, we verify that there is a necessary tradeoff between speed and accuracy among the linear- and sublinear-time sampling methods. This gives the clustering practitioner a blueprint on when to use each compression algorithm for effective clustering results in the fastest possible time.


This paper is available on arXiv under the CC BY 4.0 DEED license.