
Benchmarking LLMs on Solving Leetcode Problems in 2025

by Alex Svetkin, 2025/04/08

Too Long; Didn't Read

State-of-the-art "reasoning" LLMs can now solve a significant share of hard algorithmic Leetcode problems.

A while ago, my benchmark showed that Large Language Models (LLMs) could solve algorithmic coding problems from Leetcode. However, their ability was largely limited to a subset of well-known, "popular" problems. Newer problems, which were presumably not part of their training data, proved challenging: while the models still solved some of them, the harder ones mostly went unsolved.


Since then, OpenAI, Anthropic, and Google have released updated versions of their models, and new players such as DeepSeek and xAI have entered the field. Many models are now explicitly marketed as capable coding assistants, which was not really the case before. I wanted to benchmark these latest LLMs to see whether their ability to tackle novel algorithmic problems has improved.

Existing benchmarks

There are already several benchmarks that measure the coding performance of LLMs.

SWE-bench focuses on solving real-world software issues: it is built from GitHub issues of existing open-source projects. It is a great idea, but it covers far more than the pure algorithmic problem solving I was after.


Codeforces-based benchmarks are a much better fit for measuring the algorithmic problem-solving skills of LLMs. OpenAI evaluated its o1 and o3 models on Codeforces problems and reported impressive results (1, 2), but other vendors did not, which makes a direct comparison impossible.

So I decided to build a benchmark of my own, one that allows a direct comparison of the LLMs. And besides, why not do it just for fun?

Benchmark design

The idea is to mimic what humans do when solving an algorithmic problem, but to use an LLM to produce the code:

  1. Download the problem statement.
  2. Build a prompt from the description.
  3. Generate code with the LLM.
  4. Submit the code for evaluation.
  5. Await the results.

Sketched benchmark pipeline


These steps are repeated for every problem in the test set and for every LLM. For simplicity, each LLM gets only one attempt to generate code per problem, with no follow-up turns to fix the solution. All prompts are sent independently; no conversation context is shared between them.
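As a rough illustration, the whole loop can be sketched in a few lines of Python. The helper names below (fetch_problem, build_prompt, generate_code, submit_solution, wait_for_verdict) are hypothetical placeholders for the five steps, not the actual API of my leetgptsolver tool:

    def run_benchmark(problem_ids, models):
        # One-shot generation: each model gets a single attempt per problem,
        # and every request is sent without any shared conversation context.
        results = {}
        for model in models:
            for pid in problem_ids:
                problem = fetch_problem(pid)                # 1. download the problem statement
                prompt = build_prompt(problem, "Python 3")  # 2. build a prompt from the description
                code = generate_code(model, prompt)         # 3. generate code with the LLM
                sub_id = submit_solution(pid, code)         # 4. submit the code to the online judge
                results[(model, pid)] = wait_for_verdict(sub_id)  # 5. await the verdict
        return results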



    Why Leetcode?

Leetcode seemed like the best choice for several reasons:

    • Leetcode problems are widely used to prepare for software engineer job interviews.
    • Computer Science students solve similar problems during their education.
    • It has an online judge that can check whether a solution is correct within seconds.
    • Many popular programming languages are supported.
    • Human performance data on these problems is also available.

How Leetcode works

If you are not familiar with competitive programming or algorithmic interviews, here is a short primer. Consider this sample problem statement:

    Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice.

A participant has to write a solution that fits into the provided code snippet:

    class Solution:
        def twoSum(self, nums: List[int], target: int) -> List[int]:
            # your code here

Typically, one or more sample inputs and outputs (test cases) are provided in the description:

    Input:  nums = [2,7,11,15], target = 9
    Output: [0,1]

A problem usually has many more test cases beyond the samples, most of them hidden; the online judge runs all of them against a submitted solution.
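For illustration, a straightforward accepted solution to the sample problem above (the classic one-pass hash-map approach; my own sketch, not taken from any model's output) would fit the snippet like this:

    from typing import List

    class Solution:
        def twoSum(self, nums: List[int], target: int) -> List[int]:
            # One pass with a hash map: remember each value's index and look
            # for the complement of the current number.
            seen = {}
            for i, num in enumerate(nums):
                complement = target - num
                if complement in seen:
                    return [seen[complement], i]
                seen[num] = i
            return []  # unreachable when exactly one solution is guaranteed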

Every problem has an "acceptance rate": the ratio of accepted submissions made by Leetcode users. Note that a user may submit solutions an unlimited number of times, and every attempt counts toward the acceptance rate.

None of this is unique to Leetcode; the same format has been used in technical interviews for a long time.


The data set

As in the previous benchmark, I wanted to test the LLMs on two kinds of problems:

  • "Well-known" problems, which are not only old but also widely used for interview preparation, so their solutions are easy to find.
  • "Unseen" problems, published within the last year, whose solutions are unlikely to be available to popular LLMs.
While most problems ask for a single function to be implemented, some require implementing an interface, i.e., several functions for one problem. Some problems also include images, which could put LLMs at a disadvantage, since not all models can process images or browse the web. I chose to exclude problems with images, external links, and multi-function interfaces.

Leetcode curates several lists of popular problems: "Top Interview 150", "Leetcode 75", and "Top 100 Liked". My set of "well-known" problems was drawn from these lists.

For the "unseen" set, I took the 99 most recently published problems: 33 easy, 33 medium, and 33 hard, judging recency by the highest problem IDs. Although Leetcode does not display publication dates, they can usually be inferred from comments and discussions. The earliest problems in the "unseen" set appear to date from November 2024.

Difficulty levels are purely subjective and assigned at the editors' discretion. I did not attempt to balance the number of problems of each difficulty between the two sets.
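A sketch of how such a selection could be scripted, assuming a list of problem records with hypothetical "id" and "difficulty" fields (this mirrors the described procedure, not the exact code in leetgptsolver):

    def pick_unseen(problems, per_difficulty=33):
        # Highest problem IDs correspond to the most recently published problems.
        unseen = {"Easy": [], "Medium": [], "Hard": []}
        for p in sorted(problems, key=lambda p: p["id"], reverse=True):
            bucket = unseen[p["difficulty"]]
            if len(bucket) < per_difficulty:
                bucket.append(p)
        return unseen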




The problem set

                                      Well-known   Unseen (23 Mar 2025)
    Total                             133          99
    Easy                              41           33
    Medium                            78           33
    Hard                              14           33
    Leetcode users' acceptance rate   53.44%       37.05%

Problem statements and code snippets were downloaded with my own benchmarking tool, which is published on GitHub: https://github.com/whisk/leetgptsolver



Prompt, language choice, and code evaluation

The benchmark is deliberately simple: the LLM produces code in a single pass, with no feedback on the initial submission (and no retries), and without running any test cases other than the samples already included in the description.

The same prompt was used for every LLM and every problem:

    Hello, this is a coding challenge. You will be given:
    * A problem statement (with sample test cases if available).
    * A starter code snippet (with fixed function signatures).
    Please write your solution in the {language} programming language. Your code must:
    * Solve the problem fully and correctly.
    * Pass all provided sample test cases.
    * Run within acceptable time and memory limits (consider large inputs where applicable).
    * Follow good coding practices (clear logic, readable structure, appropriate use of language features).
    Here is the problem statement:
    {question}
    Here is the code snippet, which you should expand with your solution:
    {snippet}
    Important Requirements:
    * Do not change any provided function signatures, class names, or method names.
    * Output only valid source code that can …

The prompt was "polished" with ChatGPT-4 from my initial draft, but no dedicated "prompt engineering" techniques were applied.

Problem statements were stripped of HTML tags before being inserted into the prompt.

The chosen solution language was Python (version 3).
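A minimal sketch of this preprocessing and prompt assembly, assuming the template above is stored in PROMPT_TEMPLATE (the helper names here are mine, not the tool's actual API):

    import re

    PROMPT_TEMPLATE = "..."  # the prompt shown above, with {language}, {question} and {snippet} placeholders

    def strip_html(text: str) -> str:
        # Crude removal of HTML tags from the problem description.
        return re.sub(r"<[^>]+>", " ", text)

    def build_prompt(question_html: str, snippet: str) -> str:
        return PROMPT_TEMPLATE.format(
            language="Python 3",
            question=strip_html(question_html),
            snippet=snippet,
        )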

The LLMs were asked to output only working code with no extra text, but this instruction was not always followed. The generated output was therefore post-processed, and everything that was not code was stripped away before submission.
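That cleanup step can be approximated by keeping only the fenced part of a Markdown reply, roughly like this (a simplified sketch of the idea, not the exact implementation):

    import re

    def extract_code(llm_output: str) -> str:
        # If the model wrapped its answer in a Markdown code fence, keep only
        # the fenced part; otherwise assume the whole reply is source code.
        match = re.search(r"```(?:python)?\s*\n(.*?)```", llm_output, flags=re.DOTALL)
        if match:
            return match.group(1).strip()
        return llm_output.strip()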


Models and parameters

The models used in the benchmark are listed in the table below, together with any non-default parameters. Knowledge cutoff dates are taken from the vendors' documentation, where available.


    Vendor     Model                                           Knowledge cutoff   Reasoning   Parameters
    Anthropic  claude-3-7-sonnet-20250219                      Nov 2024           No          temperature = 0.0, max_tokens = 4096
    Anthropic  claude-3-7-sonnet-20250219 (reasoning enabled)  Nov 2024           Yes         temperature = 0.0, max_tokens = 16384, budget_tokens = 8192
    DeepSeek   deepseek-chat (DeepSeek-V3)                     unknown            No          temperature = 0.0
    DeepSeek   deepseek-reasoner (DeepSeek-R1)                 unknown            Yes         temperature = 0.0
    Google     gemini-2.0-flash-001                            unknown            No          temperature = 0.0
    Google     gemini-2.0-pro-exp-02-05                        unknown            No          temperature = 0.0
    Google     gemini-2.5-pro-exp-03-25                        unknown            Yes         temperature = 0.0
    xAI        grok-2-1212                                     Jul 17, 2024       No          seed = 42
    OpenAI     o1-2024-12-17                                   Oct 01, 2023       Yes         seed = 42
    OpenAI     o3-mini-2025-01-31                              Oct 01, 2023       Yes         seed = 42

The benchmark was meant to be as reproducible and deterministic as possible; therefore, parameters such as temperature or seed were set wherever supported. However, not all of the tested models guarantee fully deterministic output, which should be kept in mind when trying to reproduce the results.

All known knowledge cutoff dates lie before the publication of the earliest problem in the "unseen" set (November 2024). Unfortunately, I could not find knowledge cutoff dates for the Gemini and DeepSeek models.

Some models offer a "reasoning" or "thinking" mode by default, while for Claude 3.7 Sonnet it has to be enabled explicitly via a parameter. Whether this mode was used is noted in the table. Other model features (or "tools") such as web search were not used, even where supported.
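For reference, this is roughly how the listed parameters map onto the vendors' official Python SDKs; the actual client code in the benchmark may differ, and the prompt placeholder below is mine:

    from openai import OpenAI
    import anthropic

    prompt = "..."  # the fully assembled prompt from the previous section

    # OpenAI o-series models: reasoning is built in; a fixed seed is used since
    # these models do not accept a temperature setting.
    openai_client = OpenAI()
    openai_response = openai_client.chat.completions.create(
        model="o3-mini-2025-01-31",
        messages=[{"role": "user", "content": prompt}],
        seed=42,
    )

    # Claude 3.7 Sonnet: extended thinking is switched on explicitly, with a
    # token budget reserved for the reasoning phase.
    anthropic_client = anthropic.Anthropic()
    anthropic_response = anthropic_client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=16384,
        thinking={"type": "enabled", "budget_tokens": 8192},
        messages=[{"role": "user", "content": prompt}],
    )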

Results

Results on the "well-known" problem set


All models showed a very high acceptance rate on the "well-known" problems, just as in the previous benchmark. I did not run some of the top models or modifications (namely Claude 3.7 Sonnet with reasoning enabled, DeepSeek R1, Gemini 2.5 Pro, and o1) on this set to save time and cost, as their results were quite predictable.

Results on the "unseen" problem set

The results are quite different for the "unseen" problems, in two respects:

    1. For all models the acceptance rate is lower for "unseen" problems. This is especially notable for medium and hard problems.
    2. Models with "reasoning" or "thinking" mode enabled produced better results across problems of all difficulties, though exact numbers varied from model to model.

The very high acceptance rate on the well-known problems can most likely be explained by the fact that these problems and their solutions were present in the training data, so the models may simply be reproducing solutions they have already seen. Note that Leetcode users' acceptance rate on the new medium and hard problems is also lower than on the "well-known" set. These facts are related, and they do not necessarily mean that the new problems are objectively "harder". Difficulty labels, as noted earlier, are quite subjective. And, just like LLMs, human users may simply be submitting solutions they have already seen for the well-known problems.

All models with a "reasoning" mode show better results than their base counterparts. Most importantly, some models were able to solve a significant number of medium and hard problems, a result that seemed out of reach only a year ago. o3-mini showed the best results among the "reasoning" models, performing even better than the far larger and more expensive o1. It is worth noting that o3-mini was specifically trained to solve such problems.

Further improvements

It cannot be ruled out that the "unseen" problems still ended up in the models' training data. To address this, one could generate brand-new, unique problems created solely for benchmarking, possibly with the help of LLMs themselves.

Going a step further, less popular programming languages could be used. This might force the LLMs to actually construct a solution instead of "copy-pasting" suitable, well-known Python code.

These improvements are left for future work, and I hope that either someone else or I will get around to them.

Links

  • Raw results, problem sets, and source code are available in my GitHub repository: https://github.com/whisk/leetgptsolver
  • The previous benchmark and its results (2024): https://hackernoon.com/testing-llms-on-solving-leetcode-problems


Cover image generated with DALL·E.
