Over the past year, OpenAI, Anthropic, and Google have released updated versions of their models, and new players such as DeepSeek and xAI have joined in. Many models are now marketed as capable coding tools, which was not really the case before. I wanted to rerun my earlier LLM benchmark to find out whether their ability to tackle new algorithmic problems has improved as well.
Motivation
There are many aspects of LLM performance one could measure. Benchmarks are published with every new model generation, which makes a direct comparison between LLMs possible. And, after all, isn't it also just fun?
Benchmark design
The idea is to replicate the way a human solves an algorithmic problem, but with an LLM writing the code:
- Download the problem description.
- Build a prompt from the description.
- Generate code with the LLM.
- Submit the code to the online judge.
- Await the results.
These steps are performed for every problem in the test set and for every LLM. To keep the experiment clean, each LLM gets exactly one attempt to generate code for each problem, with no second attempt to fix the solution. All results are treated independently; no chat context is shared between them. A minimal sketch of this loop is shown below.
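The sketch below illustrates the loop described above. The helper callables (`build_prompt`, `generate_code`, `submit`) are hypothetical placeholders, not the actual API of my benchmarking tool:

```python
# Sketch of the benchmark loop: one attempt per (model, problem) pair,
# no retries and no shared chat context. Helper callables are hypothetical.
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Result:
    problem: str    # problem slug or ID
    model: str      # model identifier
    accepted: bool  # verdict from the online judge

def run_benchmark(
    problems: Iterable[dict],
    models: Iterable[str],
    build_prompt: Callable[[dict], str],
    generate_code: Callable[[str, str], str],
    submit: Callable[[dict, str], bool],
) -> List[Result]:
    results = []
    for problem in problems:
        prompt = build_prompt(problem)            # prompt from the description
        for model in models:
            code = generate_code(model, prompt)   # single LLM generation
            accepted = submit(problem, code)      # submit and await the verdict
            results.append(Result(problem["slug"], model, accepted))
    return results
```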
Why Leetcode?
Leetcode was chosen as the benchmark for several reasons:
- Leetcode problems are widely used in real software engineering interviews.
- Computer Science graduates solve similar problems during their education.
- It has an online judge that can check whether a solution is correct within seconds.
- Many popular programming languages are supported.
- A measure of human performance on these problems is also available.
How Leetcode works
If you have never dealt with competitive programming or algorithmic problems, here is a short primer. Consider this sample problem statement:

Given an array of integers nums and an integer target, return the indices of the two numbers that add up to target. You may assume that each input has exactly one solution, and you may not use the same element twice.

To get the problem accepted, a competitive programmer must submit a solution that expands the provided code snippet (the starter code):
```python
class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # your code here
```
Usually, several sample inputs and outputs (test cases) are shown in the description:

```
Input: nums = [2,7,11,15], target = 9
Output: [0,1]
```
A problem typically has many more test cases than the samples shown; the online judge runs every submission against all of them.
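For illustration only (this snippet is not part of the benchmark itself), an accepted submission for the problem above could use the standard one-pass hash-map approach:

```python
from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # Map each seen value to its index; O(n) time, O(n) extra space.
        seen = {}
        for i, x in enumerate(nums):
            if target - x in seen:
                return [seen[target - x], i]
            seen[x] = i
        return []  # unreachable: the statement guarantees exactly one solution
```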
Every problem also has an "acceptance rate": the share of submitted solutions that were accepted. Note that a user can submit code for a problem an unlimited number of times, and every submission counts toward the acceptance rate.

None of these concepts is specific to Leetcode; they have been used in technical interviews and competitive programming for a long time.
Data set
As in the previous benchmark, I wanted to test the LLMs on two kinds of problems:
- "Well-known" problems, published long ago and almost certainly present in the models' training data.
- "Unseen" problems, published recently, after the models' knowledge cutoff dates.
Some problems explicitly reference other problems and require extending their code, while others contain links, i.e., depend on material outside the statement itself. Some problems also include images, which would put the LLMs at a disadvantage, since not all of the tested models can process images or browse the internet. I therefore excluded problems with images, links, and references to other problems.

Leetcode offers three curated problem lists: "Leetcode 75", "Top Interview 150", and "Top 100 Liked". My data set of "well-known" problems was built from these lists.

For the "unseen" set, I took the 99 most recently published problems: 33 easy, 33 medium, and 33 hard, going by problem ID in descending order. Although Leetcode does not show when a problem was published, the date can usually be estimated from its comments and discussions. The earliest problems in the "unseen" set appear to have been published around November 2024.

Difficulty levels are subjective and assigned at the discretion of Leetcode's editors. I aimed for an equal number of problems per difficulty level in the "unseen" set; a rough sketch of that selection is shown below.
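A minimal sketch of such a selection, assuming a list of problem records with `id` and `difficulty` fields (not the exact code of my tool):

```python
# Hypothetical sketch: take the 33 newest problems per difficulty level,
# using the problem ID as a proxy for publication order.
def pick_unseen(problems: list[dict], per_level: int = 33) -> list[dict]:
    unseen = []
    for level in ("Easy", "Medium", "Hard"):
        newest_first = sorted(
            (p for p in problems if p["difficulty"] == level),
            key=lambda p: p["id"],
            reverse=True,
        )
        unseen.extend(newest_first[:per_level])
    return unseen
```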
Problem set

| | Well-known | Unseen (23 Mar 2025) |
| --- | --- | --- |
| Total | 133 | 99 |
| Easy | 41 | 33 |
| Medium | 78 | 33 |
| Hard | 14 | 33 |
| Leetcode users' acceptance rate | 53.44% | 37.05% |

Problem statements and code snippets were downloaded with my benchmarking tool, which is published on GitHub (see the links at the end of this article).
The prompt, language choice, and code structure
The benchmark is deliberately simple: the LLM gets a single attempt to generate code, without being shown any previous submission (or anything else) and without learning the test results, apart from the sample test cases already included in the problem description.

I used the same prompt for all LLMs and all problems:

```
Hello! This is a coding challenge. You will be given:
* A problem statement (with sample test cases if available).
* A starter code snippet (with fixed function signatures).

Please write your solution in the {language} programming language.

Your code must:
* Solve the problem fully and correctly.
* Pass all provided sample test cases.
* Run within acceptable time and memory limits (consider large inputs where applicable).
* Follow good coding practices (clear logic, readable structure, appropriate use of language features).

Here is the problem statement: {question}

Here is the code snippet, which you should expand with your solution: {snippet}

Important Requirements:
* Do not change any provided function signatures, class names, or method names.
* Output only valid source code that can be run as-is, with no extra explanations.
```

The prompt was "polished" with ChatGPT-4 from my initial draft, but without applying any special "prompt engineering" techniques.
Problem statements were stripped of HTML tags before being inserted into the prompt.
All solutions were requested in Python (version 3).
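A sketch of that preprocessing step, assuming BeautifulSoup for the HTML cleanup (the actual tool may do it differently, and the abbreviated template only hints at the full prompt shown above):

```python
# Hypothetical sketch: strip HTML from the problem statement and fill the
# prompt template. PROMPT_TEMPLATE is abbreviated here.
from bs4 import BeautifulSoup

PROMPT_TEMPLATE = (
    "Hello! This is a coding challenge. ...\n"
    "Please write your solution in the {language} programming language.\n"
    "Here is the problem statement: {question}\n"
    "Here is the code snippet, which you should expand with your solution: {snippet}\n"
)

def build_prompt(question_html: str, snippet: str, language: str = "Python 3") -> str:
    # get_text() drops the tags while keeping the readable text.
    question_text = BeautifulSoup(question_html, "html.parser").get_text("\n")
    return PROMPT_TEMPLATE.format(
        language=language, question=question_text.strip(), snippet=snippet
    )
```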
The LLMs were instructed to output only the solution code with no additional text, but this was not always respected. Only a single generation was made per problem, and everything outside the code itself was stripped before submission.
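Since the models do not always obey the "code only" instruction, the response has to be cleaned up before submission. A simple heuristic along these lines (not necessarily the exact logic of my tool) is:

```python
import re

# The triple-backtick fence is built from single characters only so that this
# example stays readable when embedded in Markdown.
FENCE = "`" * 3
CODE_BLOCK_RE = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)

def extract_code(response: str) -> str:
    """Keep only the source code from an LLM response."""
    blocks = CODE_BLOCK_RE.findall(response)
    if blocks:
        return blocks[0].strip()  # use the first fenced block
    return response.strip()       # otherwise assume the whole reply is code
```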
Models and parameters
The models used in the benchmark are listed in the table below, together with any non-default parameters. Knowledge cutoff dates were taken from the vendors' documentation and are included where available.
| Vendor | Model | Knowledge cutoff date | Reasoning | Parameters |
| --- | --- | --- | --- | --- |
| Anthropic | claude-3-7-sonnet-20250219 | Nov 2024 | No | temperature = 0.0, max_tokens = 4096 |
| Anthropic | claude-3-7-sonnet-20250219 (with thinking enabled) | Nov 2024 | Yes | temperature = 0.0, max_tokens = 16384, budget_tokens = 8192 |
| DeepSeek | deepseek-chat (DeepSeek-V3) | unknown | No | temperature = 0.0 |
| DeepSeek | deepseek-reasoner (DeepSeek-R1) | unknown | Yes | temperature = 0.0 |
| Google | gemini-2.0-flash-001 | unknown | No | temperature = 0.0 |
| Google | gemini-2.0-pro-exp-02-05 | unknown | No | temperature = 0.0 |
| Google | gemini-2.5-pro-exp-03-25 | unknown | Yes | temperature = 0.0 |
| xAI | grok-2-1212 | Jul 17, 2024 | No | seed = 42 |
| OpenAI | o1-2024-12-17 | Oct 01, 2023 | Yes | seed = 42 |
| OpenAI | o3-mini-2025-01-31 | Oct 01, 2023 | Yes | seed = 42 |
The benchmark was designed to be as deterministic and reproducible as possible, which is why parameters such as temperature or seed were fixed. However, none of the tested models guarantee fully deterministic output, and each vendor's documentation should be consulted on this point.

All known knowledge cutoff dates fall before the publication of the earliest problem in the "unseen" data set (November 2024). Unfortunately, I could not find cutoff dates for the Gemini and DeepSeek models.

Some models offer a "reasoning" or "thinking" mode by default, while for Claude 3.7 Sonnet it has to be enabled explicitly via a parameter. Whether this mode was used is noted in the table. Other model features (or "tools") such as web search were not used, even when supported.
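For example, enabling the thinking mode for Claude 3.7 Sonnet via the Anthropic Python SDK looks roughly like this (a sketch based on Anthropic's documented `thinking` parameter; the `prompt` argument is the benchmark prompt, and the token budgets mirror the table above):

```python
# Sketch: a single generation with Claude 3.7 Sonnet and extended thinking.
# Temperature handling is omitted here; see the parameter table above.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def generate_with_thinking(prompt: str) -> str:
    message = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=16384,
        thinking={"type": "enabled", "budget_tokens": 8192},
        messages=[{"role": "user", "content": prompt}],
    )
    # The response interleaves "thinking" and "text" blocks; keep only the text.
    return "".join(block.text for block in message.content if block.type == "text")
```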
Results
All models show a very high acceptance rate on the "well-known" problems, just as in the previous benchmark. To save time and money, I skipped the top models and modifications (namely Claude 3.7 Sonnet with reasoning enabled, DeepSeek R1, Gemini 2.5 Pro, and o1) on this set, since their results there are quite predictable.

The results are very different for the "unseen" problems, in two ways:
- For all models, the acceptance rate is lower on the "unseen" problems. This is especially notable for medium and hard problems.
- Models with a "reasoning" or "thinking" mode enabled produced better results across all difficulty levels, though the exact numbers varied from model to model.
The very high acceptance rate on the "well-known" problems is most likely explained by these problems and their solutions being part of the training data, so the models do little more than reproduce an already-known solution. At the same time, the acceptance rate of human users on the new medium and hard problems is also lower than on the "well-known" set. The two sets are different, and this does not necessarily mean the new problems are "harder": the difficulty rating, as noted earlier, is quite subjective. And, much like the LLMs, human users may simply be reproducing solutions they have already seen for the well-known problems.
All models with a "reasoning" mode show better results than all of the non-reasoning models. Most importantly, some models were able to solve a significant number of medium and hard problems, a result that was out of reach only a year ago. o3-mini shows the best results among the "reasoning" models; it performed even better than o1, which is considerably larger and more expensive. It is worth noting that o3-mini is reportedly tuned specifically for coding and other technical tasks.

Further improvements
There is no guarantee that the "unseen" problems are completely absent from the models' training data. To address this, one could generate brand-new problems created specifically for each benchmark run, for example, by using LLMs themselves.

Another possible step is to use less widely used programming languages. This could force the LLMs to actually construct a solution instead of "copy-pasting" well-known, ready-made Python code.

These ideas outline the next steps, and I would be glad if someone picks them up, or I may explore them myself.
Links
- Raw results, problem sets, and source code are available in my GitHub repository: https://github.com/whisk/leetgptsolver
- The previous benchmark and its results (2024): https://hackernoon.com/testing-llms-on-solving-leetcode-problems
The cover image was generated with DALL·E.