Comparative performance of large language models in solving mathematical and computer science problems
Publisher
II. Rákóczi Ferenc Kárpátaljai Magyar Egyetem
Abstract
The rapid adoption of Large Language Models (LLMs) in technical domains necessitates a rigorous, comparative assessment of their core computational abilities. Despite architectural advances, a performance gap persists in tasks requiring true mathematical reasoning and error-free algorithmic generation. This study addresses this gap by quantifying the proficiency differential across leading LLMs (ChatGPT, Claude, Gemini, Mistral) on complex computer science and mathematics problems in order to identify specific systemic limitations.
This research aims to compare the performance of various large language models (LLMs) — such
as ChatGPT, Claude, Gemini, and Mistral — in solving mathematical and computer science problems.
Findings indicate that LLMs demonstrate significant differences in logical reasoning, algorithmic
thinking, and programming task performance. According to Moon [1], more advanced models show
improved accuracy in mathematical derivations but still exhibit frequent logical and syntactic errors.
Zhao [2] emphasizes that despite architectural advancements, consistent mathematical reasoning
remains a challenge for current LLMs. Noveski [3] highlights that code generation success largely
depends on prompt structure and task complexity. In educational contexts, the study by Emerging
Investigators [4] shows that LLMs can effectively support mathematics learning but still require
improvements in explanation depth and accuracy. Overall, large language models are promising tools
for enhancing algorithmic and problem-solving skills, though their performance remains context-dependent and cannot yet substitute for human expertise.
Conclusion. While LLMs show a promising ability to enhance algorithmic and problem-solving
skills and serve as effective, though imperfect, educational support tools, their performance remains
fundamentally context-dependent. The persistent occurrence of logical and syntactic errors in
complex mathematical derivations, together with the variability of code generation success, confirms that current
LLM architectures cannot yet reliably substitute for human domain-specific expertise
and rigorous logical verification.
Description
Full publication: https://kme.org.ua/uk/publications/rol-bezpeki-v-transkordonnomu-ta-mizhnarodnomu-spivrobitnictvi/
Bibliographic description
In Csernicskó István, Maruszinec Marianna, Molnár D. Erzsébet, Mulesza Okszána és Melehánics Anna (szerk.): A biztonság szerepe a határon átnyúló és nemzetközi együttműködésben. Nemzetközi tudományos és szakmai konferencia Beregszász, 2025. október 8–9. Absztraktkötet. Beregszász, II. Rákóczi Ferenc Kárpátaljai Magyar Egyetem, 2025. 96. p.
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
