Comparative performance of large language models in solving mathematical and computer science problems


Publisher

II. Rákóczi Ferenc Kárpátaljai Magyar Egyetem

Abstract

The rapid adoption of Large Language Models (LLMs) in technical domains necessitates a rigorous, comparative assessment of their core computational abilities. Despite architectural advances, a performance gap persists in tasks requiring true mathematical reasoning and error-free algorithmic generation. This study addresses this gap by comparing leading LLMs (ChatGPT, Claude, Gemini, Mistral) on complex computer science and mathematics problems in order to quantify their proficiency differential and identify specific systemic limitations. Findings indicate that the models differ significantly in logical reasoning, algorithmic thinking, and programming task performance. According to Moon [1], more advanced models show improved accuracy in mathematical derivations but still exhibit frequent logical and syntactic errors. Zhao [2] emphasizes that, despite architectural advancements, consistent mathematical reasoning remains a challenge for current LLMs. Noveski [3] highlights that code generation success depends largely on prompt structure and task complexity. In educational contexts, the study in Emerging Investigators [4] shows that LLMs can effectively support mathematics learning but still require improvements in explanation depth and accuracy. Overall, large language models are promising tools for enhancing algorithmic and problem-solving skills, though their performance remains context-dependent.
Conclusion. While LLMs show a promising ability to enhance algorithmic and problem-solving skills and can serve as effective, though imperfect, educational support tools, their performance remains fundamentally context-dependent. The persistent occurrence of logical and syntactic errors in complex mathematical derivations and the variability of code generation success confirm that current LLM architectures cannot yet reliably substitute the human capacity for domain-specific expertise and rigorous logical verification.
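The comparative assessment described above implies a scoring step: each model's answers to a fixed problem set are checked against reference solutions and summarized as per-model accuracy. The following is a minimal, hypothetical sketch of such a harness; the model names, problem IDs, and answers are illustrative placeholders, not the study's actual data or methodology.

```python
def score_models(answers, ground_truth):
    """Return per-model accuracy: the fraction of problems whose
    answer exactly matches the reference solution.

    answers: {model_name: {problem_id: answer}}
    ground_truth: {problem_id: reference_answer}
    """
    scores = {}
    for model, responses in answers.items():
        correct = sum(
            1 for pid, ans in responses.items()
            if ground_truth.get(pid) == ans
        )
        scores[model] = correct / len(ground_truth)
    return scores


if __name__ == "__main__":
    # Illustrative data only: two placeholder models, three problems.
    truth = {"p1": "42", "p2": "O(n log n)", "p3": "7"}
    answers = {
        "model_a": {"p1": "42", "p2": "O(n log n)", "p3": "9"},
        "model_b": {"p1": "42", "p2": "O(n^2)", "p3": "7"},
    }
    print(score_models(answers, truth))
```

A real evaluation would of course need tolerant answer matching (e.g. normalizing equivalent mathematical expressions) and, for programming tasks, execution against test cases rather than string comparison; exact-match scoring is only the simplest baseline.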

Description

Full publication: https://kme.org.ua/uk/publications/rol-bezpeki-v-transkordonnomu-ta-mizhnarodnomu-spivrobitnictvi/

Bibliographic description

In Csernicskó István, Maruszinec Marianna, Molnár D. Erzsébet, Mulesza Okszána és Melehánics Anna (szerk.): A biztonság szerepe a határon átnyúló és nemzetközi együttműködésben. Nemzetközi tudományos és szakmai konferencia Beregszász, 2025. október 8–9. Absztraktkötet. Beregszász, II. Rákóczi Ferenc Kárpátaljai Magyar Egyetem, 2025. 96. p.


Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States