Comparative performance of large language models in solving mathematical and computer science problems


Publisher

II. Rákóczi Ferenc Kárpátaljai Magyar Egyetem

Abstract

The rapid adoption of Large Language Models (LLMs) in technical domains necessitates a rigorous, comparative assessment of their core computational abilities. Despite architectural advances, a performance gap persists in tasks requiring true mathematical reasoning and error-free algorithmic generation. This study addresses this gap by comparing leading LLMs (ChatGPT, Claude, Gemini, Mistral) on complex computer science and mathematics problems in order to quantify their proficiency differential and identify specific systemic limitations. Findings indicate that the models differ significantly in logical reasoning, algorithmic thinking, and programming task performance. According to Moon [1], more advanced models show improved accuracy in mathematical derivations but still exhibit frequent logical and syntactic errors. Zhao [2] emphasizes that, despite architectural advancements, consistent mathematical reasoning remains a challenge for current LLMs. Noveski [3] highlights that code generation success depends largely on prompt structure and task complexity. In educational contexts, the study in Emerging Investigators [4] shows that LLMs can effectively support mathematics learning but still require improvements in explanation depth and accuracy. Overall, large language models are promising tools for enhancing algorithmic and problem-solving skills, though their performance remains context-dependent.
Conclusion. While LLMs show a promising ability to enhance algorithmic and problem-solving skills and can serve as effective, though imperfect, educational support tools, their performance remains fundamentally context-dependent. The persistent occurrence of logical and syntactic errors in complex mathematical derivations and the variability of code generation success confirm that current LLM architectures cannot yet reliably substitute the human capacity for domain-specific expertise and rigorous logical verification.
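The comparative assessment described above implies a scoring step: each model's answers to a fixed problem set are checked against reference solutions and summarized as per-model accuracy. The following is a minimal, hypothetical sketch of such a harness; the model names, problem IDs, and answers are illustrative placeholders, not the study's actual data or methodology.

```python
def score_models(answers, ground_truth):
    """Return per-model accuracy: the fraction of problems whose
    answer exactly matches the reference solution.

    answers: {model_name: {problem_id: answer}}
    ground_truth: {problem_id: reference_answer}
    """
    scores = {}
    for model, responses in answers.items():
        correct = sum(
            1 for pid, ans in responses.items()
            if ground_truth.get(pid) == ans
        )
        scores[model] = correct / len(ground_truth)
    return scores


if __name__ == "__main__":
    # Illustrative data only: two placeholder models, three problems.
    truth = {"p1": "42", "p2": "O(n log n)", "p3": "7"}
    answers = {
        "model_a": {"p1": "42", "p2": "O(n log n)", "p3": "9"},
        "model_b": {"p1": "42", "p2": "O(n^2)", "p3": "7"},
    }
    print(score_models(answers, truth))
```

A real evaluation would of course need tolerant answer matching (e.g. normalizing equivalent mathematical expressions) and, for programming tasks, execution against test cases rather than string comparison; exact-match scoring is only the simplest baseline.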

Description

Full publication: https://kme.org.ua/uk/publications/rol-bezpeki-v-transkordonnomu-ta-mizhnarodnomu-spivrobitnictvi/

Bibliographic description

In Csernicskó István, Maruszinec Marianna, Molnár D. Erzsébet, Mulesza Okszána és Melehánics Anna (szerk.): A biztonság szerepe a határon átnyúló és nemzetközi együttműködésben. Nemzetközi tudományos és szakmai konferencia Beregszász, 2025. október 8–9. Absztraktkötet. Beregszász, II. Rákóczi Ferenc Kárpátaljai Magyar Egyetem, 2025. 96. p.


Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States