Final published version
Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - On the performance of large language models on introductory programming assignments
AU - Raihan, N.
AU - Goswami, D.
AU - Puspo, S.S.C.
AU - Siddiq, M.L.
AU - Newman, C.
AU - Ranasinghe, T.
AU - Santos, J.C.S.
AU - Zampieri, M.
PY - 2025/8/16
Y1 - 2025/8/16
AB - Recent advances in artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) have led to the development of a new generation of Large Language Models (LLMs) trained on massive amounts of data. Commercial applications (e.g., ChatGPT) have made this available to the general public, enabling the use of LLMs to produce high-quality texts for academic and professional purposes. Educational institutions are increasingly aware of students’ use of AI-generated content and are researching its impact and potential misuse. Computer Science (CS) and related fields are particularly affected, as LLMs can also generate programming code in various languages. To understand the potential impact of publicly available LLMs in CS education, we extend our previously introduced CSEPrompts (Raihan et al. 2024), a framework comprising hundreds of programming exercise prompts and multiple-choice questions from introductory CS and programming courses. We provide experimental results on CSEPrompts, evaluating the performance of several LLMs in generating Python code and answering basic computer science and programming questions, offering insights into the implications of this technology for CS education.
U2 - 10.1007/s10844-025-00968-y
DO - 10.1007/s10844-025-00968-y
M3 - Journal article
JO - Journal of Intelligent Information Systems
JF - Journal of Intelligent Information Systems
SN - 1573-7675
ER -