Comparative Analysis of Large Language Models for Automated Commit Message Generation

Trigui, Mohamed Mehdi

Trigui, Mohamed Mehdi (2026) Comparative Analysis of Large Language Models for Automated Commit Message Generation. Masters thesis, King Fahd University of Petroleum and Minerals.

PDF
MohamedMehdiTrigui202318370_Thesis.pdf
Restricted to Repository staff only until 7 June 2027.

Arabic Abstract (English translation)

Commit messages are an important element of version control systems such as Git: they help document code changes, improve the maintainability of software projects, and make their evolution over time easier to understand. However, developers often write short or unclear commit messages because of time pressure or the absence of suitable guidelines, which makes it difficult to understand code changes and to manage large projects. With the rapid progress in artificial intelligence and large language models (LLMs) such as GPT-4o-mini, DeepSeek-Chat, and Qwen2.5, it has become possible to automate some software engineering tasks, including the automatic generation of commit messages from code changes. This study aims to conduct a comparative analysis evaluating the performance of these models in generating accurate and understandable commit messages. The study relied on the CommitBench dataset, which contains a large number of commits from open-source software repositories. Model performance was evaluated using quantitative metrics such as BLEU, ROUGE, and METEOR, in addition to a human evaluation based on the clarity and accuracy of the messages. The results showed that models trained specifically on commit message data achieve the best performance, while retrieval-augmented generation (RAG) techniques can improve the performance of general-purpose models by providing additional context during generation. These findings contribute to the development of intelligent tools that help developers write clearer and more consistent commit messages in modern software projects.

English Abstract

Commit messages are crucial for making software projects easier to maintain, understand, and update when using version control tools like Git. Still, some developers write short and inconsistent commit messages. This is usually due to time constraints, a lack of clearly defined rules, or a failure to recognize the importance of such guidelines. Consequently, it becomes increasingly difficult for others to understand the project or to bring newcomers up to speed. With the rise of advanced Large Language Models (LLMs) like GPT-4o-mini, DeepSeek-Chat, and Qwen2.5, there is real potential to automate some of these time-consuming software engineering tasks. Such models can review code changes and summarize them in an understandable way, producing commit messages similar in quality to those a human developer might write. However, there has been little research into how well these models perform at commit message generation, especially in diverse and realistic situations. This thesis addresses that gap. We carry out an extensive evaluation of several LLMs using CommitBench, a large and varied dataset created specifically for evaluating commit message generation. We evaluate model performance using quantitative measures such as BLEU and ROUGE, as well as a subjective evaluation in which human raters judge the clarity, usefulness, and relevance of the commit messages. Finally, we discuss a conceptual integration workflow illustrating how such models could be incorporated into real development environments, providing a foundation for future work on AI-assisted software engineering tools.

Item Type:	Thesis (Masters)
Subjects:	Computer
Department:	College of Computing and Mathematics > Information and Computer Science
Thesis Advisor:	Wasfi Al-khatib
Thesis Committee Members:	Jameleddine Hassine, Omar Hammad
Depositing User:	MOHAMED MEHDI TRIGUI
Date Deposited:	08 Jun 2026 05:59
Last Modified:	02 Jul 2026 10:50
URI:	https://eprints.kfupm.edu.sa/id/eprint/144520