Reinforcement Learning–based Operational Optimization of a Hybrid Renewable System with Energy Storage

Khalid, Monzer

Khalid, Monzer (2026) Reinforcement Learning–based Operational Optimization of a Hybrid Renewable System with Energy Storage. Masters thesis, King Fahd University of Petroleum and Minerals.

PDF: KFUPM_Thesis_Monzer_Khalid.pdf (5MB)
Restricted to Repository staff only until 21 May 2027.

Arabic Abstract (English translation)

Hybrid renewable power plants that combine photovoltaics (PV), wind power, and concentrated solar power (CSP) with thermal energy storage (TES) and battery energy storage systems (BESS) help improve renewable energy utilization and raise supply reliability, but operating them remains a challenge because of the strong temporal coupling between dispatch decisions and storage decisions. This thesis addresses the operational dispatch problem of a hybrid system combining PV, wind, and CSP with thermal storage and battery storage, using a reinforcement-learning-based framework. The system is formulated as a Markov decision process consistent with the software implementation, in which the controller observes weather conditions, demand, storage states, temporal features, and previous actions, and then determines the storage-charging and discharge-control actions. Three PPO-based dispatch-priority structures are first evaluated: renewable-first, dynamic priority, and CSP-first. The results show that the renewable-first structure achieves the best overall balance among return, reliability, and curtailment reduction, while also simplifying the control problem. After this structure was selected, a sensitivity analysis was carried out to improve the PPO settings. The tuned controller achieved the best overall annual performance and was then compared with a deterministic rule-based controller and with two competing deep reinforcement learning algorithms, Soft Actor–Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3). The results show that PPO achieved the lowest loss of power supply probability, the lowest energy curtailment ratio, the best cost-shaped return, and the shortest training time among all tested controllers. These results indicate that a tuned PPO controller with the renewable-first structure provides an effective, interpretable, and computationally efficient strategy for the optimal operation of hybrid energy systems combining PV, wind, and CSP with thermal and battery storage.

English Abstract

Hybrid renewable power plants that combine photovoltaic (PV), wind, and concentrated solar power (CSP) generation with thermal energy storage (TES) and battery energy storage systems (BESS) can improve renewable utilization and supply reliability, but they remain challenging to operate because dispatch and storage decisions are strongly coupled over time. This thesis addresses the operational dispatch problem of a hybrid PV–wind–CSP system with TES and BESS using a reinforcement-learning-based framework. The system is formulated as a Markov decision process consistent with its code implementation, in which the controller observes weather conditions, demand, storage states, time-related features, and previous actions, and then determines storage-charging and discharge-control actions. Proximal Policy Optimization (PPO), a policy-gradient reinforcement learning algorithm suited to continuous-control problems, is adopted as the main learning method. Three PPO-based dispatch-priority structures are first evaluated: fixed renewable-first, in which renewable electric generation is dispatched before CSP and storage resources; dynamic priority, in which the controller learns the dispatch order adaptively; and fixed CSP-first, in which CSP generation is dispatched ahead of renewable electric generation. The results show that the fixed renewable-first structure provides the best overall trade-off among total operating cost, loss of power supply probability (LPSP), and curtailment, while also simplifying the control problem. After this structure is selected, a sensitivity analysis is carried out to improve the PPO configuration. The tuned PPO controller is then benchmarked against a deterministic rule-based controller, which follows a predefined dispatch strategy with fixed charging and discharge actions, and against two competing deep reinforcement learning algorithms, Soft Actor–Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3). PPO reduces total operating cost, LPSP, and curtailment relative to the rule-based benchmark, outperforms SAC and TD3 on the same system, and requires the shortest training time among the tested controllers. These findings show that a tuned renewable-first PPO controller provides an effective, interpretable, and computationally efficient strategy for the operational optimization of hybrid PV–wind–CSP systems with thermal and battery storage.
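
As an illustration only, the fixed renewable-first priority and the two reliability metrics named in the abstract can be sketched in a few lines of Python. This is not the thesis code. The function names, the single lumped battery, the omission of TES and of conversion efficiencies, the reading of "renewable electric generation" as PV plus wind, and the energy-based definitions of LPSP (unmet energy over total demand) and curtailment (curtailed energy over available renewable energy) are all assumptions made for this sketch. The two fractions `charge_frac` and `discharge_frac` stand in for the storage-charging and discharge-control actions that a learned policy would supply at each step.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    unmet: float      # demand left unserved in this step
    curtailed: float  # renewable surplus that could not be stored
    soc: float        # battery energy content after the step


def renewable_first_step(pv, wind, csp, demand, soc, capacity,
                         charge_frac, discharge_frac):
    """One step of a hypothetical renewable-first dispatch (no losses, no TES)."""
    renewable = pv + wind                      # assumed: PV + wind served first
    from_renewable = min(renewable, demand)
    residual = demand - from_renewable
    surplus = renewable - from_renewable

    from_csp = min(csp, residual)              # CSP covers what is left
    residual -= from_csp

    # Battery discharge is capped by the action and by the stored energy.
    from_bess = min(residual, discharge_frac * soc)
    soc -= from_bess
    residual -= from_bess

    # Surplus charges the battery, capped by the action and the free capacity.
    to_bess = min(charge_frac * surplus, capacity - soc)
    soc += to_bess
    return StepResult(unmet=residual, curtailed=surplus - to_bess, soc=soc)


def evaluate(profile, capacity, charge_frac=1.0, discharge_frac=1.0):
    """Return (LPSP, curtailment ratio) over a list of (pv, wind, csp, demand)."""
    soc = unmet = curtailed = demand_total = renewable_total = 0.0
    for pv, wind, csp, demand in profile:
        step = renewable_first_step(pv, wind, csp, demand, soc, capacity,
                                    charge_frac, discharge_frac)
        soc = step.soc
        unmet += step.unmet
        curtailed += step.curtailed
        demand_total += demand
        renewable_total += pv + wind
    return unmet / demand_total, curtailed / renewable_total


# Toy three-step profile (made-up numbers, arbitrary energy units).
profile = [(5.0, 3.0, 2.0, 6.0), (0.0, 1.0, 1.0, 6.0), (10.0, 0.0, 0.0, 4.0)]
lpsp, curtailment = evaluate(profile, capacity=4.0)
print(f"LPSP={lpsp:.3f}, curtailment={curtailment:.3f}")
```

In a reinforcement-learning setting, the constant fractions used here would be replaced by per-step actions from the policy, and a cost term would be added to form the reward. The fixed dispatch order is what keeps the action space small in this sketch.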

Item Type:	Thesis (Masters)
Subjects:	Computer
Department:	College of Computing and Mathematics > Industrial and Systems Engineering
Thesis Advisor:	Ahmad Al Hanbali
Thesis Committee Members:	Mohammad Aldurgam, Ahmed Ghaithan
Depositing User:	MONZER KHALID
Date Deposited:	02 Jun 2026 06:28
Last Modified:	30 Jun 2026 09:21
URI:	https://eprints.kfupm.edu.sa/id/eprint/144438