Optimal scheduling of integrated energy systems based on prior knowledge and multi-stage QMIX reinforcement learning

LOU Jing; WANG Mengyu; ZHENG Lingwei

Cite this article: LOU Jing, WANG Mengyu, ZHENG Lingwei. Optimal scheduling of integrated energy systems based on prior knowledge and multi-stage QMIX reinforcement learning[J]. Power System Protection and Control, 2026, 54(07): 13-23.

DOI: 10.19783/j.cnki.pspc.250990

Funding: Supported by the Natural Science Foundation of Zhejiang Province (LY24F030010)

Optimal scheduling of integrated energy systems based on prior knowledge and multi-stage QMIX reinforcement learning

LOU Jing¹, WANG Mengyu¹, ZHENG Lingwei¹,²

1. School of Automation, Hangzhou Dianzi University, Hangzhou 310000, China; 2. Zhejiang Key Laboratory of Distributed New Energy Grid Connection and Consumption Technology Research, Hangzhou 310000, China

Abstract:

The multi-energy coupling and increasingly complex topology of integrated energy systems (IES) make optimal scheduling a key challenge in balancing economic efficiency and operational security. Traditional multi-agent reinforcement learning suffers from convergence difficulties caused by the curse of dimensionality and from local optima caused by deficient exploration mechanisms. To address these issues, a real-time optimal scheduling method based on a prior-knowledge-guided multi-stage QMIX architecture is proposed. First, IES real-time optimal scheduling is formulated as a decentralized partially observable Markov decision process, and a QMIX framework based on a joint action-value function update strategy is constructed. Then, units are clustered according to their degree of energy coupling, and a multi-stage QMIX training strategy is designed to alleviate the curse of dimensionality. Finally, an enhanced action exploration mechanism incorporating prior knowledge is introduced to guide the convergence trajectory. Scheduling simulations are conducted under multiple load scenarios (40 sample days). The results show that the proposed method has a significant advantage in convergence performance and effectively reduces system operating costs.

Key words: integrated energy system; real-time optimal scheduling; multi-agent reinforcement learning; multi-stage QMIX; prior knowledge guidance
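
The abstract builds on QMIX, whose joint action-value function is a monotonic mixture of per-agent values. The sketch below is an illustration only, not the authors' implementation: it replaces the usual two-layer mixing network with a single state-conditioned layer, and every name in it (`mix`, `greedy_joint_action`, the random toy Q-tables) is hypothetical. It shows why non-negative mixing weights allow each agent to act greedily on its own Q-values while still maximizing the joint value.

```python
import random
from itertools import product


def mix(agent_qs, state, w_raw, b_raw):
    """Simplified monotonic mixer: Q_tot = sum_i |W_i . s| * Q_i + (b . s).

    Taking the absolute value keeps every mixing weight non-negative,
    so Q_tot never decreases when any single agent's Q-value increases.
    """
    weights = [abs(sum(w * s for w, s in zip(row, state))) for row in w_raw]
    bias = sum(b * s for b, s in zip(b_raw, state))
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


def greedy_joint_action(q_tables):
    """Each agent independently picks the action with its largest Q-value."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


# Toy data (assumed sizes, random values; no relation to the paper's IES model).
rng = random.Random(0)
n_agents, n_actions, state_dim = 3, 4, 5
state = [rng.uniform(-1, 1) for _ in range(state_dim)]
w_raw = [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(n_agents)]
b_raw = [rng.uniform(-1, 1) for _ in range(state_dim)]
q_tables = [[rng.uniform(-1, 1) for _ in range(n_actions)] for _ in range(n_agents)]


def q_tot(joint):
    """Joint value of a joint action under the simplified mixer."""
    chosen = [q_tables[i][a] for i, a in enumerate(joint)]
    return mix(chosen, state, w_raw, b_raw)


# Decentralized greedy choice versus brute-force search over all joint actions.
decentralized = greedy_joint_action(q_tables)
best_joint = max(product(range(n_actions), repeat=n_agents), key=q_tot)
```

Under these assumptions, `q_tot(decentralized)` equals `q_tot(best_joint)`, so the exhaustive search over `n_actions ** n_agents` joint actions is unnecessary at execution time. The joint action space still grows exponentially with the number of agents during training, which is the dimensionality problem the multi-stage clustering strategy in the abstract is meant to alleviate.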
