Cite this article: QIU Jian, ZHU Yukun, ZHANG Jianxin, et al. Intelligent generation method of power system stability control strategy based on safe reinforcement learning[J]. Power System Protection and Control, 2024, 52(10): 147-155.
Intelligent generation method of power system stability control strategy based on safe reinforcement learning
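QIU Jian¹, ZHU Yukun²,³, ZHANG Jianxin¹, ZHU Yihua²,³, XU Guanghu¹, TU Liang²,³
(1. China Southern Power Grid Co., Ltd., Guangzhou 510663, Guangdong, China; 2. State Key Laboratory of HVDC (Electric Power Research Institute, China Southern Power Grid Co., Ltd.), Guangzhou 510663, Guangdong, China; 3. Guangdong Provincial Key Laboratory of Intelligent Operation and Control for New Energy Power System, Guangzhou 510663, Guangdong, China)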
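Abstract:
The "double-high" trend of the new power system (a high share of renewable energy and a high share of power electronics) has changed the classical stability characteristics of power systems, making stability mechanisms more complex and system stability modes more diverse. As a result, online stability control strategies based on typical operating modes face challenges. To address the rotor angle stability problem of the new power system, an intelligent generation method for stability control strategies based on safe reinforcement learning is proposed. First, a constrained Markov model of the power system stability control problem is established, and the safety constraints involved in generator tripping actions for emergency control are summarized and proposed. Second, to improve the extraction of spatio-temporal features from the grid's transient response, a feature perception network based on graph convolutional layers and long short-term memory units is constructed. Then, to improve the training efficiency of the stability control agent, a training framework for stability control strategies is proposed that uses the proximal policy optimization algorithm with embedded domain knowledge constraints. Finally, the method is tested and verified on the IEEE 39-bus system and a practical power grid. The results show that the proposed method can adaptively generate generator tripping stability control strategies according to the system operating state and fault response, and that its decision-making effectiveness and efficiency are both better than those of existing stability control strategies.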
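For orientation, a constrained Markov model of this kind is commonly written as a constrained Markov decision process. The generic textbook form is sketched below. The symbols (reward $r$, cost functions $c_i$, thresholds $d_i$, discount factor $\gamma$, horizon $T$) are illustrative assumptions, since the abstract gives neither the paper's own notation nor its specific generator tripping constraints.

```latex
\begin{aligned}
\max_{\pi}\quad & J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m
\end{aligned}
```

Under this reading, $s_t$ would be the observed system state, $a_t$ the control action (for example, which units to trip), and each $c_i$ a cost that accumulates when one safety constraint is violated.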
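Keywords: stability control strategy; safe reinforcement learning; spatio-temporal features; domain knowledge
DOI: 10.19783/j.cnki.pspc.231360
Received: 2023-10-20    Revised: 2024-02-15
Funding: Supported by the Key Science and Technology Project of China Southern Power Grid (000000KK 52210139)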
|
Intelligent generation method of power system stability control strategy based on safe reinforcement learning
QIU Jian¹, ZHU Yukun²,³, ZHANG Jianxin¹, ZHU Yihua²,³, XU Guanghu¹, TU Liang²,³
(1. China Southern Power Grid Company Limited, Guangzhou 510663, China; 2. State Key Laboratory of HVDC, Electric Power Research Institute, CSG, Guangzhou 510663, China; 3. Guangdong Provincial Key Laboratory of Intelligent Operation and Control for New Energy Power System, Guangzhou 510663, China)
Abstract:
The trend toward a “higher proportion of renewable energy and power electronics” in the new power system has changed the classical stability characteristics of the system: the stability mechanism is more complex, and the system stability modes are more diverse. Online stability control strategies based on typical operating modes therefore face challenges. To address the rotor angle stability problem of the new power system, an intelligent generation method for stability control strategies based on safe reinforcement learning is proposed. First, a constrained Markov model for the power system stability control problem is established, and the safety constraints involved in rotor angle stability control are summarized. Second, to improve the extraction of spatial and temporal features from the power grid’s transient response, a feature perception network based on graph convolutional layers and long short-term memory units is constructed. Then, to improve the training efficiency of the stability control agent, a training framework for stability control strategies is proposed that uses the proximal policy optimization algorithm with embedded domain knowledge constraints. Finally, a case study is performed on the IEEE 39-bus system and a practical power grid. The results show that the proposed method can adaptively generate unit tripping strategies based on the system operating state and fault response, and that its decision-making effectiveness and efficiency are superior to those of existing stability control strategies.
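To make the idea of embedding domain knowledge constraints in proximal policy optimization more concrete, the sketch below shows one common way this can be done: masking infeasible actions before sampling, combined with the standard clipped surrogate loss. This is an assumption for illustration only. The abstract does not say how the constraints are embedded, and the names `masked_softmax`, `ppo_clip_loss`, and the example feasibility mask are hypothetical.

```python
import math

def masked_softmax(logits, feasible):
    """Softmax over action logits with infeasible actions forced to probability 0.

    `feasible` is a hypothetical boolean mask derived from domain rules,
    e.g. a unit that is already offline cannot be tripped (assumption).
    """
    if not any(feasible):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, feasible) if ok)
    weights = [math.exp(l - peak) if ok else 0.0
               for l, ok in zip(logits, feasible)]
    total = sum(weights)
    return [w / total for w in weights]

def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """Standard PPO clipped surrogate loss (negated, so lower is better)."""
    terms = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                       # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip to [1-eps, 1+eps]
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)

# Four candidate tripping actions; the second is ruled out by the mask.
probs = masked_softmax([2.0, 1.0, 0.5, 0.0], [True, False, True, True])
loss = ppo_clip_loss([1.0], [0.0], [1.0], eps=0.2)
```

Masking keeps the agent from ever sampling a disallowed action, so no training episodes are spent on infeasible strategies. Constraints that cannot be expressed as a hard mask would need a different mechanism, such as a cost penalty.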
Key words: stability control strategy; safe reinforcement learning; temporal and spatial characteristics; domain knowledge