Citation: Shouyuan Shi, Zhenning Pan, et al. Synergistic Carbon Trading and Power Generation Decision Considering the Annual Compliance Cycle and Market Response: A Hybrid Mathematical-Deep Reinforcement Learning Optimization Approach [J]. Protection and Control of Modern Power Systems, 2026, 11(1): 173-191.
Synergistic Carbon Trading and Power Generation Decision Considering the Annual Compliance Cycle and Market Response: A Hybrid Mathematical-Deep Reinforcement Learning Optimization Approach
Shouyuan Shi, Zhenning Pan, Member, IEEE, Junbin Chen, Tao Yu, Senior Member, IEEE
Abstract:
The annual compliance cycle of the carbon trading system allows generation companies (GenCos) to decouple the timing of carbon allowance purchases from their actual emissions. However, trading a large volume of allowances within a single day can significantly impact carbon prices. Faced with uncertain future carbon and electricity prices, GenCos must solve a challenging multistage stochastic optimization problem to coordinate their carbon trading strategies with daily power generation decisions. In this paper, a two-layer hybrid mathematical-deep reinforcement learning (DRL) optimization framework is proposed. The upper DRL layer tackles the stochastic, year-long carbon trading and allowance usage optimization problem, aiming for long-term optimality and providing guidance for short-term decisions in the lower layer. The lower mathematical optimization layer addresses the deterministic daily power generation scheduling problem while enforcing strict technical constraints. To accelerate learning over the annual compliance cycle, a decision timeline transfer learning method is proposed, enabling the DRL agent to progressively refine its policy through sequential training on monthly, weekly, and daily decision environments. Case studies demonstrate that, with these methods, a GenCo can reduce emission costs and increase profits by effectively leveraging carbon price fluctuations within the compliance cycle.
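To make the described two-layer interaction concrete, the Python sketch below shows an upper decision layer spreading allowance purchases over a compliance year while a deterministic lower layer dispatches units each day. Everything here is an illustrative assumption rather than the paper's implementation: the heuristic upper_policy stands in for the trained DRL agent, lower_dispatch is a simple merit-order dispatch instead of the constrained mathematical optimization layer, and all unit data and price processes are made up for demonstration.

```python
# Minimal sketch of the two-layer (annual carbon trading / daily dispatch) loop.
# All names, rules, and numbers are illustrative assumptions, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def lower_dispatch(demand, elec_price, carbon_price, gen_caps, gen_costs, gen_emis):
    """Lower layer (deterministic, one day): merit-order dispatch ranking units by
    fuel cost plus carbon-price-weighted emission rate. A real lower layer would be
    a unit-commitment model with full technical constraints."""
    order = np.argsort(gen_costs + carbon_price * gen_emis)
    output = np.zeros_like(gen_caps)
    remaining = demand
    for i in order:
        output[i] = min(gen_caps[i], remaining)
        remaining -= output[i]
    profit = elec_price * demand - gen_costs @ output
    emissions = gen_emis @ output
    return profit, emissions

def upper_policy(state):
    """Upper layer (heuristic stand-in for the DRL agent): decide what fraction of
    the current allowance shortfall to purchase today, given observed prices and
    the time remaining until annual compliance."""
    shortfall, carbon_price, days_left = state
    urgency = 1.0 / max(days_left, 1)                          # buy faster near the deadline
    discount = np.clip(1.2 - carbon_price / 60.0, 0.0, 1.0)    # buy more when price is low
    return float(np.clip(urgency + 0.1 * discount, 0.0, 1.0)) * max(shortfall, 0.0)

def run_compliance_year(days=365):
    gen_caps  = np.array([300.0, 200.0, 100.0])    # MW, illustrative units
    gen_costs = np.array([25.0, 35.0, 50.0])       # $/MWh
    gen_emis  = np.array([0.9, 0.7, 0.4])          # tCO2/MWh
    allowance_held, emissions_total, profit_total = 0.0, 0.0, 0.0
    for d in range(days):
        demand = 400.0 + 100.0 * np.sin(2 * np.pi * d / 365) + rng.normal(0, 20)
        elec_price, carbon_price = 60.0 + rng.normal(0, 5), 50.0 + rng.normal(0, 8)
        profit, emis = lower_dispatch(demand, elec_price, carbon_price,
                                      gen_caps, gen_costs, gen_emis)
        profit_total += profit
        emissions_total += emis
        shortfall = emissions_total - allowance_held
        buy = upper_policy((shortfall, carbon_price, days - d))
        allowance_held += buy
        profit_total -= buy * carbon_price         # cost of allowances bought today
    return profit_total, emissions_total, allowance_held

print(run_compliance_year())
```

In the paper's framework the upper policy is learned by DRL and, per the abstract, is trained first on monthly, then weekly, then daily decision environments (decision timeline transfer learning); the sketch replaces that learned policy with a fixed rule purely to show where it sits in the loop.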
Key words:  Carbon trading market, deep reinforcement learning, electricity market, generation company, market response.
DOI:10.23919/PCMP.2024.000327
Fund: This work is jointly supported by the Natural Science Foundation of China-Smart Grid Joint Fund of State Grid Corporation of China (No. U2066212); the National Natural Science Foundation of China (No. 52207105); and the Key Science and Technology Projects of China Southern Power Grid Corporation (No. 066600KK52222023).