Cite this article: MENG Jianliang, LIU Dechao. A new method for identifying bad data of power system based on Spark and clustering analysis[J]. Power System Protection and Control, 2016, 44(3): 85-91.
A new method for identifying bad data of power system based on Spark and clustering analysis
MENG Jianliang, LIU Dechao
(School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China)
Abstract:
As smart-grid construction advances, power system data is becoming both massive and high-dimensional. Bad data in a power system lowers the accuracy of state estimation results. Traditional clustering algorithms run out of single-machine computing resources when processing massive high-dimensional data, and the MapReduce framework, popular in recent years, cannot handle frequent iterative computation efficiently. To address these problems, this paper proposes a new method that identifies bad data with a parallel K-means algorithm based on Spark. Taking the power load data of a single node as the research object, the Spark-based parallel K-means clustering algorithm extracts daily load characteristic curves, which are then used to detect and to identify bad data in transmission network state estimation. Experiments on real power load data provided by EUNITE show that the method effectively improves the accuracy of state estimation results and, compared with a MapReduce-based method, achieves better speedup and scalability and handles massive power system data better.
Key words: Spark; clustering; K-means; power system; bad data; load curve classification
DOI: 10.7667/PSPC150548
Received: 2015-04-05; Revised: 2015-07-29
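
The general idea summarized in the abstract (cluster daily load curves with K-means, then compare a day against its characteristic curve) can be illustrated with a minimal single-machine sketch. This is not the paper's Spark implementation: it runs plain Lloyd's K-means in pure Python, the load values are synthetic, the evenly spaced centroid initialization is a simplification, and the relative-deviation rule in `flag_bad_points` with its threshold `tol` is an assumption chosen only for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(curve, centroids):
    """Index of the centroid closest to the given curve."""
    return min(range(len(centroids)), key=lambda j: sq_dist(curve, centroids[j]))


def kmeans(curves, k, iters=20):
    """Plain Lloyd's K-means; returns k characteristic curves (centroids).

    Initialization picks evenly spaced curves (a simplification, assumed here).
    """
    centroids = [list(curves[i * len(curves) // k]) for i in range(k)]
    for _ in range(iters):
        labels = [nearest(c, centroids) for c in curves]
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def flag_bad_points(curve, centroid, tol=0.3):
    """Hypothetical rule: flag points whose relative deviation exceeds tol."""
    return [t for t, (x, c) in enumerate(zip(curve, centroid))
            if abs(x - c) / max(abs(c), 1e-9) > tol]


# Synthetic history: five weekday-like and five weekend-like curves, 6 samples each.
weekday = [50, 60, 90, 100, 95, 70]
weekend = [40, 45, 60, 65, 60, 50]
history = [[v + d for v in weekday] for d in range(5)] + \
          [[v + d for v in weekend] for d in range(5)]

centroids = kmeans(history, k=2)

# A new day with one spiked measurement at sample index 3.
suspect = [51, 61, 91, 160, 96, 71]
flagged = flag_bad_points(suspect, centroids[nearest(suspect, centroids)])
```

In a distributed setting the assignment step would be a map over partitions of curves and the centroid update a per-cluster reduction, which is the kind of repeated iteration the abstract says Spark handles better than MapReduce.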