Cite this article: | MEI Yujie, LI Yong, ZHOU Wangfeng, et al. Dynamic data cleaning method of abnormal and missing data in a distribution network based on machine learning[J]. Power System Protection and Control, 2023, 51(7): 158-169. |
|
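Abstract: |
To address the limitations of traditional distribution network data cleaning, in which the threshold for judging abnormal data must be set manually and missing data are filled inefficiently, an integrated dynamic cleaning method for abnormal and missing distribution network data based on machine learning is proposed. First, an improved dynamic detection algorithm for abnormal data is proposed based on the local outlier factor detection algorithm and the Gaussian mixture model, which selects the abnormal data threshold accurately and automatically. Second, a dynamic filling algorithm for missing distribution network data is proposed based on the random forest algorithm and least squares regression. The filling algorithm is adaptively optimized according to the duration of the missing data, which reduces computation time while preserving filling accuracy. On this basis, abnormal data detection and missing data filling are combined into an integrated dynamic cleaning architecture. A case study using distribution network data from a region of Hunan shows that the proposed method selects the abnormal identification threshold accurately and automatically, detects abnormal distribution network data effectively, and balances the accuracy and speed of missing data filling, giving it good value for engineering applications. |
Keywords: distribution network; data cleaning; abnormal data identification; missing data filling; Gaussian mixture model; random forest |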
DOI:10.19783/j.cnki.pspc.221000 |
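Received: 2022-06-29; Revised: 2022-09-14 |
Funding: Joint Funds of the National Natural Science Foundation of China, Key Program (U22B200134); National Key R&D Program of China, Intergovernmental International Science and Technology Innovation Cooperation Key Project (2022YFE0129300); Science and Technology Project of State Grid Hunan Electric Power Co., Ltd. (5216A521001F) |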
|
Dynamic data cleaning method of abnormal and missing data in a distribution network based on machine learning |
MEI Yujie1,LI Yong1,ZHOU Wangfeng2,GUO Yixiu1,DENG Wei3,QIAO Xuebo1 |
(1. School of Electrical Engineering and Information, Hunan University, Changsha 410082, China;
2. State Grid Wenzhou Power Supply Co., Ltd., Wenzhou 325000, China; 3. State Grid Hunan
Electric Power Co., Ltd. Research Institute, Changsha 410007, China) |
Abstract: |
In traditional data cleaning for distribution networks, the threshold for judging abnormal data has to be set manually and missing data are filled inefficiently. This paper proposes an integrated dynamic cleaning method for abnormal and missing distribution network data based on machine learning. First, an improved dynamic identification algorithm based on the local outlier factor and a Gaussian mixture model is proposed to select the abnormal data threshold automatically. Second, a dynamic filling algorithm for missing data is proposed based on the random forest algorithm and the least squares regression method. It adaptively optimizes the filling algorithm according to the length of the missing data, which maintains filling accuracy while reducing running time. Abnormal data identification and missing data interpolation are then combined into an integrated dynamic cleaning architecture. The method is verified on data from a distribution network in an area of Hunan. The results show that the proposed method detects abnormal data accurately and automatically and balances the accuracy and speed of missing data filling in a distribution network, which gives it good engineering application value. |
Key words: distribution network; data cleaning; abnormal data identification; missing data interpolation; Gaussian mixture model; random forest |
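
The abstract names the two ingredients of the abnormal data detector (local outlier factor scores and a Gaussian mixture model) but does not give the algorithm itself. The sketch below shows one plausible way the two could be combined so that no threshold is set by hand. It is an illustration under stated assumptions, not the authors' improved algorithm: the function names (`lof_scores`, `fit_two_gaussians`, `flag_abnormal`), the choice of `k=5`, the two-component mixture, and the toy series are all hypothetical. It computes a LOF score per sample, fits a two-component one-dimensional mixture to the scores by expectation-maximization, and flags the samples that the higher-mean component explains better.

```python
import math
import statistics


def lof_scores(points, k=5):
    """Local outlier factor of each point; points are equal-length tuples.

    Simplification: exactly k neighbours are used, so distance ties at the
    k-th neighbour are broken arbitrarily.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbours[i]) / k
        lrd.append(1.0 / max(reach, 1e-12))
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(values, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    means = [min(values), max(values)]
    var0 = max(statistics.pvariance(values), 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [w * normal_pdf(v, m, s2)
                    for w, m, s2 in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(2):
            nc = sum(r[c] for r in resp)
            if nc < 1e-9:
                continue
            weights[c] = nc / len(values)
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / nc
            variances[c] = max(
                sum(r[c] * (v - means[c]) ** 2 for r, v in zip(resp, values)) / nc,
                1e-6,
            )
    return weights, means, variances


def flag_abnormal(series, k=5):
    """Indices of samples whose LOF score fits the high-score component better."""
    scores = lof_scores([(v,) for v in series], k)
    weights, means, variances = fit_two_gaussians(scores)
    high = means.index(max(means))
    low = 1 - high
    flagged = []
    for i, s in enumerate(scores):
        d_high = weights[high] * normal_pdf(s, means[high], variances[high])
        d_low = weights[low] * normal_pdf(s, means[low], variances[low])
        if d_high > d_low:
            flagged.append(i)
    return flagged


# Hypothetical toy series: 40 readings near 100-104 with two injected spikes.
series = [100 + 0.1 * ((7 * i) % 40) for i in range(40)]
series[10] = 160.0
series[25] = 20.0
print(flag_abnormal(series))
```

In this sketch the decision boundary falls where the two weighted component densities cross, so it moves with the data instead of being fixed in advance. In practice a library implementation of LOF and of the mixture model would replace the hand-written loops, and the number of mixture components would itself need to be chosen.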