Construction of a multimodal intelligent agent for live-line operation monitoring in distribution networks via first-person lightweight object detection fusion

Affiliation:

1. College of Electrical Engineering, Sichuan University, Chengdu 610065, China; 2. State Grid Chengdu Electric Power Supply Company, Chengdu 610041, China

Fund Project:

Supported by the National Science and Technology Major Project on Smart Grid (2025ZD0806603)

    Abstract:

    Monitoring live-line work in distribution networks is hampered by a restricted field of view, limited computing power at the edge, and the lack of a dedicated lightweight intelligent scheme for accurately parsing operations. To address these problems, a method for constructing a multimodal intelligent agent that integrates first-person lightweight object detection is proposed. First, building on the YOLO11 model, an improved multi-scale attention module, windmill-shaped convolution, and a cross-view interaction module are introduced to improve multi-scale feature extraction, lightweight extraction of low-contrast features, and robustness to view interference, respectively, and the minimum point distance intersection over union (MPDIoU) loss function is adopted to optimize bounding box regression. Then, first-person view information is extracted from the detection results and used to generate structured prompts. Finally, the detection model is combined with DeepSeek-V3 to build the multimodal intelligent agent, which integrates the visual detection results and outputs a quantitative risk assessment and operation analysis. Experimental results show that the proposed method requires 7.3 GFLOPs and improves multi-class mean average precision at IoU thresholds below 0.5 by up to 9.44% over mainstream single-stage models. Its outputs outperform mainstream open-source multimodal models in state judgment accuracy, degree of structuring, and interpretability. The proposed method provides an efficient and generalizable solution for monitoring live-line operations in distribution networks across multiple scenarios.
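The bounding box regression loss named in the abstract can be illustrated with a minimal sketch. It assumes the commonly published form of MPDIoU, in which the IoU is penalized by the squared distances between the matching top-left and bottom-right corners of the two boxes, normalized by the squared image diagonal. The function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box layout, and the image-size arguments are illustrative assumptions and are not taken from the authors' implementation.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Sketch of an MPDIoU-style loss for one pair of boxes.

    Boxes are (x1, y1, x2, y2) in pixels with x1 < x2 and y1 < y2.
    img_w and img_h are the input image width and height (assumed inputs).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area; zero when the boxes do not overlap.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm

    return 1.0 - mpdiou
```

Under this form, two identical boxes give a loss of zero, while non-overlapping boxes still produce a value that shrinks as the corners move closer, which plain IoU cannot provide.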

Cite this article

LIU Zihao, LIU Youbo, GONG Haochen, et al. Construction of a multimodal intelligent agent for live-line operation monitoring in distribution networks via first-person lightweight object detection fusion[J]. Power System Protection and Control, 2026, 54(11): 115-126.

History
  • Received: 2025-12-10
  • Last revised: 2026-03-23
  • Published online: 2026-05-27