Enhancing YOLOv5s with Attention Mechanisms for Object Detection in Complex Background Environments

Ali Impron, Dina Lestari, Linda Sutriani, Syadza Anggraini, Randi Rizal

Abstract

Improving object detection performance in complex environments is essential for real-world applications in which objects overlap or are stacked within the same scene. Existing detection models still struggle with complex backgrounds: their accuracy often drops when the target object is occluded by other objects or is small in size. This study therefore proposes an improved model for object detection in complex environments, based on the YOLOv5s algorithm. The optimization adds a CBAM (Convolutional Block Attention Module) attention layer integrated with the C3 layer (C3CBAM) in the backbone of the YOLOv5s architecture, and adds a P2 feature map to the detection head. The results are encouraging: precision increases by 1.6%, mAP@0.5 by 1.4%, and mAP@0.5:0.95 by 0.1%, showing that the enhancement method applied to YOLOv5s in this study improves model performance. However, the added attention layers also increase the computational load; future research could apply load-reduction methods such as knowledge distillation.
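To make the enhancement concrete, the following PyTorch sketch shows how a CBAM block can be appended to the output of a C3-style module, which is one plausible reading of the C3CBAM fusion described above; it is not the authors' released code. The class names, the channel-reduction ratio of 16, and the 7x7 spatial kernel follow the defaults in Woo et al.'s CBAM paper, and nn.Identity() stands in for YOLOv5's actual C3 block (models/common.py) so the sketch runs on its own.

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    # Shared MLP applied to global average- and max-pooled descriptors.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    # 7x7 convolution over the channel-wise average and max maps.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx = torch.amax(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    # CBAM: channel attention followed by spatial attention (Woo et al., 2018).
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)


class C3CBAM(nn.Module):
    # C3 block followed by CBAM. In YOLOv5, `c3` would be the C3 module
    # from models/common.py; nn.Identity() lets the sketch run standalone.
    def __init__(self, c3, out_channels):
        super().__init__()
        self.c3 = c3
        self.cbam = CBAM(out_channels)

    def forward(self, x):
        return self.cbam(self.c3(x))


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)  # a P3-scale feature map for a 640x640 input
    block = C3CBAM(nn.Identity(), 64)
    print(block(x).shape)  # torch.Size([1, 64, 80, 80])

The P2 addition is configured separately: in YOLOv5 it amounts to extending the model YAML so the head also predicts from the stride-4 (P2) feature map, which helps small-object detection at the cost of extra computation.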

References

H. Da, “Complex Environment Road Object Detection Algorithm Based on Improved YOLOv5s,” in 2024 6th International Conference on Data-driven Optimization of Complex Systems (DOCS), 2024, pp. 625–630. doi: 10.1109/DOCS63458.2024.10704511.

J. Zhong, Q. Cheng, X. Hu, and Z. Liu, “YOLO Adaptive Developments in Complex Natural Environments for Tiny Object Detection,” Electronics, vol. 13, no. 13, Jul. 2024, doi: 10.3390/electronics13132525.

J. Ruan, H. Cui, Y. Huang, T. Li, C. Wu, and K. Zhang, “A review of occluded objects detection in real complex scenarios for autonomous driving,” Green Energy and Intelligent Transportation, vol. 2, no. 3, p. 100092, 2023, doi: 10.1016/j.geits.2023.100092.

E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, “A Survey of Autonomous Driving: Common Practices and Emerging Technologies,” IEEE Access, vol. 8, pp. 58443–58469, 2020, doi: 10.1109/ACCESS.2020.2983149.

C. Baoyuan, L. Yitong, and S. Kun, “Research on Object Detection Method Based on FF-YOLO for Complex Scenes,” IEEE Access, vol. 9, pp. 127950–127960, 2021, doi: 10.1109/ACCESS.2021.3108398.

D. Peng, W. Ding, and T. Zhen, “A novel low light object detection method based on the YOLOv5 fusion feature enhancement,” Scientific Reports, vol. 14, no. 1, p. 4486, 2024, doi: 10.1038/s41598-024-54428-8.

W.-Y. Hsu and W.-Y. Lin, “Adaptive Fusion of Multi-Scale YOLO for Pedestrian Detection,” IEEE Access, vol. 9, pp. 110063–110073, 2021, doi: 10.1109/ACCESS.2021.3102600.

X. Ren, W. Zhang, M. Wu, C. Li, and X. Wang, “Meta-YOLO: Meta-Learning for Few-Shot Traffic Sign Detection via Decoupling Dependencies,” Applied Sciences, vol. 12, no. 11, 2022, doi: 10.3390/app12115543.

F. Cao et al., “An Efficient Object Detection Algorithm Based on Improved YOLOv5 for High-Spatial-Resolution Remote Sensing Images,” Remote Sensing, vol. 15, no. 15, Aug. 2023, doi: 10.3390/rs15153755.

Y. Li, M. Zhang, C. Zhang, H. Liang, P. Li, and W. Zhang, “YOLO-CCS: Vehicle detection algorithm based on coordinate attention mechanism,” Digital Signal Processing, 2024, doi: 10.1016/j.dsp.2024.104632.

Q. Su and J. Mu, “Complex Scene Occluded Object Detection with Fusion of Mixed Local Channel Attention and Multi-Detection Layer Anchor-Free Optimization,” Automation, vol. 5, no. 2, pp. 176–189, Jun. 2024, doi: 10.3390/automation5020011.

Citypersons Conversion, “Citypersons Dataset,” Roboflow Universe, Dec. 2022. [Online]. Available: https://universe.roboflow.com/citypersons-conversion/citypersons-woqjq

J. R. Terven and D. M. Cordova-Esparza, “A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS,” Machine Learning and Knowledge Extraction, vol. 5, no. 4, pp. 1680–1716, 2023.

N. Jegham, C. Y. Koh, M. Abdelatti, and A. Hendawi, “YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions,” Mar. 2025, [Online]. Available: http://arxiv.org/abs/2411.00201

S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional Block Attention Module,” Jul. 2018, [Online]. Available: http://arxiv.org/abs/1807.06521

Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks,” Oct. 2019, [Online]. Available: http://arxiv.org/abs/1910.03151

Q. Hou, D. Zhou, and J. Feng, “Coordinate Attention for Efficient Mobile Network Design,” Mar. 2021, [Online]. Available: http://arxiv.org/abs/2103.02907

M.-H. Guo et al., “Attention Mechanisms in Computer Vision: A Survey,” Computational Visual Media, vol. 8, pp. 331–368, 2022, doi: 10.1007/s41095-022-0271-y.

Z. Ren, H. Zhang, and Z. Li, “Improved YOLOv5 Network for Real-Time Object Detection in Vehicle-Mounted Camera Capture Scenarios,” Sensors, vol. 23, no. 10, May 2023, doi: 10.3390/s23104589.

S. Hao, W. Li, X. Ma, and Z. Tian, “SSE-YOLOv5: a real-time fault line selection method based on lightweight modules and attention models,” Journal of Real-Time Image Processing, vol. 21, no. 4, May 2024, doi: 10.1007/s11554-024-01480-2.
