Article ID: 2024EDP7318
Precise detection of obstacles on the track is critical to the safety of railway transportation. However, existing track obstacle detection methods suffer from low accuracy, slow speed, and high complexity, which makes them unsuitable for real-time operation under low-resource constraints. This paper proposes a novel Railway Obstacle Detection (ROD) method named ROD-YOLO, which strikes a good trade-off between detection performance and efficiency. First, we design a multi-scale Feature Enhancement Module (FEM) that uses convolutions with different dilation rates to extract fine-grained features from different layers. Second, to improve detection speed, we propose the SPPCSPC-F spatial pyramid pooling module, which reduces the number of convolution units, the size of the pooling operations, and the dimensions of feature concatenation. Third, we incorporate Large Selective Kernel (LSK) attention to filter out interfering information and focus on important local features. Comprehensive experiments are conducted on a real-world dataset of 12,270 images to verify the feasibility of object detection methods in complex railway environments. Results show that ROD-YOLO outperforms state-of-the-art one-stage and two-stage object detection methods, achieving 96.3% precision, 91.4% recall, and 96.6% mAP at an IoU threshold of 0.5. Compared to the most lightweight baseline (YOLOv8n), our method improves mAP50 and inference speed by 7.93% and 72.42%, respectively, with only a 36.19% increase in parameter size. Moreover, ROD-YOLO shows strong generalization ability on four cross-domain datasets, including a remote sensing image dataset and a traffic sign dataset. In conclusion, the proposed ROD-YOLO algorithm demonstrates remarkable performance in detecting track obstacles and provides a valuable reference for deploying object detection models in resource-constrained, safety-critical systems.
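To illustrate why convolutions with different dilation rates yield multi-scale features, the following is a minimal one-dimensional sketch in plain Python. It is not the FEM described above: the function names (`effective_kernel_size`, `dilated_conv1d`, `multi_scale_features`), the all-ones kernel, and the dilation rates (1, 2, 3) are assumptions chosen only for illustration. It relies on the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input samples.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D cross-correlation whose taps are `dilation` samples apart."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def multi_scale_features(signal, kernel, dilations=(1, 2, 3)):
    """Apply the same kernel at several dilation rates.

    The rates (1, 2, 3) are hypothetical; the source does not state
    which rates the FEM uses.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    signal = list(range(10))   # toy input: 0, 1, ..., 9
    kernel = [1, 1, 1]         # toy kernel: sums three taps
    for d, out in multi_scale_features(signal, kernel).items():
        # Same number of weights, but a wider span as d grows.
        print(f"dilation={d} span={effective_kernel_size(len(kernel), d)} out={out}")
```

With three weights in every case, the span grows from 3 to 5 to 7 samples as the dilation rate increases, so each branch sees context at a different scale without adding parameters. A real detector would apply the same idea with 2-D convolutions over feature maps and then fuse the branches.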