2023 Volume 4 Issue 3 Pages 402-413
This paper proposes a multi-modal transformer that uses sequential data to detect and predict the deterioration of winter road surface conditions caused by snow accumulation. The proposed method performs multimodal analysis on several modalities, including images captured by a fixed-point camera and text data related to road surface conditions. To integrate these modalities, we adopt feature integration based on cross attention, in which each modality compensates for the features of the others, thereby improving the expressive power of the integrated features. In addition, by applying time-series processing to input data from multiple time points, the method takes temporal changes in road surface conditions into account. Finally, to verify the effectiveness of the proposed method for both the detection and prediction tasks, we conduct experiments that use, as the supervised data, the road surface conditions corresponding to the input data and the road surface conditions several hours after the input data.
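The cross-attention integration described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`cross_attention`, `fuse_step`, `fuse_sequence`), the residual addition, the mean pooling, and the toy feature values are all assumptions made for illustration, and the learned projection matrices of a real transformer are omitted. The temporal model that would consume the fused sequence is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all keys."""
    dim = len(queries[0])
    width = len(values[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return attended


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def fuse_step(image_tokens, text_tokens):
    """Hypothetical fusion for one time step (an assumption, not the paper's design).

    Each modality queries the other, the attended context is added back as a
    residual, and the two pooled results are concatenated.
    """
    image_ctx = cross_attention(image_tokens, text_tokens, text_tokens)
    text_ctx = cross_attention(text_tokens, image_tokens, image_tokens)
    image_out = [[a + b for a, b in zip(t, c)] for t, c in zip(image_tokens, image_ctx)]
    text_out = [[a + b for a, b in zip(t, c)] for t, c in zip(text_tokens, text_ctx)]
    return mean_pool(image_out) + mean_pool(text_out)


def fuse_sequence(steps):
    """Fuse every (image_tokens, text_tokens) pair in a time-ordered sequence."""
    return [fuse_step(image_tokens, text_tokens) for image_tokens, text_tokens in steps]


# Toy input: 2 time steps, 3 image tokens and 2 text tokens of dimension 4 each.
toy_steps = [
    (
        [[0.1, 0.2, 0.3, 0.4], [0.0, 0.5, 0.1, 0.2], [0.3, 0.3, 0.3, 0.3]],
        [[0.2, 0.1, 0.0, 0.4], [0.6, 0.1, 0.2, 0.0]],
    ),
    (
        [[0.4, 0.1, 0.2, 0.0], [0.2, 0.2, 0.5, 0.1], [0.0, 0.1, 0.1, 0.6]],
        [[0.1, 0.3, 0.3, 0.2], [0.5, 0.0, 0.1, 0.3]],
    ),
]
fused = fuse_sequence(toy_steps)
print(len(fused), len(fused[0]))
```

In this sketch each time step yields one fused vector, so the output is a time-ordered sequence that a temporal model could take as input for the detection and prediction tasks.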