Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : Hamamatsu, Japan
Date : May 28, 2024 - May 31, 2024
With the spread of driver assistance systems and autonomous driving technologies, their effectiveness in reducing traffic accidents has been widely discussed. However, to reduce accidents further, it is crucial to explain traffic accident risks and analyze their mechanisms. Research on explainable multimodal networks for driving scenes has explored methods that generate captions by taking recognizable objects into account with the help of metadata. Such methods typically focus on generating captions for dynamic objects, such as humans. To explain traffic accident risks in driving scenes, however, static risks caused by road signs and road structures should also be considered during caption generation. Existing large-scale multimodal networks have difficulty generating captions that address these road-environment risks. To tackle this challenge, we propose a caption generation method that leverages prompt engineering to cover both dynamic objects and static potential risks. Experiments with the generated captions confirmed that the proposed method can produce captions that consider both dynamic objects and static potential risks.
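As a rough illustration of the prompt-engineering idea (this is a minimal sketch under assumed inputs, not the authors' actual pipeline; the metadata fields, function names, and prompt wording are all hypothetical), the snippet below shows how per-scene metadata about detected dynamic objects and static road elements could be assembled into a single captioning prompt that explicitly requests both kinds of risk before being paired with the scene image for a vision-language model.

# Illustrative sketch only: builds a risk-aware captioning prompt from
# hypothetical scene metadata. Field names and prompt text are assumptions,
# not the method described in the paper.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneMetadata:
    """Hypothetical per-frame metadata produced by upstream detectors."""
    dynamic_objects: List[str] = field(default_factory=list)   # e.g. pedestrians, cyclists
    static_elements: List[str] = field(default_factory=list)   # e.g. road signs, road structure


def build_risk_prompt(meta: SceneMetadata) -> str:
    """Compose a prompt that asks for both dynamic and static potential risks."""
    dynamic = ", ".join(meta.dynamic_objects) or "none detected"
    static = ", ".join(meta.static_elements) or "none detected"
    return (
        "You are describing traffic accident risks in a driving scene.\n"
        f"Detected dynamic objects: {dynamic}.\n"
        f"Detected static road elements: {static}.\n"
        "Generate one caption that explains (1) risks arising from the dynamic "
        "objects and (2) static potential risks arising from road signs and "
        "road structure."
    )


if __name__ == "__main__":
    meta = SceneMetadata(
        dynamic_objects=["pedestrian near crosswalk", "oncoming cyclist"],
        static_elements=["partially occluded stop sign", "narrow curved road"],
    )
    # The resulting text prompt would be supplied together with the scene image
    # to a multimodal (vision-language) captioning model.
    print(build_risk_prompt(meta))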