Abstract
Retrieving relevant traffic scene data from an existing database is essential to the development of advanced driver-assistance systems, but the task is time-consuming and computationally expensive. This study proposes a traffic scene retrieval system that combines a vision-language model with clustering techniques. The system accepts either an image or a text description as the search query. Evaluation results showed that the system retrieved complex scenes (e.g., traffic congestion) from a driving video database in under 3 seconds. Overall, the results indicate that the proposed system is feasible for practical applications.
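The abstract does not say how the vision-language model and the clustering step fit together. One common pattern, shown here only as an illustrative sketch and not as the authors' method, is cluster-pruned search. It assumes that images and text are embedded into a shared vector space, as CLIP-style models do. The database embeddings are grouped with k-means, and a query is compared only against the scenes in its nearest cluster(s), which avoids a full linear scan. The names `kmeans`, `ClusteredIndex` and `n_probe`, as well as the toy 2-D embeddings, are hypothetical stand-ins for real model outputs.

```python
import math


def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Spherical k-means on unit vectors with deterministic farthest-point init."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector least similar to its closest existing centroid.
        nxt = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(list(nxt))
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its most similar centroid.
        assign = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the normalized mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return centroids, assign


class ClusteredIndex:
    """Hypothetical cluster-pruned index over scene embeddings (image or text)."""

    def __init__(self, embeddings, k, n_probe=1):
        self.embeddings = [normalize(e) for e in embeddings]
        self.centroids, assign = kmeans(self.embeddings, k)
        self.buckets = {c: [] for c in range(k)}
        for idx, c in enumerate(assign):
            self.buckets[c].append(idx)
        self.n_probe = n_probe  # number of nearest clusters to scan per query

    def search(self, query, top_k=3):
        q = normalize(query)
        # Probe only the n_probe clusters whose centroids are closest to the query.
        probes = sorted(range(len(self.centroids)),
                        key=lambda c: dot(q, self.centroids[c]), reverse=True)[: self.n_probe]
        candidates = [i for c in probes for i in self.buckets[c]]
        # Rank the surviving candidates by cosine similarity.
        ranked = sorted(candidates, key=lambda i: dot(q, self.embeddings[i]), reverse=True)
        return ranked[:top_k]


# Toy embeddings: indices 0-2 and 3-5 stand in for two kinds of scenes.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
              [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
index = ClusteredIndex(embeddings, k=2, n_probe=1)
print(index.search([1.0, 0.2], top_k=2))  # → [1, 2]
```

Raising `n_probe` trades speed for recall. With `n_probe` equal to the number of clusters, the search reduces to an exact linear scan.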