Host: The Japanese Society for Artificial Intelligence
Name: The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 36
Location: [in Japanese]
Date: June 14, 2022 - June 17, 2022
In live commentary, a commentator makes objective statements or subjective comments about the events in a video in real time. Research on automatically generating such commentary has conventionally targeted specific fields, such as sports, and therefore commonly relies on field-specific information. Our study addresses live commentary generation for open-domain videos, a difficult setting because domain-specific features cannot be used. We first construct a dataset of videos from various domains paired with live commentaries collected by crowdsourcing, and then train a live commentary generation model that takes both the video and the context into account. Experiments show that a multimodal Transformer that considers both video and contextual text performs well.