Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
When generating or selecting music or sound effects, large audio databases must be searched to find audio appropriate for a scene in an animation or other video clip. However, sound effects or background music produced by individual human experts may not always feel well matched to the scene from the audience's point of view. An approach that generates audio while taking listeners' preferences into account is therefore needed. In this work, we propose a method for generating audio suited to a scene by using feedback from an audience. Specifically, we use SpecGAN, a GAN that generates a wide variety of audio from a latent space, together with an interactive genetic algorithm (interactive GA), an optimization method that uses human preferences for evaluation. The following steps were repeated: SpecGAN generates audio from latent variables, a group of human evaluators ranks the generated audio, and the best-ranked latent variables are crossed over to create the latent variables for the next generation. As a result, we succeeded in steering the direction of audio generation toward individual scenes. We hope that the audio generated by our method is as valuable as audio created by human experts.
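To make the generate-rank-crossover loop concrete, the following is a minimal Python sketch of an interactive GA over latent vectors under stated assumptions: the trained SpecGAN generator and the human ranking interface are represented by hypothetical placeholder functions (generate_audio, collect_rankings), and the latent dimension, population size, and mutation settings are illustrative choices, not values reported in this paper.

```python
import numpy as np

LATENT_DIM = 100       # assumed latent size; SpecGAN commonly uses ~100 dimensions
POPULATION_SIZE = 8    # number of clips a listener can reasonably rank per round (assumption)
N_PARENTS = 4          # top-ranked individuals kept as parents (assumption)
MUTATION_STD = 0.1     # small Gaussian mutation to keep diversity (assumption)
rng = np.random.default_rng(0)


def generate_audio(latent: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for a trained SpecGAN generator: latent vector -> waveform."""
    raise NotImplementedError


def collect_rankings(clips: list) -> list:
    """Hypothetical placeholder for the human evaluation step: returns clip indices, best first."""
    raise NotImplementedError


def crossover(parent_a: np.ndarray, parent_b: np.ndarray) -> np.ndarray:
    """Uniform crossover: each latent dimension is copied from one of the two parents."""
    mask = rng.random(LATENT_DIM) < 0.5
    return np.where(mask, parent_a, parent_b)


def next_generation(population: np.ndarray, ranking: list) -> np.ndarray:
    """Breed a new population of latent vectors from the top-ranked parents."""
    parents = population[ranking[:N_PARENTS]]
    children = []
    for _ in range(POPULATION_SIZE):
        a, b = parents[rng.choice(N_PARENTS, size=2, replace=False)]
        child = crossover(a, b) + rng.normal(0.0, MUTATION_STD, LATENT_DIM)
        children.append(child)
    return np.stack(children)


def interactive_ga(n_rounds: int = 10) -> np.ndarray:
    """Repeat generation -> human ranking -> crossover, returning the best latent found."""
    population = rng.normal(size=(POPULATION_SIZE, LATENT_DIM))
    best = population[0]
    for _ in range(n_rounds):
        clips = [generate_audio(z) for z in population]    # SpecGAN generation step
        ranking = collect_rankings(clips)                   # human preference step
        best = population[ranking[0]]
        population = next_generation(population, ranking)   # evolutionary step
    return best
```

In this sketch the fitness function is replaced entirely by the human ranking, which is the defining feature of an interactive GA; the mutation noise is one common way to keep the population from collapsing onto a single latent vector after repeated crossover.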