Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
When generating or selecting music or sound effects, large audio databases must be searched to find audio appropriate for a scene in an animation or other video clip. However, sound effects or background music produced by individual human experts may not always feel well matched to the scene from the audience's point of view. An approach that generates audio while taking listeners' preferences into account is therefore needed. In this work, we propose a method for generating audio suited to a scene by using feedback from an audience. Specifically, we use SpecGAN, a GAN that generates a wide variety of audio from a latent space, together with an interactive genetic algorithm (interactive GA), an optimization method that uses human preferences for evaluation. The following steps were repeated: SpecGAN generates audio from latent variables, a group of human evaluators ranks the generated audio, and the best-ranked latent variables are crossed over to create the latent variables for the next generation. As a result, we succeeded in steering the direction of audio generation toward individual scenes. We hope that the audio generated by our method is as valuable as audio created by human experts.
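To make the generate-rank-crossover loop concrete, the following is a minimal Python sketch of an interactive GA over latent vectors under stated assumptions: the trained SpecGAN generator and the human ranking interface are represented by hypothetical placeholder functions (generate_audio, collect_rankings), and the latent dimension, population size, and mutation settings are illustrative choices, not values reported in this paper.

```python
import numpy as np

LATENT_DIM = 100       # assumed latent size; SpecGAN commonly uses ~100 dimensions
POPULATION_SIZE = 8    # number of clips a listener can reasonably rank per round (assumption)
N_PARENTS = 4          # top-ranked individuals kept as parents (assumption)
MUTATION_STD = 0.1     # small Gaussian mutation to keep diversity (assumption)
rng = np.random.default_rng(0)


def generate_audio(latent: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for a trained SpecGAN generator: latent vector -> waveform."""
    raise NotImplementedError


def collect_rankings(clips: list) -> list:
    """Hypothetical placeholder for the human evaluation step: returns clip indices, best first."""
    raise NotImplementedError


def crossover(parent_a: np.ndarray, parent_b: np.ndarray) -> np.ndarray:
    """Uniform crossover: each latent dimension is copied from one of the two parents."""
    mask = rng.random(LATENT_DIM) < 0.5
    return np.where(mask, parent_a, parent_b)


def next_generation(population: np.ndarray, ranking: list) -> np.ndarray:
    """Breed a new population of latent vectors from the top-ranked parents."""
    parents = population[ranking[:N_PARENTS]]
    children = []
    for _ in range(POPULATION_SIZE):
        a, b = parents[rng.choice(N_PARENTS, size=2, replace=False)]
        child = crossover(a, b) + rng.normal(0.0, MUTATION_STD, LATENT_DIM)
        children.append(child)
    return np.stack(children)


def interactive_ga(n_rounds: int = 10) -> np.ndarray:
    """Repeat generation -> human ranking -> crossover, returning the best latent found."""
    population = rng.normal(size=(POPULATION_SIZE, LATENT_DIM))
    best = population[0]
    for _ in range(n_rounds):
        clips = [generate_audio(z) for z in population]    # SpecGAN generation step
        ranking = collect_rankings(clips)                   # human preference step
        best = population[ranking[0]]
        population = next_generation(population, ranking)   # evolutionary step
    return best
```

In this sketch the fitness function is replaced entirely by the human ranking, which is the defining feature of an interactive GA; the mutation noise is one common way to keep the population from collapsing onto a single latent vector after repeated crossover.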