Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
INVITED PAPERS
Spec2Spec: Towards the general framework of music processing using generative adversarial networks
Hyeong-Seok Choi, Juheon Lee, Kyogu Lee

2020, Volume 41, Issue 1, pp. 160-165

Abstract

The advent of deep learning has led to great progress in solving many problems that had long been considered challenging. Several recent studies have shown promising results in directly translating styles between two domains that share the same latent content, for example, from paintings to photographs and from simulated roads to real roads. One of the key ideas underlying this line of domain translation approaches is the concept of generative adversarial networks (GANs). Motivated by this idea of converting one style of data into another using GANs, we apply the technique to two challenging yet very important applications in the music signal processing field: music source separation and automatic music transcription. Both tasks can be interpreted as a style transition between two different spectrogram domains that share the same content; i.e., from a mixture spectrogram to a specific source spectrogram in the case of source separation, and from an audio spectrogram to a piano-roll representation in the case of music transcription. Through experiments using real-world audio, we demonstrate that one general deep learning framework, namely "spectrogram to spectrogram" or "Spec2Spec," can successfully be applied to tackle these problems.
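The abstract frames both tasks as adversarial translation between spectrogram domains. As a rough illustration of the adversarial objective involved (not the authors' Spec2Spec architecture), the minimal sketch below pairs a toy "generator" that maps a mixture spectrogram to a source estimate with a toy "discriminator" that scores spectrograms, and computes the standard non-saturating GAN losses; all function names, weights, and spectrogram shapes here are hypothetical stand-ins.

```python
import numpy as np

def generator(mixture_spec, w):
    """Hypothetical generator: a single scaling + ReLU standing in for a deep
    network that maps a mixture spectrogram to a source-spectrogram estimate."""
    return np.maximum(w * mixture_spec, 0.0)  # keep magnitudes non-negative

def discriminator(spec, v):
    """Hypothetical discriminator: pools a spectrogram to one sigmoid score,
    interpreted as the probability that the spectrogram is a real source."""
    return 1.0 / (1.0 + np.exp(-v * spec.mean()))

def gan_losses(d_real, d_fake, eps=1e-7):
    """Non-saturating GAN objectives on discriminator probabilities: the
    discriminator is trained to score real sources high and generated ones
    low, while the generator tries to push its outputs toward 'real'."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    loss_d = -(np.log(d_real) + np.log(1.0 - d_fake))  # discriminator loss
    loss_g = -np.log(d_fake)                           # generator loss
    return loss_d, loss_g

rng = np.random.default_rng(0)
# Dummy magnitude spectrograms (frequency bins x time frames).
mixture = np.abs(rng.standard_normal((64, 32)))      # input domain
real_source = np.abs(rng.standard_normal((64, 32)))  # target domain

fake_source = generator(mixture, w=0.5)  # same time-frequency shape as input
d_real = discriminator(real_source, v=1.0)
d_fake = discriminator(fake_source, v=1.0)
loss_d, loss_g = gan_losses(d_real, d_fake)
```

In this framing, source separation and transcription differ only in the target domain of the generator (a source spectrogram versus a piano-roll representation); the adversarial loss structure stays the same.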

© 2020 by The Acoustical Society of Japan