Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
INVITED PAPERS
Spec2Spec: Towards the general framework of music processing using generative adversarial networks
Hyeong-Seok Choi, Juheon Lee, Kyogu Lee

2020 Volume 41 Issue 1 Pages 160-165

Abstract

The advent of deep learning has led to great progress in solving many problems that had long been considered challenging. Several recent studies have shown promising results in directly translating styles between two different domains that share the same latent content, for example, from paintings to photographs and from simulated roads to real roads. A key idea underlying this family of domain translation approaches is the concept of generative adversarial networks (GANs). Motivated by this concept of changing one style of data into another using GANs, we apply this technique to two challenging yet important applications in the music signal processing field: music source separation and automatic music transcription. Both tasks can be interpreted as a style transition between two different spectrogram domains that share the same content; i.e., from a mixture spectrogram to a specific source spectrogram in the case of source separation, and from an audio spectrogram to a piano roll representation in the case of music transcription. Through experiments using real-world audio, we demonstrate that one general deep learning framework, namely "spectrogram to spectrogram" or "Spec2Spec," can successfully be applied to tackle these problems.
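To make the framework concrete, the following is a minimal sketch of the kind of objective used in paired GAN-based domain translation (a pix2pix-style adversarial term plus an L1 reconstruction term). The least-squares GAN variant, the loss weight `lam`, and the flattened-list representation of a spectrogram are illustrative assumptions, not the paper's reported configuration.

```python
# Hedged sketch of a pix2pix-style "Spec2Spec" training objective:
# a generator maps a mixture spectrogram to a source spectrogram
# (or an audio spectrogram to a piano roll), and a discriminator
# judges whether a spectrogram comes from the target domain.
# Spectrograms are represented here as flat lists of magnitudes
# for simplicity; the GAN variant and weights are assumptions.

def mean(xs):
    return sum(xs) / len(xs)

def discriminator_loss(d_real, d_fake):
    # Least-squares GAN: push discriminator outputs toward 1 on
    # real target-domain spectrograms and toward 0 on generated ones.
    return (0.5 * mean([(d - 1.0) ** 2 for d in d_real])
            + 0.5 * mean([d ** 2 for d in d_fake]))

def generator_loss(d_fake, fake_spec, target_spec, lam=100.0):
    # Adversarial term (fool the discriminator into outputting 1)
    # plus a lam-weighted L1 term tying the generated spectrogram
    # to its paired ground-truth spectrogram.
    adv = 0.5 * mean([(d - 1.0) ** 2 for d in d_fake])
    recon = mean([abs(f - t) for f, t in zip(fake_spec, target_spec)])
    return adv + lam * recon

# Toy check: a perfect reconstruction leaves only the adversarial term.
g = generator_loss([0.0], [1.0, 2.0], [1.0, 2.0])
print(g)  # 0.5
```

The same loss structure covers both tasks in the abstract: only the meaning of the paired target changes (isolated-source spectrogram for separation, piano roll for transcription), which is what makes a single spectrogram-to-spectrogram framework plausible.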

© 2020 by The Acoustical Society of Japan