Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Gamma-von-Mises restricted Boltzmann machine and its application to audio modeling
Toru Nakashika, Kohei Yatabe
Journal, Open Access, Advance online publication

Article ID: e24.95

Abstract

To bypass phase estimation, complex-valued generative models have been developed to handle the spectra of audio signals directly. The complex-valued restricted Boltzmann machine (CRBM) is one such promising model proposed recently. However, like the other models, the CRBM cannot account for the logarithmic nature of auditory perception, which is important for building a better model for audio applications. This is because the CRBM handles complex values in rectangular coordinates (i.e., real and imaginary parts), which hinders applying a logarithmic transform to the magnitude. To overcome this drawback of the CRBM, we propose the gamma-von-Mises (GVM) RBM, which models complex-valued spectra in polar coordinates (i.e., magnitude and phase). The GVM RBM models the magnitude with the gamma distribution, using the logarithmic function, and the phase with the von Mises distribution. Our objective and subjective experiments showed that the GVM RBM outperformed the other models, including the CRBM and the complex-valued variational autoencoder (CVAE).
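
As a rough illustration of the polar-coordinate idea described in the abstract, the following minimal sketch draws a magnitude from a gamma distribution and a phase from a von Mises distribution, then combines them into complex values. It is not the GVM RBM itself: there are no hidden units, no energy function, and no training, and the two draws are independent. The function name `sample_polar_spectrum` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_polar_spectrum(shape_k, scale_theta, mu, kappa, size, seed=0):
    """Draw complex values whose magnitude is gamma-distributed and whose
    phase is von Mises-distributed (independent draws, illustration only)."""
    rng = np.random.default_rng(seed)
    # Magnitude: strictly positive, so its logarithm is well defined.
    magnitude = rng.gamma(shape_k, scale_theta, size=size)
    # Phase: circular variable in [-pi, pi] centered at mu.
    phase = rng.vonmises(mu, kappa, size=size)
    # Polar form to complex value: z = r * exp(i * phi).
    return magnitude * np.exp(1j * phase)


# Hypothetical parameters, not taken from the paper.
z = sample_polar_spectrum(shape_k=2.0, scale_theta=0.5, mu=0.0, kappa=4.0, size=1000)

# In polar form the log-magnitude is available directly, which is the
# quantity a rectangular (real/imaginary) parameterization makes awkward.
log_magnitude = np.log(np.abs(z))
phase = np.angle(z)
```

The point of the sketch is only that, once magnitude and phase are separate variables, taking the logarithm of the magnitude is a one-line operation.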

© 2025 by The Acoustical Society of Japan

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
https://creativecommons.org/licenses/by-nd/4.0/