Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Gamma-von-Mises restricted Boltzmann machine and its application to audio modeling
Toru Nakashika, Kohei Yatabe
Journal, Open Access, Advance Online Publication

Article ID: e24.95

Abstract

To bypass phase estimation, complex-valued generative models have been developed that directly handle the spectra of audio signals. The complex-valued restricted Boltzmann machine (CRBM) is one such promising model proposed recently. However, like the other models, CRBM cannot account for the logarithmic nature of auditory perception, which is important for building a better model for audio applications. This is because CRBM handles complex values in rectangular coordinates (i.e., real and imaginary parts), which hinders applying a logarithmic transform to the magnitude. To overcome this drawback of CRBM, we propose the gamma-von-Mises (GVM) RBM, which models complex-valued spectra in polar coordinates (i.e., magnitude and phase). GVM RBM models the magnitude with the gamma distribution, using the logarithmic function, and the phase with the von Mises distribution. Our objective and subjective experiments showed that GVM RBM outperformed the other models, including CRBM and the complex-valued variational autoencoder (CVAE).
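The polar-coordinate idea in the abstract can be illustrated with a minimal sketch. It evaluates the log-density of a single complex spectral value when its magnitude is assigned a gamma distribution and its phase a von Mises distribution. This is not the GVM RBM itself: there are no hidden units or learned weights, and the independence of magnitude and phase, the shape/scale parameterization, and all function names are assumptions made for illustration only. The density is taken over the pair (magnitude, phase), not over the complex plane.

```python
import cmath
import math


def log_bessel_i0(kappa: float, terms: int = 200) -> float:
    """log I0(kappa) from the power series sum_m (kappa/2)^(2m) / (m!)^2.

    Intended for moderate kappa only; very large values would overflow.
    """
    q = (kappa / 2.0) ** 2
    term, total = 1.0, 1.0
    for m in range(1, terms):
        term *= q / (m * m)  # ratio between consecutive series terms
        total += term
        if term < 1e-16 * total:
            break
    return math.log(total)


def gamma_logpdf(r: float, shape: float, scale: float) -> float:
    """Gamma log-density of a magnitude r > 0.

    The data-dependent terms are log(r) and r, which is where the
    logarithm of the magnitude enters.
    """
    if r <= 0.0:
        return float("-inf")
    return ((shape - 1.0) * math.log(r) - r / scale
            - math.lgamma(shape) - shape * math.log(scale))


def von_mises_logpdf(phi: float, mu: float, kappa: float) -> float:
    """Von Mises log-density of a phase phi with mean mu and concentration kappa."""
    return kappa * math.cos(phi - mu) - math.log(2.0 * math.pi) - log_bessel_i0(kappa)


def polar_logdensity(z: complex, shape: float, scale: float,
                     mu: float, kappa: float) -> float:
    """Log-density of (|z|, arg z), assuming independent magnitude and phase."""
    r, phi = abs(z), cmath.phase(z)
    return gamma_logpdf(r, shape, scale) + von_mises_logpdf(phi, mu, kappa)
```

Because the von Mises term depends on the phase only through a cosine, it is periodic in the phase, which a Gaussian on the real and imaginary parts does not provide directly.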

© 2025 by The Acoustical Society of Japan

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
https://creativecommons.org/licenses/by-nd/4.0/