IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Fast and Lightweight Non-Parallel Voice Conversion Based on Free-Energy Minimization of Speaker-Conditional Restricted Boltzmann Machine
Takuya KISHIDA, Toru NAKASHIKA
JOURNAL FREE ACCESS Advance online publication

Article ID: 2024EDP7206

Abstract

In this paper, we propose a fast and lightweight non-parallel voice conversion method based on minimizing the free energy of a restricted Boltzmann machine (RBM). The proposed method employs an RBM that learns the generative probability of acoustic features conditioned on a target speaker. It iteratively updates the input acoustic features until their free energy reaches a local minimum, and the features at that minimum are taken as the converted features. Because of the RBM framework, only a few hyperparameters need to be set, and the number of trainable parameters is small, ensuring stable training. When determining the step size of the update formula using the Newton-Raphson method, we found that the Hessian matrix of the free energy can be approximated by a diagonal matrix, which allows efficient updates at minimal computational cost. In objective evaluation experiments, the proposed method converted approximately 4.5 times faster than StarGAN-VC and also outperformed StarGAN-VC in terms of Mel-cepstrum distortion. In subjective evaluation experiments, the proposed method was comparable to StarGAN-VC in similarity mean opinion score.
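
As a rough illustration of the update described above, the following sketch minimizes the free energy of a generic Gaussian-Bernoulli RBM with a Newton-Raphson step that uses only the diagonal of the Hessian. It is not the authors' implementation. The unit-variance visible units, the function names (`free_energy`, `convert`), the toy dimensions, the random parameters, and the assumption that the speaker conditioning has already been folded into the biases `b` and `c` are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def free_energy(v, W, b, c):
    # Assumed model: Gaussian-Bernoulli RBM with unit-variance visible units.
    # F(v) = 0.5 * ||v - b||^2 - sum_j softplus(c_j + (W^T v)_j)
    return 0.5 * np.sum((v - b) ** 2) - np.sum(np.logaddexp(0.0, c + W.T @ v))


def convert(v, W, b, c, n_iter=50, tol=1e-8, eps=1e-3):
    """Iterate v <- v - grad / diag(H) until the step is negligible."""
    v = v.copy()
    for _ in range(n_iter):
        s = sigmoid(c + W.T @ v)                 # hidden activation probabilities
        grad = (v - b) - W @ s                   # dF/dv
        # Exact Hessian is I - W diag(s * (1 - s)) W^T; keep only its diagonal.
        diag_h = 1.0 - (W ** 2) @ (s * (1.0 - s))
        diag_h = np.maximum(diag_h, eps)         # guard against non-positive curvature
        step = grad / diag_h
        v -= step
        if np.linalg.norm(step) < tol:
            break
    return v


# Toy parameters (hypothetical; a trained model would supply W, b, c).
rng = np.random.default_rng(0)
n_vis, n_hid = 8, 16
W = 0.1 * rng.standard_normal((n_vis, n_hid))
b = rng.standard_normal(n_vis)   # assumed to already encode the target speaker
c = rng.standard_normal(n_hid)   # assumed to already encode the target speaker

v_src = rng.standard_normal(n_vis)   # stand-in for one frame of source features
v_conv = convert(v_src, W, b, c)
```

Each iteration costs two matrix-vector products plus one elementwise division, whereas a full Newton step would need to form and solve an `n_vis` by `n_vis` linear system. In this toy setting the weights are small, so the Hessian is close to the identity and the diagonal approximation is well behaved. Whether that holds for a trained model is a property of the model, not of this sketch.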

© 2025 The Institute of Electronics, Information and Communication Engineers