Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232

This article has now been updated. Please use the final version.

Fast end-to-end non-parallel voice conversion based on speaker-adaptive neural vocoder with cycle-consistent learning
Shuhei Imai, Aoi Kanagaki, Takashi Nose, Shogo Fukawa, Akinori Ito
JOURNAL OPEN ACCESS Advance online publication

Article ID: e24.46

Abstract

This paper proposes Tachylone, a fast end-to-end non-parallel voice conversion (VC) method. In Tachylone, speaker conversion and waveform generation are performed by a single vocoder network. To train Tachylone, a pre-trained universal neural vocoder is used as the initial model, and its parameters are updated end-to-end on non-parallel data from the source and target speakers using cycle-consistent learning. We compare Tachylone with conventional CycleGAN-based VC using objective and subjective measures and discuss the results.
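
As an illustration of the cycle-consistent learning mentioned in the abstract, the following is a minimal sketch of a cycle-consistency loss. It is not the authors' implementation: the abstract does not specify the loss terms, so the L1 distance, the function names, and the toy scaling "converters" are all assumptions chosen only to keep the example self-contained. In a real system the two mappings would be neural networks operating on speech features or waveforms.

```python
from typing import Callable, List, Sequence

Features = List[float]
Converter = Callable[[Sequence[float]], Features]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x_src: Sequence[float],
    y_tgt: Sequence[float],
    src_to_tgt: Converter,
    tgt_to_src: Converter,
) -> float:
    """Sum of reconstruction errors after a round trip in each direction.

    No parallel (time-aligned) source/target pairs are needed: each
    utterance is only compared with its own round-trip reconstruction.
    """
    x_cycle = tgt_to_src(src_to_tgt(x_src))  # source -> target -> source
    y_cycle = src_to_tgt(tgt_to_src(y_tgt))  # target -> source -> target
    return l1_distance(x_src, x_cycle) + l1_distance(y_tgt, y_cycle)


# Hypothetical toy converters (assumptions, standing in for trained networks).
def toy_src_to_tgt(x: Sequence[float]) -> Features:
    return [2.0 * v for v in x]


def toy_tgt_to_src(y: Sequence[float]) -> Features:
    return [0.5 * v for v in y]  # exact inverse of toy_src_to_tgt


def toy_bad_tgt_to_src(y: Sequence[float]) -> Features:
    return [0.25 * v for v in y]  # not an inverse, so the cycle fails


if __name__ == "__main__":
    x = [1.0, -2.0]  # stand-in for source-speaker features
    y = [4.0, 0.5]   # stand-in for target-speaker features
    print(cycle_consistency_loss(x, y, toy_src_to_tgt, toy_tgt_to_src))
    print(cycle_consistency_loss(x, y, toy_src_to_tgt, toy_bad_tgt_to_src))
```

The loss is zero only when the two mappings invert each other on the given data, which is the property that lets training proceed without parallel utterances. A CycleGAN-style setup would typically combine such a term with adversarial losses, which are omitted here.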

© 2024 by The Acoustical Society of Japan

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
https://creativecommons.org/licenses/by-nd/4.0/