Fast end-to-end non-parallel voice conversion based on speaker-adaptive neural vocoder with cycle-consistent learning

Shuhei Imai; Aoi Kanagaki; Takashi Nose; Shogo Fukawa; Akinori Ito

doi:10.1250/ast.e24.46

This article has now been updated. Please use the final version.

Keywords: Voice conversion (VC), End-to-end VC, Non-parallel VC, Neural vocoder, Cycle-consistent learning

JOURNAL OPEN ACCESS Advance online publication

Article ID: e24.46

The final version of this article is now available: Vol. 46 (2025), No. 1 pp. 116-119

Abstract

This paper proposes Tachylone, a fast end-to-end non-parallel voice conversion (VC) method. In Tachylone, speaker conversion and waveform generation are performed by a single vocoder network. In training, a pre-trained universal neural vocoder is used as the initial model, and its parameters are updated end-to-end with cycle-consistent learning on non-parallel data from the source and target speakers. We compare Tachylone with conventional CycleGAN-based VC using objective and subjective measures and discuss the results.
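
The cycle-consistent idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss terms, so the L1 distance, the function names, and the toy scaling "converters" below are all assumptions chosen for illustration. In an actual system the two converters would be neural vocoder networks, and further loss terms would likely be involved.

```python
from typing import Callable, List, Sequence

Waveform = Sequence[float]
Converter = Callable[[Waveform], List[float]]


def l1_distance(a: Waveform, b: Waveform) -> float:
    """Mean absolute difference between two equal-length waveforms."""
    if len(a) != len(b):
        raise ValueError("waveforms must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x_src: Waveform,
    y_tgt: Waveform,
    src_to_tgt: Converter,
    tgt_to_src: Converter,
) -> float:
    """Hypothetical cycle-consistency term (L1 is an assumption).

    x_src and y_tgt need not be the same utterance (non-parallel data):
    each is only compared against its own round-trip reconstruction.
    """
    # source -> target -> source should recover the source waveform
    x_cycled = tgt_to_src(src_to_tgt(x_src))
    # target -> source -> target should recover the target waveform
    y_cycled = src_to_tgt(tgt_to_src(y_tgt))
    return l1_distance(x_src, x_cycled) + l1_distance(y_tgt, y_cycled)


# Toy stand-ins for the two conversion networks (illustrative only).
def double(w: Waveform) -> List[float]:
    return [2.0 * v for v in w]


def halve(w: Waveform) -> List[float]:
    return [0.5 * v for v in w]


x = [0.5, -0.25, 1.0]   # "source speaker" samples
y = [0.1, 0.2, -0.3]    # unrelated "target speaker" samples
perfect_loss = cycle_consistency_loss(x, y, double, halve)
imperfect_loss = cycle_consistency_loss(x, y, double, double)
```

Here `double` and `halve` are exact inverses, so the round trip reproduces each input and the loss is zero, whereas pairing `double` with itself gives a positive loss. Training would adjust the converter parameters to drive this kind of term down without ever needing paired source and target utterances.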
