2025, Volume 46, Issue 1, pp. 116-119
This paper proposes Tachylone, a fast end-to-end non-parallel voice conversion (VC) method. In Tachylone, speaker conversion and waveform generation are performed by a single vocoder network. To train Tachylone, a pre-trained universal neural vocoder is used as the initial model, and its parameters are updated end-to-end on non-parallel data from the source and target speakers using cycle-consistent learning. We compare Tachylone with conventional CycleGAN-based VC using objective and subjective measures and discuss the results.
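As an illustration of the cycle-consistent learning mentioned above, the following minimal sketch computes a round-trip reconstruction loss in the style commonly used with CycleGAN. The abstract does not specify the loss, so the L1 form, the function names, and the toy gain-based "converters" standing in for the vocoder networks are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, List

Signal = List[float]
Converter = Callable[[Signal], Signal]


def l1_distance(a: Signal, b: Signal) -> float:
    """Mean absolute difference between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x_src: Signal, x_tgt: Signal, src2tgt: Converter, tgt2src: Converter
) -> float:
    """Assumed L1 cycle loss: each signal should survive a round trip."""
    # source -> target -> source should recover the source signal
    rec_src = tgt2src(src2tgt(x_src))
    # target -> source -> target should recover the target signal
    rec_tgt = src2tgt(tgt2src(x_tgt))
    return l1_distance(rec_src, x_src) + l1_distance(rec_tgt, x_tgt)


# Hypothetical stand-ins for the two conversion networks: plain gains.
def toy_src2tgt(x: Signal) -> Signal:
    return [2.0 * v for v in x]


def toy_tgt2src_good(x: Signal) -> Signal:
    return [0.5 * v for v in x]  # exact inverse of toy_src2tgt


def toy_tgt2src_bad(x: Signal) -> Signal:
    return [0.25 * v for v in x]  # not an inverse, so the cycle breaks


x_src = [0.0, 0.5, -1.0, 0.25]  # toy "source speaker" samples
x_tgt = [1.0, -0.5, 0.75, 0.0]  # toy "target speaker" samples (non-parallel)

loss_good = cycle_consistency_loss(x_src, x_tgt, toy_src2tgt, toy_tgt2src_good)
loss_bad = cycle_consistency_loss(x_src, x_tgt, toy_src2tgt, toy_tgt2src_bad)
print(loss_good, loss_bad)
```

The loss is zero only when the two mappings invert each other, which is what lets training proceed without time-aligned (parallel) utterances from the two speakers. In an actual system this term would typically be combined with adversarial and other losses, which are omitted here.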