Unified model for voice conversion of speech and singing voice using adaptive pitch constraints

Shogo Fukawa; Takashi Nose; Shuhei Imai; Akinori Ito

doi:10.1250/ast.e24.47

ACOUSTICAL LETTERS

Keywords: Voice conversion (VC), Singing voice conversion (SVC), CycleGAN, Unified model

JOURNAL OPEN ACCESS

2025 Volume 46 Issue 1 Pages 120-123

DOI https://doi.org/10.1250/ast.e24.47

Abstract

This paper proposes a voice conversion method named SpSiVC that appropriately converts both speech and singing voices with a single model. Because the difference in pitch distribution between speakers is significantly different for speech than for singing voices, voice conversion has mainly been evaluated as two separate tasks: speech conversion and singing voice conversion. SpSiVC introduces an adaptive F0 loss, which enables conversion that implicitly switches the shift width of the log F0 according to the type of input voice. We examine the effectiveness of the F0 constraints in objective and subjective evaluations.
