2023 Volume 44 Issue 3 Pages 239-246
The previously proposed phantom silhouette method is a promising way to convert ordinary speech into whispered speech. It is a simple parametric method built on high-quality vocoder-type speech analysis and synthesis. An ordinary speech sample is first analyzed using the WORLD vocoder. The spectral features are then manipulated, based on the extracted spectral envelope, so that the voice sounds like a whisper. The target speech is synthesized with white noise as the excitation instead of the vocal source signal, so that the whole utterance sounds voiceless. In this study, the method was applied to singing voices to generate whisper voices. In addition to actual singing voices, two kinds of synthesized singing were tested as input: virtual singers' voices generated with a Vocaloid voice synthesizer, and AI singers' voices synthesized with the NEUTRINO neural singing synthesizer.
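The noise-excitation step described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the WORLD vocoder: a plain DFT stands in for the vocoder, the whisper-specific manipulation of the spectral features is omitted, and the names `dft`, `idft`, and `noise_excited_frame` are hypothetical. It only shows how a single frame of white noise can be shaped by a given spectral envelope, so that the result carries the envelope but no periodic vocal source.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def noise_excited_frame(envelope, rng):
    """Shape one frame of white noise with a spectral envelope.

    envelope: magnitudes for bins 0..n/2 (length n/2 + 1), an assumed input
              that a vocoder analysis stage would normally provide.
    rng:      a random.Random instance, so the noise is reproducible.
    Returns n real-valued samples.
    """
    half = len(envelope) - 1
    n = 2 * half
    # White Gaussian noise replaces the periodic vocal source signal.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    spectrum = dft(noise)
    # Mirror the envelope onto the negative-frequency bins so the shaped
    # spectrum stays conjugate-symmetric and the output stays real.
    full = list(envelope) + list(envelope[-2:0:-1])
    shaped = [s * g for s, g in zip(spectrum, full)]
    return idft(shaped)


# Usage: a toy low-pass envelope (hypothetical values) on a 32-sample frame.
toy_envelope = [1.0] * 9 + [0.0] * 8
frame = noise_excited_frame(toy_envelope, random.Random(0))
```

In a full system this shaping would be applied frame by frame with overlap-add, using the envelope produced by the analysis stage after the whisper-oriented modification.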