Acoustical Science and Technology

We propose a new speech analysis-synthesis system that uses a genetic algorithm (GA) for analysis and Fujisaki's generative model of speech (the Fujisaki model) for synthesis. The system is a functional model that simulates how humans acquire speech by imitating spoken words. The Fujisaki model represents the coarticulation effect, and the GA models the trial-and-error, emergent process of speech imitation. In our system, we regard a "command" in the Fujisaki model as an articulatory gesture and detect it from the spectral sequence using the GA. In other words, the original phonemic target is automatically inverse-estimated, as a command in the Fujisaki model, from a speech spectrum that coarticulation has made phonemically ambiguous. We evaluated the system through listening tests with synthesized speech. By comparing the predicted sound with the inversely estimated sound, we also show that the system can represent the phenomenon of "predicted sound," a type of flash-lag effect that is heard unconsciously as a result of the normalization of coarticulation.
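The inverse estimation described in the abstract can be illustrated with a toy sketch. This is not the authors' system: it assumes a single one-dimensional feature trajectory instead of a spectral sequence, assumes that each command is smoothed by a critically damped second-order step response, and uses generic GA operators (elitist truncation selection, uniform crossover, Gaussian mutation). All function names and parameter values (`step_response`, `run_ga`, `alpha`, population size, and so on) are hypothetical.

```python
import math
import random


def step_response(t, alpha=20.0):
    """Critically damped second-order step response (assumed smoothing)."""
    if t < 0.0:
        return 0.0
    return 1.0 - (1.0 + alpha * t) * math.exp(-alpha * t)


def synthesize(commands, n_frames=100, dt=0.01):
    """Sum the smoothed responses of (onset, amplitude) commands per frame."""
    return [
        sum(amp * step_response(i * dt - onset) for onset, amp in commands)
        for i in range(n_frames)
    ]


def error(commands, target):
    """Mean squared error between a candidate's trajectory and the target."""
    traj = synthesize(commands, n_frames=len(target))
    return sum((x - y) ** 2 for x, y in zip(traj, target)) / len(target)


def random_candidate(rng, n_commands=2):
    """Random initial guess: onsets in [0, 1] s, amplitudes in [-1.5, 1.5]."""
    return [(rng.uniform(0.0, 1.0), rng.uniform(-1.5, 1.5))
            for _ in range(n_commands)]


def crossover(p1, p2, rng):
    """Uniform crossover: each command is taken from one of the two parents."""
    return [rng.choice(pair) for pair in zip(p1, p2)]


def mutate(cand, rng, sigma=0.05):
    """Gaussian perturbation of every onset and amplitude."""
    return [(onset + rng.gauss(0.0, sigma), amp + rng.gauss(0.0, sigma))
            for onset, amp in cand]


def run_ga(target, rng, pop_size=40, n_gen=60, n_elite=8):
    """Elitist GA; returns the best candidate and the best error per generation."""
    pop = [random_candidate(rng) for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        pop.sort(key=lambda c: error(c, target))
        history.append(error(pop[0], target))
        elites = pop[:n_elite]  # carried over unchanged
        children = [
            mutate(crossover(rng.choice(elites), rng.choice(elites), rng), rng)
            for _ in range(pop_size - n_elite)
        ]
        pop = elites + children
    pop.sort(key=lambda c: error(c, target))
    history.append(error(pop[0], target))
    return pop[0], history


rng = random.Random(0)  # fixed seed so the run is repeatable
true_commands = [(0.1, 1.0), (0.5, -0.6)]  # hidden "targets" to recover
observed = synthesize(true_commands)  # smoothed, ambiguous trajectory
best, history = run_ga(observed, rng)
print("estimated commands:", best)
print("first and last best error:", history[0], history[-1])
```

Because the elite candidates are carried over unchanged, the best error never increases from one generation to the next. How closely the estimated commands match the hidden ones depends on the seed and the GA settings.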
