A new evaluation method that can detect small differences in quality between synthesized and natural voices was investigated using a short-term memory paradigm. Seventeen subjects listened to series of synthesized or natural nonsense monosyllables, whose quality had been estimated to be almost the same on an ordinary articulation score test, and recalled them immediately after each series was presented. Each series consisted of 6 or 8 monosyllables.
Recall errors were compared between the two types of voices. Some synthesized syllables (e.g., ‘mo’, ‘za’, ‘be’) caused many more errors than the corresponding natural syllables. Furthermore, this tendency became very clear when these syllables were presented in the latter part of a series.
These results suggest that the short-term memory paradigm is very effective for evaluating small defects in voice quality.
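The comparison of recall errors by voice type and serial position could be tabulated along the following lines. This is only a minimal sketch under stated assumptions: the function name `error_rates_by_position`, the trial format, and the scoring rule (a position counts as an error when the recalled syllable differs from the presented one) are hypothetical, and the toy trials are made up for illustration, not data from the study.

```python
from collections import defaultdict


def error_rates_by_position(trials):
    """Return {voice: [error rate at serial position 0, 1, ...]}.

    Each trial is (voice, presented, recalled), where presented and
    recalled are equal-length lists of syllables. Assumed scoring rule:
    a position is an error when the recalled syllable differs from the
    presented one.
    """
    errors = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(lambda: defaultdict(int))
    for voice, presented, recalled in trials:
        for pos, (shown, answer) in enumerate(zip(presented, recalled)):
            counts[voice][pos] += 1
            if shown != answer:
                errors[voice][pos] += 1
    return {
        voice: [errors[voice][pos] / counts[voice][pos]
                for pos in sorted(counts[voice])]
        for voice in counts
    }


# Made-up toy trials (NOT data from the study), series length 6.
toy_trials = [
    ("natural", ["ka", "mo", "za", "be", "ti", "ru"],
                ["ka", "mo", "za", "be", "ti", "ru"]),
    ("synthesized", ["ka", "mo", "za", "be", "ti", "ru"],
                    ["ka", "mo", "za", "pe", "ti", "ru"]),
    ("synthesized", ["be", "ti", "ka", "ru", "mo", "za"],
                    ["be", "ti", "ka", "ru", "no", "za"]),
]

for voice, rates in error_rates_by_position(toy_trials).items():
    print(voice, rates)
```

Comparing the two lists position by position shows whether errors for one voice type concentrate at later serial positions.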