Discrimination between adult-directed and infant-directed speech using statistical models

Ryuta Nakagawa; Shota Nishitani; Hirokazu Doi; Kazuyuki Shinohara

doi:10.14849/psjproc.2007.0_244_4

Abstract

Many lines of evidence indicate that infants prefer infant-directed (ID) speech to adult-directed (AD) speech. ID speech has a distinctive acoustic signature characterized by a higher fundamental frequency (pitch). However, pitch varies too much between individuals to discriminate ID speech from AD speech on its own. We therefore introduced a statistical modeling method that automatically discriminates ID speech from AD speech without using pitch information. Eight mothers and their infants participated in this study. Each mother read three picture books aloud to her infant and to an adult, yielding recordings of 2,034 sentences (21,304 words) of ID and AD speech. From these recordings we extracted 12-dimensional mel-frequency cepstral coefficients (MFCCs), their derivatives, and a log-energy derivative as speech features. Using these features, we trained monophone hidden Markov models (HMMs) representing the ID and AD speech types, and then used these models to classify each mother's speech as ID or AD. The log likelihood of each unknown utterance was calculated under each model, and the speech type (ID or AD) of the model yielding the maximum log likelihood was taken as the discrimination result. Under speaker-open conditions (test speakers excluded from training), 80.9% of ID/AD utterances were correctly discriminated. This result implies that the MFCCs carry features associated with ID speech, and that stochastic HMM-based modeling of MFCCs can discriminate ID speech from AD speech. [J Physiol Sci. 2007;57 Suppl:S244]
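The decision rule described above (score an utterance under one HMM per speech type, then pick the type with the maximum log likelihood) can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: instead of trained monophone HMMs, each speech type gets one small left-to-right HMM with diagonal-Gaussian emissions and hand-set parameters, and training (e.g. Baum-Welch) is omitted. The feature matrix is a toy stand-in for MFCC frames; in the study each frame carried 12 MFCCs, their 12 derivatives, and a log-energy derivative (25 values), whereas the demo uses 3 static dimensions plus their deltas. The helper names (`deltas`, `make_model`, `classify`, etc.) are hypothetical.

```python
import numpy as np


def deltas(feats: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based time derivatives of a (T, D) feature matrix:
    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), edges replicated."""
    T = feats.shape[0]
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros(feats.shape, dtype=float)
    for k in range(1, n + 1):
        out += k * (padded[n + k : n + k + T] - padded[n - k : n - k + T])
    return out / denom


def logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) that tolerates all -inf slices."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid (-inf) - (-inf) = nan
    with np.errstate(divide="ignore"):
        s = np.log(np.sum(np.exp(a - m), axis=axis))
    return s + np.squeeze(m, axis=axis)


def log_gauss_diag(x, means, variances):
    """Per-frame, per-state diagonal-Gaussian log density. x: (T, D) -> (T, S)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def hmm_log_likelihood(x, log_start, log_trans, means, variances):
    """log p(x | model) via the forward algorithm in the log domain."""
    log_b = log_gauss_diag(x, means, variances)
    alpha = log_start + log_b[0]
    for t in range(1, x.shape[0]):
        alpha = logsumexp(alpha[:, None] + log_trans, axis=0) + log_b[t]
    return logsumexp(alpha, axis=0)


def make_model(offset, n_states=3, dim=6):
    """Hypothetical left-to-right HMM; static dims centred near `offset`."""
    log_start = np.full(n_states, -np.inf)
    log_start[0] = 0.0  # always start in the first state
    trans = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        trans[s, s], trans[s, s + 1] = 0.6, 0.4
    trans[-1, -1] = 1.0
    with np.errstate(divide="ignore"):
        log_trans = np.log(trans)
    means = np.zeros((n_states, dim))
    means[:, : dim // 2] = offset + 0.2 * np.arange(n_states)[:, None]
    return log_start, log_trans, means, np.ones((n_states, dim))


def classify(x, models):
    """Return the label whose model gives x the highest log likelihood."""
    scores = {label: hmm_log_likelihood(x, *p) for label, p in models.items()}
    return max(scores, key=scores.get), scores


# Toy utterance: 40 frames of 3 static features near the "ID" model's means.
rng = np.random.default_rng(0)
static = 1.0 + rng.normal(scale=0.5, size=(40, 3))
feats = np.hstack([static, deltas(static)])  # static + derivative features
models = {"ID": make_model(+1.0), "AD": make_model(-1.0)}
label, scores = classify(feats, models)
print(label)  # prints ID
```

Computing the forward recursion in the log domain keeps long utterances from underflowing; a full system would instead score each utterance against concatenated phone-level models, but the final maximum-log-likelihood decision is the same.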
