This study surveys how emphasized messages are perceived as paralanguage conveying mental attitudes and feelings. Participants were learners of Japanese (L1 Chinese, English, and Korean) and native Japanese speakers, tested with natural Japanese speech and with native speech in their mother tongue. Both L1 and L2 participants recognized the paralinguistic features in the same order of emphasis: ① pronounced high and long, ② pronounced long, ③ non-emphasized voice, ④ pronounced fast. The order was consistent for both natural Japanese speech (66%) and mother-tongue native speech (52%), suggesting that it may be universal across languages. L1 Japanese speakers tended to reverse the order of recognition for “kanashii” (sad) in both the perception results (κ=0.22) and the questionnaire results. L2 learners of Japanese had more difficulty with paralinguistic information carried by syllabic nasals and double consonants.
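The abstract above reports an agreement statistic (κ=0.22) without naming the variant. As a minimal sketch, assuming it is Cohen's kappa between two sets of categorical judgments, the statistic can be computed as follows. The function name `cohens_kappa` and the toy ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: the kappa in the abstract is Cohen's kappa; the study may
    have used a different variant (e.g. Fleiss' kappa).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): emphasis ranks assigned to four stimuli.
rater_1 = [1, 1, 2, 2]
rater_2 = [1, 2, 2, 2]
kappa = cohens_kappa(rater_1, rater_2)  # 0.5 for this toy data
```

Kappa corrects raw agreement for agreement expected by chance, so values near 0 indicate little agreement beyond chance and values near 1 indicate strong agreement.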
This paper investigates the interplay between tonal and quantity contrasts in Jinghpaw, a Tibeto-Burman language of China and Burma, by exploring the distributional asymmetry in the tonotactics of iambic disyllabic words in the language. Our phonological interpretation of the systematic gaps in the tonal string patterns leads to the conclusion that the language has a reduced tonal contrast in light syllables relative to heavy syllables. We also show that this conclusion is by no means unnatural in light of the literature on contrast reduction in light syllables.
We investigated factors affecting clause-initial filler probability using an English monologue corpus and compared the results with those of studies on Japanese. The most powerful predictor was the strength of the boundary preceding the clause. Clause-initial filler probability was higher at sentence boundaries than at clause boundaries, which contradicts the results reported for Japanese. The number of words in a clause also had a significant effect: clause-initial filler probability increased with the number of words in both English and Japanese. Taken together, the results for the two languages indicate that the speech planning difficulties that fillers mainly reflect may differ depending on the language.
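The clause-initial filler probability compared in the abstract above can be read as the share of clauses that begin with a filler, computed separately for each boundary type. The following is a minimal sketch under that assumption. The function name `filler_probability`, the boundary labels, and the toy data are hypothetical; they are not the corpus or the statistical model used in the study.

```python
from collections import defaultdict


def filler_probability(clauses):
    """Return the proportion of clauses starting with a filler per boundary type.

    `clauses` is an iterable of (boundary_type, starts_with_filler) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # boundary -> [with filler, total]
    for boundary, starts_with_filler in clauses:
        counts[boundary][0] += int(starts_with_filler)
        counts[boundary][1] += 1
    return {b: with_filler / total for b, (with_filler, total) in counts.items()}


# Toy data (invented for illustration only).
toy_clauses = [
    ("sentence", True), ("sentence", True), ("sentence", False), ("sentence", False),
    ("clause", True), ("clause", False), ("clause", False), ("clause", False),
]
probabilities = filler_probability(toy_clauses)
# For this toy data: sentence boundaries 0.5, clause boundaries 0.25
```

A per-group proportion like this only describes the data; testing which factor is the strongest predictor would need a regression model, which the abstract does not specify.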
Filled pauses (FPs) in English can be either monophonemic ‘uh’ [ə] or polyphonemic ‘um’ [əm]. The two differ temporally: the shorter ‘uh’ is associated with a shorter overall delay (including silent pauses). Japanese FPs are more varied, including both monophonemic ([ε], [ŋ]) and polyphonemic ([ε:to], [ɑno]) forms. This study compares the FPs of native Japanese speakers in a crosslinguistic speech corpus. Results show that these speakers use FPs with a lower F1 than native English speakers and strongly prefer the monophonemic form. Duration patterns are similar, but low-proficiency speakers delay longer with monophonemic FPs. The results suggest possibilities for nonnative speech detection in speech applications.
Demonstratives are a particular group of linguistic expressions in Chinese, categorized as pronouns, determiners, adverbs, connectives, or fillers. Using a phonetically labeled conversational corpus, we examined the spoken forms of Chinese demonstratives. Specific selection preferences for phonological variants were identified in the corpus. Filler demonstratives are generally longer than their lexical counterparts. Disyllabic lexical and filler demonstratives show seemingly contrasting duration patterns. Unlike the falling tone carried by the lexical originals, filler demonstratives tend to be flat with a mid-height onset, as illustrated by representative tonal contours obtained with a proposed computational tone model.
Acoustic differences between the vowels in filled pauses and those in ordinary lexical items such as nouns were examined to determine whether there is a systematic difference in voice quality. Statistical tests on material from the CSJ-Core showed that, in most cases, the two vowel classes differed significantly in acoustic features including F0, F1, F2, intensity, jitter, shimmer, spectral tilt, H1-A3, and duration. Random forest analysis showed that, on average, an F-value of about 0.8 could be obtained on open data. Duration, together with intensity, F0, and F1, was most important for the classification, while jitter and H1-related indices made a secondary contribution.
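The abstract above summarizes classification performance as an F-value of about 0.8. As a minimal sketch, assuming "F-value" here means the F-measure (the harmonic mean of precision and recall), the score can be computed from true and predicted labels as follows. The function name `f_measure`, the label names, and the toy labels are hypothetical and do not reproduce the study's data.

```python
def f_measure(true_labels, predicted_labels, positive="FP"):
    """F-measure (harmonic mean of precision and recall) for one class.

    Assumption: the abstract's "F-value" is this F-measure.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos == 0:
        return 0.0  # no correct positive predictions
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy labels (invented): "FP" = filled-pause vowel, "LEX" = lexical vowel.
truth = ["FP", "FP", "FP", "FP", "LEX", "LEX"]
predicted = ["FP", "FP", "FP", "LEX", "FP", "LEX"]
score = f_measure(truth, predicted)  # 0.75 for this toy data
```

Scoring on open data means the labels being predicted belong to items that were not used to train the classifier, so the score reflects generalization rather than memorization.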
This paper introduces a new fine-grained voice source analysis method and its application to the analysis of filled pauses in the CSJ (Corpus of Spontaneous Japanese). The new source analysis procedure is designed to annotate items in large speech corpora with reliable and precise descriptions of their objective characteristics. Because of this design target, the new method provides far more accurate descriptions than existing methods. It provides the fundamental frequency estimate and band-wise aperiodicity information simultaneously. It also provides an information-rich representation in the form of a probability map of the fundamental component. This paper presents and discusses several analysis examples.