Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Volume 39, Issue 2
—Special Issue on speech communication—
FOREWORD
PAPERS
  • Kenta Ofuji, Naomi Ogasawara
    2018 Volume 39 Issue 2 Pages 56-65
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    In this paper, we study the effects of acoustic characteristics of spoken disaster warnings in Japanese on listeners' perceived intelligibility, reliability, and urgency. Our findings are threefold: (a) For both speaking speed and fo, the normal setting (compared with slow/fast (±20%) for speed, and with low/high (± up to 36 Hz) for fo) improved the average evaluations for intelligibility and reliability. (b) For urgency only, raising speed (both slow to normal and normal to fast) or raising fo (both low to normal and normal to high) improved the average evaluation. (c) For intelligibility, reliability, and urgency alike, the main effect of speaking speed was the most dominant; urgency could be influenced by the speed factor alone by up to 39%. With speed set to fast (+20%), all other things being equal, the average perceived urgency rose to 4.0 on the 1–5 scale, from 3.2 at normal speed. Based on these results, we argue that the speech rate may effectively be varied depending on the purpose of an evacuation call: whether it prioritizes urgency, or intelligibility and reliability. Care should be taken, however, as respondent-specific variation and experimental conditions may interact with these results. (A sketch of the speed and fo manipulations follows this entry.)
    Download PDF (640K)
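    The speed and fo manipulations above can be illustrated with a short resynthesis sketch. The abstract does not name the authors' resynthesis tool, so librosa is assumed here purely for illustration; the input file name and base fo are hypothetical.

      import numpy as np
      import librosa
      import soundfile as sf

      y, sr = librosa.load("warning.wav", sr=None)   # hypothetical input recording

      # Speaking-speed conditions: -20% (slow), normal, +20% (fast).
      for label, rate in [("slow", 0.8), ("normal", 1.0), ("fast", 1.2)]:
          sf.write(f"warning_speed_{label}.wav",
                   librosa.effects.time_stretch(y, rate=rate), sr)

      # fo conditions: shift down/up by up to 36 Hz around an assumed base fo.
      base_fo = 120.0                                # Hz, hypothetical speaker average
      for label, delta in [("low", -36.0), ("high", +36.0)]:
          n_steps = 12 * np.log2((base_fo + delta) / base_fo)  # Hz offset -> semitones
          sf.write(f"warning_fo_{label}.wav",
                   librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps), sr)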
  • Masako Fujimoto, Seiya Funatsu
    2018 Volume 39 Issue 2 Pages 66-74
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    Japanese voiced geminates have a tendency to devoice (e.g. baggu > bakku `bag'). Voiced obstruents have an inherent susceptibility to devoicing due to the aerodynamic voicing constraint (AVC), and the susceptibility is higher for geminate obstruents than for singletons. To investigate how Japanese speakers realize the [±voice] contrast in obstruents, we examined oral and nasal airflow patterns during intervocalic voiced and voiceless stops in singletons and geminates. The results showed that no nasal airflow appeared during either voiced or voiceless stops. Oral airflow showed an asymmetry between single and geminate stops in the realization of the stop voicing contrast: while the oral airflow pattern clearly differentiates the voiced vs. voiceless contrast in singletons, the patterns are similar in geminates. Acoustic signals show the same asymmetry between singletons and geminates. This convergence, a clear voicing contrast in singletons versus a lack of the contrast in geminates in both oral airflow and acoustic signals, indicates a tendency for voiced geminates to neutralize to voiceless ones. Our results support the idea of phonetic and articulatory bases in the phonological patterning of voicing neutralization in Japanese geminate stops. (A sketch of the closure-airflow comparison follows this entry.)
    Download PDF (1682K)
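    The asymmetry above rests on comparing oral airflow during the stop closure. A minimal sketch of that comparison follows, assuming a hypothetical data layout (a sampled airflow trace plus hand-labeled closure times); the authors' actual measurement setup is not reproduced here.

      import numpy as np

      def mean_closure_flow(flow, fs, closure_start, closure_end):
          """Mean oral airflow over a stop-closure interval (units as recorded)."""
          i0, i1 = int(closure_start * fs), int(closure_end * fs)
          return float(np.mean(flow[i0:i1]))

      # A voiced singleton is expected to retain some oral airflow during
      # closure, while a devoiced geminate approaches the voiceless pattern;
      # comparing these means across conditions quantifies the asymmetry.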
  • Jeff Moore, Jason Shaw, Shigeto Kawahara, Takayuki Arai
    2018 Volume 39 Issue 2 Pages 75-83
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    This study examines the tongue shapes used by Japanese speakers to produce the English liquids /ɹ/ and /l/. Four native Japanese speakers at varying levels of English acquisition and one North American English speaker were recorded both acoustically and with electromagnetic articulography (EMA). Seven distinct articulation strategies were identified. Results indicate that the least advanced speaker used a single articulation strategy for both sounds, intermediate speakers used a wide range of articulations, and the most advanced non-native speaker relied on a single strategy for each sound. (A clustering sketch follows this entry.)
    Download PDF (718K)
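    The paper identifies seven strategies from the EMA data. As an illustration only (the abstract does not state that the strategies were found by automatic clustering), grouping tokens by k-means over tongue-sensor coordinates might look like this, with synthetic values standing in for real recordings.

      import numpy as np
      from sklearn.cluster import KMeans

      # Hypothetical token-by-feature matrix: one row per /ɹ/ or /l/ token,
      # columns = (x, z) positions of five tongue sensors sampled at the
      # consonantal constriction (synthetic values here).
      tokens = np.random.default_rng(0).normal(size=(200, 10))

      labels = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(tokens)
      print(np.bincount(labels))    # tokens per candidate articulation strategy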
  • Alexei Kochetov
    2018 Volume 39 Issue 2 Pages 84-91
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    This study employed electropalatography (EPG) to explore place and manner of articulation differences in Japanese consonants. Linguopalatal contact data were collected from five native speakers using custom-made artificial palates. The materials included words with 10 word-initial consonants and a word-final moraic nasal. Quantitative analyses of the data revealed some consistent differences among consonants in constriction location and constriction degree, even within same-place classes. Certain differences among dorsal consonants, as well as among consonants with no active lingual constriction, were also observed. The results for Japanese coronal consonants were further compared with previous quantitative findings for English and Spanish, with the goal of establishing common manner-specific patterns of linguopalatal contact across languages. (A sketch of two common EPG contact indices follows this entry.)
    Download PDF (970K)
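    Quantitative EPG analyses typically summarize each contact frame with indices such as overall contact and an anteriority-weighted center of gravity. The sketch below computes both on a simplified 8x8 binary frame (standard Reading-style palates actually have 62 electrodes); the /t/-like frame is hypothetical, and the exact indices used in the paper may differ.

      import numpy as np

      def contact_percent(frame):
          """Overall linguopalatal contact: percentage of activated electrodes."""
          return frame.mean() * 100.0

      def center_of_gravity(frame):
          """Row-weighted index of constriction location; higher = more anterior.
          Rows are numbered 1 (front) to 8 (back) here."""
          rows = np.arange(1, 9)
          row_contacts = frame.sum(axis=1)       # contacts per row, front to back
          if row_contacts.sum() == 0:
              return np.nan
          return float(((9 - rows) * row_contacts).sum() / row_contacts.sum())

      frame = np.zeros((8, 8), dtype=int)
      frame[0:2, :] = 1                          # hypothetical alveolar (/t/-like) contact
      print(contact_percent(frame), center_of_gravity(frame))   # -> 25.0 7.5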
  • Hafiyan Prafiyanto, Takashi Nose, Yuya Chiba, Akinori Ito
    2018 Volume 39 Issue 2 Pages 92-100
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    We investigate the effect of speaking rate and pauses on the perception of spoken Easy Japanese, a simplified register of Japanese that uses mostly easy words to facilitate understanding by non-native speakers. We used synthetic speech with various speaking rates, pause positions, and pause lengths to investigate how these factors correlate with non-native listeners' perception of Easy Japanese. We found that speech rates of 320 and 360 morae per minute are perceived as closest to the ideal speaking rate. Inserting pauses in places natural for Japanese native speakers, based on the dependency-relation rules of Japanese, makes sentences easier to listen to for non-native speakers as well, whereas inserting too many pauses makes them hard to listen to. (The rate arithmetic is sketched after this entry.)
    Download PDF (902K)
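    The rates reported above are simple arithmetic over mora counts and durations; the example numbers below are hypothetical, chosen to land on one of the two rates listeners judged closest to ideal.

      def morae_per_minute(n_morae, duration_s):
          """Speaking rate of an utterance in morae per minute."""
          return 60.0 * n_morae / duration_s

      # E.g. a 30-mora sentence synthesized to last 5.0 s gives 360 morae/min
      # (whether pauses count toward the duration depends on the convention).
      print(morae_per_minute(30, 5.0))   # -> 360.0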
  • Carlos Toshinori Ishi, Jun Arai
    2018 Volume 39 Issue 2 Pages 101-108
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    Pressed voice is a type of voice quality produced by pressing/straining the vocal folds; it often appears in Japanese conversational speech when expressing paralinguistic information related to emotional or attitudinal behaviors of the speaker. With the aim of clarifying the acoustic and physiological features involved in pressed voice production, we conducted periodicity, spectral, and electroglottographic (EGG) analyses on pressed voice segments extracted from spontaneous dialogue speech of several speakers. Periodicity analysis indicated that pressed voice is usually accompanied by creaky or harsh voice, with irregularities in periodicity, but can also be accompanied by periodic voice with fundamental frequencies in the range of modal phonation. Spectral analysis indicated that power is usually reduced in the low-frequency components of pressed segments. A spectral measure, H1'-A1', was then proposed for characterizing pressed voice segments, which commonly have weak or no harmonic structure. H1'-A1' was shown to be effective for identifying most pressed segments, but fails when nasalization occurs. Vocal fold vibratory pattern analysis from the EGG signals revealed that most pressed voice segments (including nasalized vowels) are characterized by glottal pulses whose closed intervals are, on average, longer than their open intervals, regardless of periodicity. (A simplified H1-A1 sketch follows this entry.)
    Download PDF (1335K)
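    H1-A1-type measures compare the level of the first harmonic with the level of the strongest spectral peak near the first formant; low values indicate reduced low-frequency power, as in pressed segments. The sketch below computes a plain H1-A1 from one FFT frame; the primed corrections in the paper's H1'-A1' (designed for segments with weak harmonicity) are not reproduced, and f0 and F1 are assumed to be given.

      import numpy as np

      def h1_minus_a1(frame, fs, f0, f1, half_bw=50.0):
          """dB difference between the first-harmonic peak (near f0) and the
          strongest spectral peak near the first formant (f1)."""
          spec = 20 * np.log10(
              np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12)
          freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

          def peak_near(f):
              band = (freqs > f - half_bw) & (freqs < f + half_bw)
              return spec[band].max()

          # A low value means the first harmonic is weak relative to the
          # formant region, i.e. reduced low-frequency power.
          return peak_near(f0) - peak_near(f1)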
  • Eri Iwagami, Takayuki Arai, Keiichi Yasu, Kei Kobayashi
    2018 Volume 39 Issue 2 Pages 109-118
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    In this study, two perception experiments were conducted to investigate the misperception of Japanese words with devoiced vowels and/or geminate consonants by young and elderly listeners. In Experiment 1, eight young normal-hearing listeners participated under a white-noise condition; eight elderly listeners participated in Experiment 2. Two word sets consisting of combinations of vowels (V = /i, u/) and voiceless consonants (C = /k, t, s/) were used as stimuli. The first set comprised two- or three-mora words, and the second set comprised 14 minimal pairs of the form CVC(:)V, where (:) stands for the presence or absence of a geminate consonant. The results of both experiments showed that misperception was frequent for words with devoiced vowels and even more frequent for words with geminate consonants. In particular, misperception of consonants containing high-frequency components, such as /shi/ or /shu/, was observed for elderly listeners. (A tallying sketch follows this entry.)
    Download PDF (1322K)
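    Per-word misperception rates like those reported above can be tallied directly from listener transcriptions. The responses below are hypothetical, and the paper's actual scoring details may differ.

      # Stimulus word -> listeners' transcriptions (hypothetical data).
      responses = {
          "kiku": ["kiku", "kiku", "kku", "kiku"],
          "kikku": ["kiku", "kikku", "kiku", "kiku"],   # geminate often missed
      }

      for word, answers in responses.items():
          errors = sum(a != word for a in answers)
          print(word, f"misperception rate = {errors / len(answers):.0%}")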
  • Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi ...
    2018 Volume 39 Issue 2 Pages 119-129
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    This paper proposes a method for constructing text-to-speech (TTS) systems for languages with unknown pronunciations. One goal of speech synthesis research is to establish a framework that can be used to construct TTS systems for any written language. Generally, language-specific knowledge is required to construct a TTS system for a new language, but such knowledge is difficult to acquire for each new language, so constructing a TTS system for a new language entails huge costs. To address this problem, we investigate a framework for automatically constructing a TTS system from a target-language database consisting of only speech data and corresponding Unicode texts. In the proposed method, pseudo phonetic information for the target language with unknown pronunciation is obtained with a speech recognizer for a rich-resource proxy language. A grapheme-to-phoneme converter and a statistical parametric speech synthesizer are then constructed from the obtained pseudo phonetic information. The proposed method was applied to Japanese and evaluated in terms of objective and subjective measures. Additionally, we attempted to construct TTS systems for nine Indian languages using the proposed method; these systems were evaluated in the Blizzard Challenge 2014 and 2015. (The pipeline is sketched after this entry.)
    Download PDF (831K)
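    The three-step pipeline (proxy-language recognition, G2P training, synthesizer training) can be outlined as follows. Every class and function here is a toy placeholder: the paper uses a speech recognizer for a rich-resource proxy language, a trained grapheme-to-phoneme converter, and a statistical parametric synthesizer, none of whose actual interfaces appear in the abstract.

      class ProxyASR:
          """Stand-in for the proxy-language speech recognizer."""
          def decode(self, utterance):
              return ["a", "k", "a"]             # placeholder pseudo phones

      def train_g2p(texts, phone_seqs):
          """Toy G2P: in the paper this is learned from Unicode text paired
          with the recognizer's pseudo phone sequences."""
          table = dict(zip(texts, phone_seqs))
          return lambda text: table.get(text, [])

      def build_tts(speech_data, unicode_texts, proxy_asr):
          pseudo = [proxy_asr.decode(u) for u in speech_data]    # step 1
          g2p = train_g2p(unicode_texts, pseudo)                 # step 2
          # Step 3 (omitted): train a statistical parametric synthesizer on
          # speech labeled with the pseudo phones; at run time, synthesize
          # from g2p(text).
          return g2p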
  • William F. Katz, Sonya Mehta, Matthew Wood
    2018 Volume 39 Issue 2 Pages 130-137
    Published: March 01, 2018
    Released on J-STAGE: March 01, 2018
    JOURNAL FREE ACCESS
    In order to investigate the articulatory processes involved in producing Japanese /r/, we obtained speech recordings from native talkers of standard Japanese using an electromagnetic articulography (EMA) system. Each talker produced repetitions of /r/ in a carrier phrase designed to contrast syllable (CV and VCV) and vowel (/a/, /i/, /u/, /e/, and /o/) contexts. Kinematic recordings were made using tongue (tip, TT; dorsum, TD; body, TB; left lateral, TLL; and right lateral, TRL) and lower lip/jaw (LL) sensors. We measured TT vertical displacement, TT duration at maximum position, and tongue blade width for the consonant gestures. In a perceptual experiment, American English listeners decided whether each consonant was 'l,' 'r,' or 'd.' The kinematic results indicate that Japanese talkers produced CV consonants with greater stricture and longer closures than consonants in intervocalic position. CV productions also had narrower tongue blade widths than VCV productions, especially in /i/ and /u/ contexts. The data were modeled with Dirichlet regression in order to determine how strongly tongue width and context (syllable and vowel) factors predict listeners' judgments. The results showed a significant fit for 'r' judgments, with the tongue-width fit successively improved by the addition of syllable and vowel context information. (A Dirichlet-regression sketch follows this entry.)
    Download PDF (1075K)
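    Dirichlet regression models each token's vector of response proportions (here over 'l,' 'r,' 'd') as Dirichlet-distributed, with concentration parameters tied to predictors through a log link. The sketch below fits such a model by maximum likelihood on synthetic data; the predictors and their coding are assumptions, not the paper's actual design matrix.

      import numpy as np
      from scipy.optimize import minimize
      from scipy.stats import dirichlet

      rng = np.random.default_rng(0)
      n = 120
      X = np.column_stack([np.ones(n),                    # intercept
                           rng.normal(size=n),            # tongue blade width (scaled)
                           rng.integers(0, 2, size=n)])   # syllable context: CV vs VCV
      Y = rng.dirichlet([2.0, 3.0, 1.0], size=n)          # proportions of 'l'/'r'/'d'

      def neg_loglik(beta_flat):
          beta = beta_flat.reshape(X.shape[1], 3)
          alpha = np.exp(X @ beta)                        # log link keeps alpha > 0
          return -sum(dirichlet.logpdf(y, a) for y, a in zip(Y, alpha))

      fit = minimize(neg_loglik, np.zeros(X.shape[1] * 3), method="L-BFGS-B")
      print(fit.x.reshape(X.shape[1], 3))                 # per-category coefficients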
ACOUSTICAL LETTERS