Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields

Masayuki SUZUKI; Ryo KUROIWA; Keisuke INNAMI; Shumpei KOBAYASHI; Shinya SHIMIZU; Nobuaki MINEMATSU; Keikichi HIROSE

doi:10.1587/transinf.2016AWI0004

Abstract

When synthesizing speech from Japanese text, correct assignment of accent nuclei for input text with arbitrary contents is indispensable in obtaining naturally-sounding synthetic speech. A phenomenon called accent sandhi occurs in utterances of Japanese; when a word is uttered in a sentence, its accent nucleus may change depending on the contexts of preceding/succeeding words. This paper describes a statistical method for automatically predicting the accent nucleus changes due to accent sandhi. First, as the basis of the research, a database of Japanese text was constructed with labels of accent phrase boundaries and accent nucleus positions when uttered in sentences. A single native speaker of Tokyo dialect Japanese annotated all the labels for 6,344 Japanese sentences. Then, using this database, a conditional-random-field-based method was developed using this database to predict accent phrase boundaries and accent nuclei. The proposed method predicted accent nucleus positions for accent phrases with 94.66% accuracy, clearly surpassing the 87.48% accuracy obtained using our rule-based method. A listening experiment was also conducted on synthetic speech obtained using the proposed method and that obtained using the rule-based method. The results show that our method significantly improved the naturalness of synthetic speech.

Content from these authors

Favorites & Alerts

Corresponding author

Register with J-STAGE for free!