Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
PAPERS
An encoder-decoder based grapheme-to-phoneme converter for Bangla speech synthesis
Arif AhmadMohammad Reza SelimMuhammed Zafar IqbalMohammad Shahidur Rahman
著者情報
ジャーナル フリー

2019 年 40 巻 6 号 p. 374-381

詳細
抄録

This paper proposes an encoder-decoder based sequence-to-sequence model for Grapheme-to-Phoneme (G2P) conversion in Bangla (Exonym: Bengali). G2P models are key components in speech recognition and speech synthesis systems as they describe how words are pronounced. Traditional, rule-based models do not perform well in unseen contexts. We propose to adopt a neural machine translation (NMT) model to solve the G2P problem. We used gated recurrent units (GRU) recurrent neural network (RNN) to build our model. In contrast to joint-sequence based G2P models, our encoder-decoder based model has the flexibility of not requiring explicit grapheme-to-phoneme alignment which are not straight forward to perform. We trained our model on a pronunciation dictionary of (approximately) 135,000 entries and obtained a word error rate (WER) of 12.49% which is a significant improvement from the existing rule-based and machine-learning based Bangla G2P models.

著者関連情報
© 2019 by The Acoustical Society of Japan
前の記事 次の記事
feedback
Top