Proceedings of the JSME Information, Intelligence and Precision Equipment Division (IIP) Conference
Online ISSN : 2424-3140
Session ID: IIPC-5-11
Development of a Speech Synthesis System Based on Surface Electromyography of the Face and Neck during Silent Speech in Japanese
*小林 叶昌, 楓 和憲, 綿貫 啓一
Abstract

In recent years, there has been growing interest in using silent speech to generate speech and recognize speech content for applications in medicine, human interaction, and entertainment. Gaddy et al. developed a machine learning model that generates speech from muscle activity, measured with electromyography (EMG), achieving 64% word recognition accuracy for one English speaker. This study examines the feasibility of applying this approach to Japanese speech synthesis by fine-tuning a machine learning model with English data, investigating useful EMG electrode locations that have not been measured in previous studies, examining the feasibility of training on EMG data from non-speaking individuals paired with speech from other people, and evaluating the effect of phoneme-balanced sentences on word recognition accuracy. The results suggest that speech synthesis in Japanese is possible with a limited vocabulary, and that fine-tuning with English data improves accuracy by 40% relative to not fine-tuning. Adding EMG electrode locations, particularly on the neck (styloglossus and hyoglossus muscle groups), improves word recognition accuracy by 60% relative to not adding them. It was also observed that speech can be generated from EMG data paired with speech recorded from other people, and that using phoneme-balanced sentences for data creation is useful. Word recognition accuracy is expected to improve as more Japanese data becomes available, and this technology is eventually expected to generate Japanese speech with unrestricted vocabulary.
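The transfer-learning recipe outlined above (train on a large English EMG corpus first, then adapt to a small Japanese one) can be sketched schematically as follows. This is an illustrative toy, not the authors' model: a one-parameter linear map stands in for the real EMG-to-acoustic-feature network, and both datasets are synthetic stand-ins.

```python
# Toy sketch of pretraining followed by fine-tuning (NOT the authors' code).
# A scalar weight w stands in for the EMG-to-acoustic-feature model; both
# "English" and "Japanese" datasets are synthetic pairs (x, y) with y = 2*x.
import random

def sgd_fit(w, data, lr=0.01, epochs=200):
    """Minimize mean squared error of y ~ w * x with plain SGD."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x  # gradient of (w*x - y)^2
    return w

random.seed(0)
# Large "English" pretraining set, small "Japanese" fine-tuning set.
pretrain = [(x, 2.0 * x) for x in (random.uniform(-1, 1) for _ in range(100))]
finetune = [(x, 2.0 * x) for x in (random.uniform(-1, 1) for _ in range(10))]

w = sgd_fit(0.0, pretrain)            # pretraining on the large corpus
w = sgd_fit(w, finetune, lr=0.005)    # fine-tuning on the small corpus
```

In the real system, fine-tuning starts from weights that already encode EMG-to-speech structure, so far less target-language data is needed than training from scratch, which is consistent with the 40% relative improvement the abstract reports.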

© 2023 The Japan Society of Mechanical Engineers