Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Automatic Speech Recognition for the Archive of Ainu Folklores
Kohei Matsuura, Masato Mimura, Tatsuya Kawahara
JOURNAL FREE ACCESS

2021 Volume 28 Issue 3 Pages 824-846

Abstract

This article describes our work on automatic speech recognition of Ainu folklore (Uwepeker). First, we constructed an Ainu speech corpus for the Saru dialect from data provided by two museums that had built the Ainu archive. Next, we built an automatic speech recognition (ASR) system based on an attention-based encoder-decoder model and compared four recognition units: phones, syllables, word pieces, and words. With the syllable unit, we achieved phone recognition accuracies of 93.7% and 86.2% and word recognition accuracies of 78.3% and 61.4% for the speaker-closed and speaker-open conditions, respectively. To address the significant degradation in the speaker-open condition, we propose an unsupervised speaker adaptation method using a CycleGAN. In this method, a CycleGAN learns a mapping from the voices of the speakers in the training data to the target speaker’s voice, and is then used to convert all speech in the training data into the target speaker’s speech. This method reduced the phone error rate by up to 60.6%. In addition, we investigated language identification in mixed Japanese and Ainu speech and achieved reasonable performance by cascading phone and word recognition modules.
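
The accuracy and error-rate figures above can be read against the following minimal sketch. It assumes the common edit-distance definition of error rate (substitutions, deletions, and insertions divided by the reference length) and a relative reading of the error-rate reduction; the abstract does not state either, so both are assumptions. The function names and the toy phone sequences are hypothetical and are not taken from the paper's corpus or results.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between a reference and a hypothesis token sequence."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens yields the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative error-rate reduction (assumed reading of 'reduced by up to')."""
    return (baseline - adapted) / baseline


# Toy example with made-up phone tokens, for illustration only.
reference = ["k", "a", "m", "u", "y"]
hypothesis = ["k", "a", "m", "u"]
per = error_rate(reference, hypothesis)
accuracy = 1.0 - per
```

Under these assumptions, accuracy is simply one minus the error rate, and the same functions apply to syllable, word-piece, or word units by changing the tokens in the sequences.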

© 2021 The Association for Natural Language Processing