Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
End-to-end Japanese-English Speech-to-text Translation with Spoken-to-Written Style Conversion
Zhengdong Yang, Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
JOURNAL FREE ACCESS

2024 Volume 31 Issue 3 Pages 935-957

Abstract

Speech-to-text translation (ST) translates speech in the source language into text in the target language. Because ST deals with different forms of language, it faces a style gap between spoken and written language. The gap lies not only between the input speech and the output text but also between the input speech and the bilingual parallel corpora that are often used in ST. These gaps hinder improvements in ST performance. Spoken-to-written style conversion has been shown to improve cascaded Japanese-English ST by reducing such gaps. Integrating this conversion into end-to-end ST is desirable because end-to-end ST is easier to deploy, more efficient, and less prone to error propagation than cascaded ST. In this study, we construct a large-scale Japanese-English lecture domain ST dataset. We also propose a joint task of speech-to-text spoken-to-written style conversion and end-to-end ST, as well as an interactive-attention-based multi-decoder model for the joint task to improve end-to-end ST. Experiments on the constructed dataset show that our model outperforms a strong baseline.
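
The abstract does not specify how the interactive attention is computed. As a rough illustration only, the sketch below assumes one common formulation of interactive decoding: each of two decoders combines attention over its own hidden states with attention over the other decoder's hidden states. The function names, the mixing weight `lam`, and the toy states are all hypothetical, causal masking and the speech encoder are omitted, and none of this is taken from the paper itself.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        )
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs

def interactive_step(own_states, other_states, lam=0.5):
    """Fuse self-attention with attention over the other decoder's states.

    lam is a hypothetical mixing weight, not a value from the paper.
    """
    self_ctx = attend(own_states, own_states, own_states)
    cross_ctx = attend(own_states, other_states, other_states)
    return [
        [s + lam * c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(self_ctx, cross_ctx)
    ]

# Toy hidden states (2 positions x 2 dimensions), invented for illustration.
style_states = [[1.0, 0.0], [0.0, 1.0]]  # spoken-to-written conversion decoder
trans_states = [[0.5, 0.5], [1.0, 1.0]]  # translation decoder

# Each decoder reads the other's states, so the two tasks inform each other.
style_out = interactive_step(style_states, trans_states)
trans_out = interactive_step(trans_states, style_states)
```

Setting `lam=0.0` in this sketch reduces each decoder to plain self-attention, which gives a simple way to compare a joint model against two independent decoders.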

© 2024 The Association for Natural Language Processing