End-to-end Japanese-English Speech-to-text Translation with Spoken-to-Written Style Conversion

Zhengdong Yang; Shuichiro Shimizu; Chenhui Chu; Sheng Li; Sadao Kurohashi

doi:10.5715/jnlp.31.935

Abstract

Speech-to-text translation (ST) translates speech in a source language into text in a target language. Because ST spans two forms of language, it faces a style gap between spoken and written language. This gap exists not only between the input speech and the output text but also between the input speech and the bilingual parallel corpora often used in ST. These gaps are an obstacle to improving ST performance. Spoken-to-written style conversion has been shown to improve cascaded Japanese-English ST by reducing such gaps. Integrating this conversion into end-to-end ST is desirable because end-to-end ST is easier to deploy, more efficient, and less prone to error propagation than cascaded ST. In this study, we construct a large-scale Japanese-English ST dataset in the lecture domain. We also propose a joint task of speech-to-text spoken-to-written style conversion and end-to-end ST, together with an interactive-attention-based multi-decoder model for the joint task, to improve end-to-end ST. Experiments on the constructed dataset show that our model outperforms a strong baseline.

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
