Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
On Cross-Lingual Text Similarity Using Neural Translation Models
Kazuhiro Seki
Journal, free access

2019 Volume 27 Pages 315-321

Abstract

Accurately computing the similarity between two texts written in different languages has tremendous value in many applications, such as cross-lingual information retrieval and cross-lingual text mining/analytics. This paper studies this important problem using neural networks, focusing specifically on neural machine translation (NMT) models. Although translation models are used, we pay special attention not to the translation itself but to the intermediate states of the given texts stored in the translation models. Our assumption is that the intermediate states capture the syntactic and semantic meaning of the input texts and are a good representation of them, while avoiding inevitable translation errors. To test the validity of this assumption, we investigate the utility of the intermediate states and their effectiveness in computing cross-lingual text similarity in comparison with other neural network-based distributed representations of texts, including word and paragraph embedding-based approaches. We demonstrate that an approach using the intermediate states outperforms not only these approaches but also a strong machine translation-based one. Furthermore, we find that intermediate states and translated texts complement each other even though they are generated from the same NMT models.
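
As an illustration of the general idea only, the sketch below compares two texts by their per-token intermediate state vectors. The abstract does not say how the states are extracted, pooled, or compared, so everything here is an assumption: mean pooling, cosine similarity, and the function names mean_pool, cosine_similarity, and text_similarity are hypothetical choices, and the input vectors are toy numbers standing in for states that a real NMT encoder would produce.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(states: Sequence[Vector]) -> List[float]:
    """Average per-token state vectors into one fixed-length text vector.

    Mean pooling is an assumption made for this sketch, not the paper's
    stated method.
    """
    if not states:
        raise ValueError("need at least one state vector")
    dim = len(states[0])
    if any(len(s) != dim for s in states):
        raise ValueError("all state vectors must have the same dimension")
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(states_a: Sequence[Vector], states_b: Sequence[Vector]) -> float:
    """Similarity of two texts given their (hypothetical) encoder states."""
    return cosine_similarity(mean_pool(states_a), mean_pool(states_b))


if __name__ == "__main__":
    # Toy vectors standing in for encoder states of a source-language text
    # and a target-language text; real states would come from an NMT model.
    source_states = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1]]
    target_states = [[0.8, 0.2, 0.1], [0.6, 0.2, 0.0], [0.9, 0.3, 0.1]]
    print(text_similarity(source_states, target_states))
```

Because the two texts can differ in length, some pooling step is needed before a vector comparison; averaging is simply the most basic option and could be swapped for another without changing the rest of the sketch.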

© 2019 by the Information Processing Society of Japan