Extraction of Conceptual Differences between Words in Translation Pairs

西村 一球; 村上 陽平; Pituxcoosuvarn Mondheera

doi:10.11184/his.27.2_125

Translated Abstract

A word in one language and its translation in another do not necessarily represent the same concept, because meanings and cultural contexts are asymmetric across languages; this is especially true of polysemous words. As machine translation has become more accurate in recent years, it is increasingly used to support multilingual communication. However, such conceptual differences can lead to misunderstandings in that communication. We therefore propose a method for extracting conceptual differences in translation pairs, which quantifies the concepts represented by words using conceptual dictionaries. Specifically, the method uses WordNet and Multilingual-WordNet, a multilingual version of WordNet. The concept of each word in Japanese, Chinese, and Indonesian is quantified in terms of synsets, the smallest units of concept in WordNet. This makes it possible to extract conceptual differences between words whose concepts overlap across these languages. Applied to WordNet, the method identifies 27,005 of 104,626 Japanese-Chinese word pairs, 60,581 of 173,233 Japanese-Indonesian word pairs, and 14,175 of 42,468 Chinese-Indonesian word pairs as conceptually different.
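The abstract does not spell out how concepts are quantified, so the following is only a minimal sketch under a stated assumption: each word is represented by the set of synset IDs it belongs to, a translation pair is any pair of words sharing at least one synset, and a pair counts as conceptually different when either word also covers a synset the other lacks. The lexicons, word names, synset IDs, and the helpers `concept_diff` and `extract_differences` are all hypothetical and are not taken from the paper or from WordNet.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Optional, Set, Tuple


class ConceptDiff(NamedTuple):
    """Synset-level comparison of one translation pair (hypothetical structure)."""
    shared: FrozenSet[str]       # synsets both words belong to
    only_source: FrozenSet[str]  # synsets covered only by the source word
    only_target: FrozenSet[str]  # synsets covered only by the target word

    @property
    def is_different(self) -> bool:
        # Assumption: any non-shared synset marks a conceptual difference.
        return bool(self.only_source or self.only_target)


def concept_diff(source_synsets: Iterable[str],
                 target_synsets: Iterable[str]) -> Optional[ConceptDiff]:
    """Compare two words by their synset sets.

    Returns None when the words share no synset, i.e. they are not treated
    as a translation pair in this sketch.
    """
    src, tgt = frozenset(source_synsets), frozenset(target_synsets)
    shared = src & tgt
    if not shared:
        return None
    return ConceptDiff(shared, src - tgt, tgt - src)


def extract_differences(
    source_lexicon: Dict[str, Set[str]],
    target_lexicon: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], ConceptDiff]:
    """Compare every source word with every target word that shares a synset."""
    results = {}
    for src_word, src_synsets in source_lexicon.items():
        for tgt_word, tgt_synsets in target_lexicon.items():
            diff = concept_diff(src_synsets, tgt_synsets)
            if diff is not None:
                results[(src_word, tgt_word)] = diff
    return results


# Toy lexicons with made-up words and synset IDs (not real WordNet data).
JA_LEXICON = {"ja_word_1": {"s1", "s2"}, "ja_word_2": {"s3"}}
ID_LEXICON = {"id_word_1": {"s1", "s2", "s4"}, "id_word_2": {"s3"}}

RESULTS = extract_differences(JA_LEXICON, ID_LEXICON)

if __name__ == "__main__":
    for (src_word, tgt_word), diff in sorted(RESULTS.items()):
        status = "different" if diff.is_different else "same"
        print(src_word, tgt_word, status, sorted(diff.only_source), sorted(diff.only_target))
```

In practice the synset sets would come from WordNet and its multilingual extension rather than from hand-written dictionaries, and the quantification in the paper may be finer-grained than this binary set comparison.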

