A word in one language and its translation in another do not necessarily represent the same concept, because meanings and cultural contexts are asymmetric across languages; this is especially true of polysemous words. As the accuracy of machine translation has improved in recent years, it has increasingly been used to support multilingual communication. However, such conceptual differences can lead to misunderstandings in that communication. We therefore propose a method for extracting conceptual differences in translation pairs, which quantifies the concepts represented by words using conceptual dictionaries. Specifically, the method uses WordNet and Multilingual-WordNet, the multilingual versions of WordNet. The concept of each word in Japanese, Chinese, and Indonesian is quantified based on the Synset, the smallest unit of concept in WordNet. This makes it possible to extract conceptual differences among words whose concepts overlap across these languages. As a result, our method identifies 27,005 of 104,626 Japanese-Chinese word pairs, 60,581 of 173,233 Japanese-Indonesian word pairs, and 14,175 of 42,468 Chinese-Indonesian word pairs in WordNet as conceptually different.
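The idea of comparing words through their Synsets can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a word's concept is represented as the set of Synset IDs it belongs to, that a pair counts as conceptually different when the two sets overlap without being identical, and that Jaccard distance is an acceptable stand-in for the degree of difference. The lexicons, word labels, and Synset IDs below are invented placeholders, not data from WordNet.

```python
from typing import Dict, FrozenSet, List, Tuple

Lexicon = Dict[str, FrozenSet[str]]

# Hypothetical toy lexicons mapping a word to the Synset IDs it belongs to.
# All labels and IDs are invented placeholders for illustration only.
LEXICON_JA: Lexicon = {
    "w_ja_1": frozenset({"s1", "s2"}),
    "w_ja_2": frozenset({"s3"}),
}
LEXICON_ZH: Lexicon = {
    "w_zh_1": frozenset({"s1"}),
    "w_zh_2": frozenset({"s3"}),
}


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed difference measure: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def conceptually_different_pairs(
    lex_a: Lexicon, lex_b: Lexicon
) -> List[Tuple[str, str, float]]:
    """Return word pairs whose Synset sets overlap but are not identical."""
    results = []
    for word_a, synsets_a in lex_a.items():
        for word_b, synsets_b in lex_b.items():
            shares_concept = bool(synsets_a & synsets_b)
            if shares_concept and synsets_a != synsets_b:
                distance = jaccard_distance(synsets_a, synsets_b)
                results.append((word_a, word_b, distance))
    return results


pairs = conceptually_different_pairs(LEXICON_JA, LEXICON_ZH)
for word_a, word_b, distance in pairs:
    print(word_a, word_b, distance)
```

In this toy data, only the first pair shares a Synset while differing in its full set, so it is the only pair reported; the second pair covers exactly the same Synset and is treated as conceptually equivalent. With real data, the lexicons would be built from WordNet and its multilingual versions rather than written by hand.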