Automatic Extraction of Conceptual Items from Bilingual Dictionaries

徳永 健伸; 田中 穂積

doi:10.11517/jjsai.6.2_228

Abstract

To improve the quality of machine translation systems, we should move toward deeper analysis at the conceptual level. Developing machine translation systems with such analysis requires dictionaries that include the following information: the set of conceptual items, the mapping between surface words and conceptual items, and the mapping between the conceptual items of the source language and those of the target language. Several research efforts aim to compile such dictionaries. The Japan Electronic Dictionary Research Institute (EDR) is now compiling such dictionaries on a large scale. Nirenburg et al. at Carnegie Mellon University have proposed a systematic method for constructing a conceptual dictionary. These efforts compile the dictionary by hand with the help of software tools. However, this approach suffers from problems such as a huge amount of manual labor and unstable results. In contrast, this paper proposes a method to automatically extract information about conceptual items from a pair of machine-readable bilingual dictionaries. It is very difficult to compile a complete dictionary fully automatically, and the results of the method may require some refinement and modification by humans. Our goal is rather to automate the compilation process as much as possible and to reduce manual labor. In this paper, we approximate each word sense defined in a bilingual dictionary as a conceptual item. Since each word sense has its own translations in the bilingual dictionary, this approximation is reasonable in terms of word choice in translation, and we can easily obtain both the set of conceptual items and the mapping between surface words and conceptual items. The most difficult task is obtaining the mapping between the conceptual items of the source language and those of the target language. This paper focuses on this issue.
We introduce three types of translation circuits. A translation circuit is a tuple of four elements: a headword from each of the two languages and one word sense of each headword, subject to the condition that the word sense in one language has the headword of the other language as a translation. We assume that the word senses in a translation circuit represent the same concept, that is, that there is a mapping between the conceptual items (word senses) in a translation circuit. The paper outlines a preliminary experiment conducted to verify this assumption. The results of the experiment are promising, and some remarks are also given. We conclude by pointing out the possibility of extending our method to construct a set of conceptual items that can be shared by more than two languages.
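
The translation-circuit condition can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the three circuit types, so the code below assumes one strict reading in which each word sense must list the other language's headword as a translation. The dictionary layout, the function name `find_circuits`, and the toy entries (romanized Japanese headwords) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary layout: headword -> list of word senses,
# where each word sense is represented by its list of translations.
Dictionary = Dict[str, List[List[str]]]

# (source headword, source sense index, target headword, target sense index)
Circuit = Tuple[str, int, str, int]


def find_circuits(src_dict: Dictionary, tgt_dict: Dictionary) -> List[Circuit]:
    """Collect four-element tuples whose two word senses point at each other.

    Assumed condition (one possible reading of the abstract): the source
    sense lists the target headword as a translation, and the target sense
    lists the source headword as a translation.
    """
    circuits: List[Circuit] = []
    for s_head, s_senses in src_dict.items():
        for s_idx, s_translations in enumerate(s_senses):
            for t_head in s_translations:
                # Look up the translation as a headword in the other dictionary.
                for t_idx, t_translations in enumerate(tgt_dict.get(t_head, [])):
                    if s_head in t_translations:
                        circuits.append((s_head, s_idx, t_head, t_idx))
    return circuits


# Toy Japanese-English and English-Japanese dictionaries (invented entries).
JE: Dictionary = {
    "ginkou": [["bank"]],
    "dote": [["bank", "embankment"]],
}
EJ: Dictionary = {
    "bank": [["ginkou"], ["dote", "teibou"]],
    "embankment": [["teibou"]],
}

if __name__ == "__main__":
    for circuit in find_circuits(JE, EJ):
        print(circuit)
```

In this toy data, the two senses of "bank" end up in different circuits, so each is paired with a different Japanese word sense. That is the kind of sense-level mapping between conceptual items that the abstract describes.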
