Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Japanese-English Cross Language Information Retrieval based on Comparable Corpora and Bilingual Dictionary
AKITOSHI OKUMURA, KAI ISHIKAWA, KENJI SATOH
JOURNAL FREE ACCESS

1998 Volume 5 Issue 4 Pages 77-93

Abstract
This paper proposes a method for translating query terms in cross-language information retrieval (CLIR). CLIR is generally performed in two steps: query translation followed by monolingual information retrieval (IR). CLIR is less precise than IR because query term translation is ambiguous, especially between Japanese and English. We previously developed the Double MAXimize criteria based on comparable corpora (DMAX), an equivalent translation selection method for machine translation (MT) that uses term co-occurrence frequencies in comparable corpora. Whereas MT should translate a term into a single word, CLIR should translate a query term into several appropriate terms. This paper describes GDMAX, a generalized query term selection model for CLIR. In this model, a source query is represented as a vector of term co-occurrence frequencies in the source corpora. Translated queries are found by calculating the vector similarity between the source query and candidate target queries, each represented by co-occurrence frequencies in comparable target corpora. GDMAX was evaluated using TREC6 (Text Retrieval Conference) English data and 15 Japanese queries. GDMAX queries achieved approximately 62% of the accuracy of human queries, 6% higher accuracy than machine translation queries, and 12% higher accuracy than bilingual dictionary-based queries.
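
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GDMAX formulation, which the abstract does not spell out. It assumes that a query is represented by the co-occurrence counts of its term pairs, that similarity is cosine similarity, and that translation candidates come from a bilingual dictionary. The function names, the romanized Japanese terms, and all counts are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product
from math import sqrt


def cooccurrence_vector(terms, cooc):
    """Co-occurrence counts for every pair of terms, in positional order."""
    return [cooc.get(frozenset(pair), 0) for pair in combinations(terms, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_translations(source_terms, candidates, source_cooc, target_cooc, top_k=2):
    """Rank candidate target queries by similarity to the source query vector."""
    src_vec = cooccurrence_vector(source_terms, source_cooc)
    scored = []
    # Enumerate one dictionary candidate per source term.
    for combo in product(*(candidates[t] for t in source_terms)):
        tgt_vec = cooccurrence_vector(combo, target_cooc)
        scored.append((cosine(src_vec, tgt_vec), combo))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: romanized Japanese query terms and invented counts.
source_terms = ("kinri", "ginkou", "hikiage")
candidates = {
    "kinri": ["interest"],
    "ginkou": ["bank", "embankment"],
    "hikiage": ["raise", "salvage"],
}
source_cooc = {
    frozenset(("kinri", "ginkou")): 30,
    frozenset(("kinri", "hikiage")): 20,
    frozenset(("ginkou", "hikiage")): 10,
}
target_cooc = {
    frozenset(("interest", "bank")): 28,
    frozenset(("interest", "raise")): 22,
    frozenset(("bank", "raise")): 9,
    frozenset(("embankment", "raise")): 3,
    frozenset(("interest", "salvage")): 1,
    frozenset(("bank", "salvage")): 2,
    frozenset(("embankment", "salvage")): 4,
}

ranked = rank_translations(source_terms, candidates, source_cooc, target_cooc)
for score, combo in ranked:
    print(f"{score:.3f}", combo)
```

With these toy counts, the top-ranked combination is ("interest", "bank", "raise"). Keeping the top `top_k` combinations rather than only the best one reflects the abstract's point that a CLIR query term may be translated into several appropriate terms.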
© The Association for Natural Language Processing