Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
An Efficient Way of Gauging Similarity between Long Japanese News Expressions
HIDEKI TANAKA, TADASHI KUMANO, NORIYOSHI URATANI, TERUMASA EHARA

1999 Volume 6 Issue 5 Pages 93-116

Abstract
We are developing a Japanese-to-English translation aid system for news translators. The system consists of a large bilingual news database, whose sentences have been aligned across the two languages in advance, and a similar-expression search engine. With it, a user can find past translation examples for a Japanese input expression. Similar-expression search engines of this kind have usually employed AND retrieval on keywords taken from the input expression, measuring the similarity between the input and a target by the number of shared keywords. When applied to our database, however, this approach often produces many spurious search results, because the sentences are quite long (88.9 Japanese characters on average) and a single sentence often contains the same keyword several times. In this paper, we propose adding two constraints to AND retrieval: the order of the keywords and their positions (deviations). This inexpensive modification allows AND retrieval to reflect some degree of syntactic similarity. Through a set of experiments, we show that the proposed method yields a statistically significant improvement in user satisfaction with the search results, at the cost of only a 1.3-fold increase in search time.
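
The idea of constraining AND retrieval by keyword order and position can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define how the deviation is computed, so the gap-based measure, the greedy leftmost matching, the threshold `max_dev`, and all function names below are assumptions made for illustration only, and English toy sentences stand in for Japanese news text.

```python
from typing import List, Optional


def and_match(keywords: List[str], sentence: str) -> bool:
    """Plain AND retrieval: every keyword occurs somewhere in the sentence."""
    return all(k in sentence for k in keywords)


def ordered_positions(keywords: List[str], sentence: str) -> Optional[List[int]]:
    """Return the start offsets of the keywords matched left to right.

    Each keyword must start at or after the end of the previous match,
    which enforces the order constraint. Returns None if that fails.
    Matching is greedy (leftmost), an assumption made for simplicity.
    """
    positions = []
    start = 0
    for k in keywords:
        idx = sentence.find(k, start)
        if idx < 0:
            return None
        positions.append(idx)
        start = idx + len(k)
    return positions


def gap_deviation(query_pos: List[int], cand_pos: List[int]) -> int:
    """Hypothetical deviation: summed difference of gaps between neighbours."""
    q_gaps = [b - a for a, b in zip(query_pos, query_pos[1:])]
    c_gaps = [b - a for a, b in zip(cand_pos, cand_pos[1:])]
    return sum(abs(q - c) for q, c in zip(q_gaps, c_gaps))


def search(query: str, keywords: List[str], corpus: List[str],
           max_dev: int) -> List[str]:
    """Keep sentences that pass the order and deviation constraints.

    Results are sorted by increasing deviation from the query layout.
    """
    q_pos = ordered_positions(keywords, query)
    if q_pos is None:
        raise ValueError("keywords must occur in the query in the given order")
    hits = []
    for sent in corpus:
        c_pos = ordered_positions(keywords, sent)
        if c_pos is None:
            continue  # fails AND retrieval or the order constraint
        dev = gap_deviation(q_pos, c_pos)
        if dev <= max_dev:
            hits.append((dev, sent))
    return [s for _, s in sorted(hits, key=lambda h: h[0])]


query = "prime minister visits china"
keywords = ["minister", "china"]
corpus = [
    "the minister will visit china",
    "china welcomes the minister",
    "the minister spoke at length about trade before leaving for china",
]
results = search(query, keywords, corpus, max_dev=10)
```

In this toy corpus all three sentences pass plain AND retrieval, but the second has the keywords in the wrong order and the third places them much further apart than the query does, so only the first survives. The extra work per candidate is a single left-to-right scan, which is in the spirit of the small search-time overhead reported above.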
© The Association for Natural Language Processing