Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
A Similarity Measure for Estimation of One-to-Many Relationship in Corpus
EIKO YAMAMOTO, KYOJI UMEMURA

2002 Volume 9 Issue 2 Pages 45-75

Abstract

In this paper, we consider the estimation of one-to-many relationships between entities in a corpus. Many studies have estimated relationships between entities from corpora. The most common method is based on the co-occurrence of entities within the documents of a corpus, and it implicitly assumes that the relationship is a one-to-one mapping. Real relationships, however, are sometimes one-to-many and call for a method that takes this property into account. We propose using CSM (Complementary Similarity Measure) to detect such relationships. This measure was originally developed for character recognition systems and is known to work well for patterns that overlap a template pattern, but it has rarely been used for text processing. We have compared CSM with other similarity measures, including three kinds of mutual information, the φ coefficient, cosine, the Dice coefficient, and confidence. We chose the names of prefectures and cities as the entities, since they have a genuine one-to-many relationship. For the evaluation, we have used three kinds of corpora. The first is synthesized from real relations. The second is also synthesized from real relations but contains some false relations. The third is compiled from an actual newspaper corpus. We have found that CSM is the best similarity measure in this experiment and works well for one-to-many relationships.
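The abstract compares CSM with several co-occurrence measures but does not give their formulas. The sketch below computes them from per-document presence vectors via the 2×2 table (a = documents containing both entities, b = F only, c = T only, d = neither). It assumes the binary form in which CSM is commonly stated, CSM(F, T) = (ad − bc) / √((a + c)(b + d)), where T plays the role of the template; this formula, the function names, the toy corpus, and the choice of which entity serves as T are illustrative assumptions, not the paper's exact setup. Mutual information is omitted because the abstract does not say which three variants were used.

```python
import math
from typing import Sequence, Tuple

# A 2x2 document co-occurrence table for entities F and T:
#   a = docs with both, b = docs with F only, c = docs with T only, d = docs with neither
Table = Tuple[int, int, int, int]


def contingency(f: Sequence[int], t: Sequence[int]) -> Table:
    """Build the 2x2 table from per-document presence vectors (1 = entity occurs)."""
    if len(f) != len(t):
        raise ValueError("presence vectors must have the same length")
    a = sum(1 for x, y in zip(f, t) if x and y)
    b = sum(1 for x, y in zip(f, t) if x and not y)
    c = sum(1 for x, y in zip(f, t) if not x and y)
    d = len(f) - a - b - c
    return a, b, c, d


def csm(a: int, b: int, c: int, d: int) -> float:
    # ASSUMED binary CSM: (ad - bc) / sqrt((a + c)(b + d)).
    # Only T's marginals (a + c, b + d) appear in the denominator, so it is asymmetric.
    denom = math.sqrt((a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def phi(a: int, b: int, c: int, d: int) -> float:
    # Phi coefficient: symmetric, normalized by both entities' marginals.
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def cosine(a: int, b: int, c: int, d: int) -> float:
    # Cosine of the two binary presence vectors: a / sqrt(|F| * |T|).
    denom = math.sqrt((a + b) * (a + c))
    return a / denom if denom else 0.0


def dice(a: int, b: int, c: int, d: int) -> float:
    # Dice coefficient: 2|F and T| / (|F| + |T|).
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def confidence(a: int, b: int, c: int, d: int) -> float:
    # Confidence of the rule F -> T, i.e. the estimated P(T | F).
    return a / (a + b) if (a + b) else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus of 10 documents: a prefecture name occurs in
    # documents 0-5; a city inside that prefecture occurs only in documents 0-1.
    prefecture = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    city = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    measures = [("CSM", csm), ("phi", phi), ("cosine", cosine),
                ("Dice", dice), ("confidence", confidence)]
    for name, fn in measures:
        forward = fn(*contingency(prefecture, city))   # F = prefecture, T = city
        backward = fn(*contingency(city, prefecture))  # F = city, T = prefecture
        print(f"{name:>10}: F=prefecture {forward:.3f}  F=city {backward:.3f}")
```

Running the script prints each measure in both directions. φ, cosine, and Dice give the same score either way, whereas CSM gives 2.0 with the prefecture as F and about 1.63 with the city as F, because it normalizes only by the template's marginals; confidence is likewise directional.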

© The Association for Natural Language Processing