Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 8, Issue 2
  • [in Japanese]
    2001 Volume 8 Issue 2 Pages 1-2
    Published: April 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • HIROYUKI SHINNOU
    2001 Volume 8 Issue 2 Pages 3-18
    Published: April 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a new method of Japanese word segmentation that applies AdaBoost with a decision list as the weak learner. Word segmentation is treated as a classification problem: judging whether or not a word boundary exists between two characters. Solving this problem with the decision list method yields a Japanese word segmenter. Our method does not suffer from the unknown word problem, because it does not use dictionary information as an attribute of the decision list. Moreover, this approach lets us use AdaBoost, which has recently been actively researched in the machine learning field. AdaBoost improves the precision of our decision list. In experiments, we built the decision list from the Kyoto University Corpus (about 40K sentences). The precision of this decision list was 97.52%. This value was much higher than the 92.76% precision of a character-based tri-gram model. With AdaBoost, the precision improved to 98.49%. Furthermore, our word segmentation system was excellent at detecting unknown words.
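The framing in this abstract, segmentation as a yes/no decision at each gap between characters, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the paper's implementation: the feature set in `gap_features`, the smoothed log-ratio ranking, the smoothing constant `alpha`, and the toy ASCII training data are all hypothetical. The AdaBoost stage is omitted; only a single decision list is shown.

```python
import math
from collections import defaultdict

def gap_features(text, i):
    """Character-only features for the gap before text[i] (hypothetical feature set)."""
    left, right = text[i - 1], text[i]
    return [("L1", left), ("R1", right), ("LR", left + right)]

def boundaries(words):
    """Join a segmented sentence and return (text, set of boundary positions)."""
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return "".join(words), cuts

def train_decision_list(sentences, alpha=0.5):
    """Rank one rule per feature by the smoothed log-ratio of its label counts."""
    counts = defaultdict(lambda: [0, 0])  # feature -> [no-boundary, boundary]
    total = [0, 0]
    for words in sentences:
        text, cuts = boundaries(words)
        for i in range(1, len(text)):
            label = 1 if i in cuts else 0
            total[label] += 1
            for f in gap_features(text, i):
                counts[f][label] += 1
    rules = []
    for f, (neg, pos) in counts.items():
        score = math.log((pos + alpha) / (neg + alpha))
        rules.append((abs(score), f, 1 if score > 0 else 0))
    rules.sort(key=lambda r: r[0], reverse=True)  # strongest evidence first
    default = 1 if total[1] > total[0] else 0     # majority label as fallback
    return rules, default

def classify(features, rules, default):
    """Apply the first (strongest) rule whose feature is present at this gap."""
    present = set(features)
    for _, feature, label in rules:
        if feature in present:
            return label
    return default

def segment(text, model):
    """Cut the text wherever the decision list predicts a boundary."""
    if not text:
        return []
    rules, default = model
    words, start = [], 0
    for i in range(1, len(text)):
        if classify(gap_features(text, i), rules, default) == 1:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

# Toy, pre-segmented training data (ASCII stand-ins for characters).
model = train_decision_list([["ab", "cd"], ["ab", "ef"], ["cd", "ef"]])
print(segment("cdab", model))
```

Because the features look only at characters around the gap, the sketch needs no dictionary, which is the property the abstract credits for robustness to unknown words.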
  • Timothy Baldwin, Hozumi Tanaka
    2001 Volume 8 Issue 2 Pages 19-37
    Published: April 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This research looks at the effects of segment order and segmentation on translation retrieval performance for an experimental Japanese-English translation memory (TM) system. We implement a number of bag-of-words and segment order-sensitive string comparison methods, and test each over character-based and word-based indexing. The translation retrieval performance of each system configuration is evaluated empirically through the notion of segment edit distance between the translation output and the model translation. Our results indicate that character-based indexing is consistently superior to word-based indexing in terms of raw accuracy, although segmentation does have an accelerating effect on TM search times in combination with a number of retrieval optimisation techniques. Segment order-sensitive approaches are demonstrated to generally outperform bag-of-words methods, with 3-operation edit distance proving the most effective comparison method. We additionally reproduced the same basic results over alphabetised data as for lexically differentiated data containing kanji characters.
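As a rough illustration of the comparison method named in this abstract, the sketch below computes a 3-operation (insert, delete, substitute) edit distance over sequences of segments and uses it to pick the closest translation memory entry. The function names, the unit cost per operation, the whitespace-based `word_segments`, and the romanised toy memory are assumptions for illustration, not the authors' system.

```python
def edit_distance(a, b):
    """3-operation edit distance between two segment sequences (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]

def char_segments(text):
    """Character-based indexing: every character is a segment."""
    return list(text)

def word_segments(text):
    """Word-based indexing: assumes the text is already split by whitespace."""
    return text.split()

def retrieve(query, memory, segmenter):
    """Return the (source, translation) pair whose source is closest to the query."""
    q = segmenter(query)
    return min(memory, key=lambda pair: edit_distance(q, segmenter(pair[0])))

# Hypothetical toy translation memory with romanised source strings.
memory = [
    ("kore wa pen desu", "This is a pen"),
    ("sore wa hon desu", "That is a book"),
]
print(retrieve("kore wa pen desu ka", memory, char_segments))
print(retrieve("kore wa pen desu ka", memory, word_segments))
```

The same `edit_distance` function works for both indexing schemes because it only compares segments for equality; switching between character-based and word-based indexing changes the segmenter, not the comparison.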
  • HIROKAZU WATABE, TSUKASA KAWAOKA
    2001 Volume 8 Issue 2 Pages 39-54
    Published: April 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    The main elements of human-like commonsense judgement are thought to be a concept-base and an association mechanism built on the associations between concepts. The structure of the concept-base should be as simple as possible, since the concept-base has to be expanded and refined automatically through automated learning. This paper proposes a new method of measuring the degree of association between concepts. In the conventional method, a concept is expressed by a first-order attribute vector model, and the degree of association between concepts is derived from the inner product of the vectors. In this model, since each first-order attribute must be converted to its category, a category database such as a thesaurus is required. In the proposed method, the degree of association is derived from the chain of concepts, without categories. Experiments using a concept-base of about 40,000 concepts show that the proposed method outputs degrees of association closer to human judgement than the conventional method does.
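The conventional method described in this abstract can be sketched roughly as follows; the proposed chain-based method is not shown because the abstract does not specify it. The toy concept-base, the `thesaurus` mapping, the attribute weights, and the length normalisation of the inner product are all assumptions made for illustration.

```python
import math

def to_category_vector(attributes, thesaurus):
    """Map weighted attributes to categories; attributes missing from the thesaurus are dropped."""
    vec = {}
    for attr, weight in attributes.items():
        category = thesaurus.get(attr)
        if category is not None:
            vec[category] = vec.get(category, 0.0) + weight
    return vec

def association(u, v):
    """Inner product of two category vectors, normalised by their lengths (an assumption)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical concept-base: concept -> {attribute: weight}.
concept_base = {
    "dog": {"bark": 0.6, "pet": 0.4},
    "cat": {"meow": 0.6, "pet": 0.4},
    "car": {"engine": 0.7, "wheel": 0.3},
}
# Hypothetical thesaurus: attribute -> category.
thesaurus = {"bark": "sound", "meow": "sound", "pet": "animal",
             "engine": "machine", "wheel": "machine"}

vectors = {c: to_category_vector(a, thesaurus) for c, a in concept_base.items()}
print(association(vectors["dog"], vectors["cat"]))
print(association(vectors["dog"], vectors["car"]))
```

In this toy setup, "dog" and "cat" share only one literal attribute, yet their category vectors coincide once the thesaurus is applied. That shows how strongly the conventional score depends on the category database, which is the dependency the proposed method is said to remove.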