Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 6, Issue 1
Displaying 1-5 of 5 articles from this issue
  • [in Japanese]
    1999 Volume 6 Issue 1 Pages 1-2
    Published: January 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (202K)
  • KAZUHIDE YAMAMOTO, EIICHIRO SUMITA
    1999 Volume 6 Issue 1 Pages 3-28
    Published: January 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
A decision-tree learning approach to resolving ellipses in Japanese dialogue is proposed. The method is highly flexible, since it requires only a dialogue corpus tagged with parts of speech and ellipses, together with a thesaurus. Three kinds of attributes are provided to the machine learner: the semantic categories of content words, functional words, and exophoric information. Open tests on a Japanese dialogue corpus in the travel-arrangement domain show that the proposed method achieves satisfactory resolution accuracy for the ‘ga’ (subject) and ‘ni’ (indirect object) cases. The discussion also yields the following findings: (1) the learning topic should be restricted when the topic is known in advance; otherwise, a decision tree trained on a wide range of topics may perform best; (2) as a decision tree learns more, it tends to rely on more general attributes such as functional words. The relation between data size and decision-tree training is also examined: the resolution accuracy of the proposed method appears to saturate at 10^4-10^5 samples, and an accuracy of 80-85% is expected at best.
    Download PDF (2530K)
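The attribute selection behind such decision-tree learning rests on information gain over tagged samples. A minimal stdlib-only sketch of that computation, on invented toy ellipsis samples (the `func_word` and `antecedent` fields are illustrative, not the paper's actual feature set or data):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(samples, attr, label_key="antecedent"):
    """Drop in label entropy from splitting the samples on one attribute,
    the quantity a decision-tree learner maximizes at each node."""
    base = entropy([s[label_key] for s in samples])
    groups = {}
    for s in samples:
        groups.setdefault(s[attr], []).append(s[label_key])
    n = len(samples)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return base - remainder

# Invented toy samples: a functional word near the ellipsis vs. the
# resolved referent (hypothetical values, for illustration only):
samples = [
    {"func_word": "wa", "antecedent": "speaker"},
    {"func_word": "wa", "antecedent": "speaker"},
    {"func_word": "ga", "antecedent": "hearer"},
    {"func_word": "ga", "antecedent": "speaker"},
]
print(round(information_gain(samples, "func_word"), 3))  # → 0.311
```

An attribute whose values split the samples into purer groups scores a higher gain and is chosen earlier in the tree.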
  • QING MA, HITOSHI ISAHARA
    1999 Volume 6 Issue 1 Pages 29-42
    Published: January 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper presents a multi-neuro tagger that uses variable lengths of context for part-of-speech (POS) tagging based on longest context priority. The tagger consists of multiple neural networks, each of which can be regarded as a single-neuro tagger with a fixed but different length of context in its inputs, plus a selector based on longest context priority. Because the trained weights of the taggers with shorter contexts can be used as initial weights for those with longer contexts, the training time for the latter is greatly reduced, and the cost of training a multi-neuro tagger is almost the same as that of training a single-neuro tagger. In tagging, given that the target word is more relevant than any word in its context and that the context words may differ in relevance, each element of the input is weighted by its relevance, measured as information gain. Computer experiments show that the multi-neuro tagger attains an accuracy of over 94% on untrained data when trained on a small Thai corpus of 8,322 sentences that we have on hand. This result is better than any obtained with the single-neuro taggers, which indicates that the multi-neuro tagger can dynamically find a suitable length of context during tagging.
    Download PDF (1310K)
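The longest-context-priority selection can be sketched as follows. The `taggers` dictionary of stand-in functions is hypothetical, standing in for trained networks of fixed context widths; the real system selects among neural-network outputs, not string stubs:

```python
def tag_with_longest_context(word_index, sentence, taggers):
    """Longest context priority: among single-neuro taggers with fixed
    context widths, use the one with the widest window that still fits
    inside the sentence around the target word."""
    usable = [w for w in taggers
              if word_index - w >= 0 and word_index + w < len(sentence)]
    width = max(usable)  # width 0 always fits, so usable is never empty
    window = sentence[word_index - width : word_index + width + 1]
    return taggers[width](window)

# Toy stand-ins for trained taggers, keyed by context width (hypothetical):
taggers = {w: (lambda window, w=w: f"tag-from-width-{w}") for w in (0, 1, 2)}
sentence = ["I", "saw", "her", "duck", "."]
print(tag_with_longest_context(1, sentence, taggers))  # → tag-from-width-1
```

Near the sentence edges only the narrow-context taggers fit, which is exactly when the wider ones would lack input anyway.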
  • TAKEHIKO YOSHIMI, TOSHIYUKI OKUNISHI, TAKAHIRO YAMAJI, YOJI FUKUMOCHI
    1999 Volume 6 Issue 1 Pages 43-57
    Published: January 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes a method of selecting important sentences from a text based on evaluating the connectivity between sentences using surface information. We assume that the title of a text is the most concise statement of its most essential information, and that the more closely a sentence relates to an important sentence, the more important that sentence is. The importance of a sentence is defined as its connectivity to the title. The connectivity between two sentences is measured by coreference between a pronoun and a preceding (pro)noun and by lexical cohesion between lexical items. In an experiment with 80 English texts containing 29.0 sentences on average, the proposed method achieved 78.2% recall and 57.7% precision at a selection ratio of 25%. These values surpass those achieved by conventional methods, which suggests that our method is more effective at abridging relatively short texts.
    Download PDF (1377K)
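A much-simplified sketch of title-connectivity scoring, using plain lexical overlap in place of the paper's coreference and cohesion measures (the tokenization, scoring function, and sample text are illustrative assumptions, not the authors' method):

```python
def connectivity(sent_tokens, title_tokens):
    """Crude surface connectivity: fraction of title tokens that recur
    in the sentence (stands in for coreference + lexical cohesion)."""
    title = set(title_tokens)
    return len(set(sent_tokens) & title) / len(title)

def select_important(title, sentences, ratio=0.25):
    """Rank sentences by connectivity to the title and keep the top
    `ratio` fraction, preserving original order."""
    title_toks = title.lower().split()  # naive whitespace tokenization
    order = sorted(range(len(sentences)),
                   key=lambda i: connectivity(sentences[i].lower().split(),
                                              title_toks),
                   reverse=True)
    keep = max(1, round(len(sentences) * ratio))
    return [sentences[i] for i in sorted(order[:keep])]

title = "Solar power storage"
sentences = [
    "Cheap solar power needs better storage.",
    "The committee met on Tuesday.",
    "Lunch was served afterwards.",
    "Attendance was high.",
]
print(select_important(title, sentences))
```

The paper's method additionally propagates importance through sentences connected to already-important ones, which a single title-overlap pass does not capture.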
  • A Survey in a Machine-Readable Dictionary
    Ren Fuji, Jian-Yun Nie
    1999 Volume 6 Issue 1 Pages 59-78
    Published: January 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In Machine Translation (MT), using compound words or phrases often makes the translation process easier. For example, the phrase _??_ corresponds unambiguously to “information highway”. It is not necessary to break it down into _??_ (information), _??_ (high-speed) and _??_ (road). However, some compound words (phrases) in Chinese are composed of simpler words which can play significantly different roles in sentences when they are broken down. For example, the compound word _??_ (machine translation) may be broken into _??_ (machine) and _??_ (translate), as in the sentence _??_ (He uses a machine to translate papers). We call such a compound word a “sensitive word”. During Chinese MT processing, if the first segmentation result, in which a sensitive word is segmented as a single word, leads to a failure, the alternative solution with the sensitive word broken down is considered the preferred one. This allows us to achieve higher efficiency by avoiding examination of unlikely segmentation solutions. In this paper, we describe the problems related to sensitive words. A machine-readable dictionary was examined, and 764 sensitive words were found among 87,600 words. This shows that sensitive words are a common phenomenon in Chinese that is worth closer examination.
    Download PDF (1975K)
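The fallback strategy for sensitive words can be illustrated with forward maximum matching over a toy ASCII lexicon (the strings are placeholders, not real Chinese words, and the "failure" here is a simple dead end in matching, not the paper's full failure criterion):

```python
def fmm_segment(text, lexicon, max_len):
    """Forward maximum matching: greedily take the longest lexicon entry
    at each position; return None on a dead end (no entry matches)."""
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                out.append(text[i:i + length])
                i += length
                break
        else:
            return None  # segmentation failure
    return out

def segment_with_sensitive_fallback(text, lexicon, sensitive):
    """First try with sensitive compounds kept intact; if that parse
    fails, retry with them removed so they break into their parts."""
    first = fmm_segment(text, lexicon, max(map(len, lexicon)))
    if first is not None:
        return first
    reduced = lexicon - sensitive
    return fmm_segment(text, reduced, max(map(len, reduced)))

# "abc" plays the sensitive compound; keeping it intact strands "d":
lexicon = {"abc", "ab", "c", "cd"}
print(segment_with_sensitive_fallback("abcd", lexicon, {"abc"}))  # → ['ab', 'cd']
```

Trying the compound reading first and decomposing only on failure is what lets the system skip unlikely segmentation alternatives in the common case.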