Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 5, Issue 2
Displaying 1-5 of 5 articles from this issue
  • [in Japanese]
    1998 Volume 5 Issue 2 Pages 1-2
    Published: April 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (207K)
  • JUNG-IN KIM, JONG-HYEOK LEE, GEUNBAE LEE
    1998 Volume 5 Issue 2 Pages 3-24
    Published: April 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Korean and Japanese share many grammatical characteristics, including the same word order, so almost all Japanese-to-Korean machine translation systems have adopted a direct translation strategy to exploit these similarities. Even with direct translation between such similar languages, however, many problems remain to be solved for high-quality translation. We focus on predicate translation, whose difficulty stems not only from complex conjugation but also from inconsistent syntactic categories and the different relative order of modal expressions in the two languages. To address this difficulty, we propose table-driven predicate generation, in which a Modality-Feature Ordering and Lexicalizing Table (MFOLT) maps Japanese predicates to their Korean equivalents via an abstract pivot of symbolic modality features. An experimental evaluation with 2,338 sentences extracted from the Asahi newspaper and Japanese grammar books showed that the proposed method is effective for predicate translation, achieving a success rate of 97.5%.
    Download PDF (2044K)
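The table-driven generation described in the abstract above can be sketched as two lookups around a symbolic pivot: Japanese modality suffixes are mapped to abstract features, the features are reordered according to Korean suffix order, and each feature is lexicalized. All suffixes, feature names, table entries, and surface forms below are illustrative assumptions, not the actual MFOLT contents.

```python
# Japanese suffix -> abstract modality feature (the symbolic pivot).
# Entries are toy assumptions, not the paper's actual table.
JA_TO_PIVOT = {
    "たい": "DESIDERATIVE",
    "れる": "PASSIVE",
    "ない": "NEGATION",
    "た": "PAST",
}

# Pivot feature -> Korean suffix, plus the relative order in which
# Korean realizes modality features on the predicate.
PIVOT_TO_KO = {
    "DESIDERATIVE": "고 싶",
    "PASSIVE": "아지",
    "NEGATION": "지 않",
    "PAST": "았",
}
KO_ORDER = ["PASSIVE", "DESIDERATIVE", "NEGATION", "PAST"]

def translate_predicate(ko_stem, ja_suffixes):
    """Map Japanese modality suffixes to pivot features, reorder them
    to the Korean relative order, and lexicalize onto the stem."""
    features = [JA_TO_PIVOT[s] for s in ja_suffixes]
    ordered = [f for f in KO_ORDER if f in features]
    return ko_stem + "".join(PIVOT_TO_KO[f] for f in ordered)
```

The pivot decouples the two table halves: analysis only has to recognize Japanese suffixes, and generation only has to know the Korean ordering, which is where the two languages diverge.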
  • Wide R. Hogenhout, Yuji Matsumoto
    1998 Volume 5 Issue 2 Pages 25-46
    Published: April 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We show how a treebank can be used to cluster words on the basis of their syntactic behavior. By extracting statistics on the structures in which words appear, it is possible to discover similarities and differences in usage among words with the same part of speech. We compare this clustering to conventional clustering based on co-occurrences. While conventional clustering can discover semantic similarity or a tendency to appear together, our method ignores these factors and focuses on syntactic usage, that is, the sort of structures a word appears in. We present a case study on prepositions, showing how they can be automatically subdivided by their syntactic behavior, and discuss the appropriateness of such a subdivision. We also carried out experiments to compare the quality of the clusters quantitatively. To this end, we used clusters based on syntactic behavior to improve the estimation of the distribution of dependency relations between words. Since such a distribution is necessarily estimated from sparse data, an entropy test can show how informative the classes are about syntactic usage. Finally, we discuss a number of ways in which a classification of words can contribute to natural language processing applications.
    Download PDF (1963K)
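The clustering idea in the abstract above can be sketched by representing each word as a vector of counts over the structural contexts it occurs in, then comparing those vectors. The treebank contexts below are toy (attachment-type) labels invented for illustration; a real system would extract them from parsed trees.

```python
from collections import Counter
from math import sqrt

# word -> structural contexts observed in a toy "treebank"
# (e.g. whether the preposition's PP modifies an NP or a VP)
observations = {
    "in":   ["PP-mod-NP", "PP-mod-VP", "PP-mod-VP", "PP-mod-NP"],
    "of":   ["PP-mod-NP", "PP-mod-NP", "PP-mod-NP"],
    "with": ["PP-mod-VP", "PP-mod-VP", "PP-mod-NP"],
}

def vector(word):
    """Count vector over the structural contexts of a word."""
    return Counter(observations[word])

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# "with" attaches mostly to VPs, like the adverbial uses of "in";
# "of" is almost purely adnominal, so it is less similar to "with".
sim_in_with = cosine(vector("in"), vector("with"))
sim_of_with = cosine(vector("of"), vector("with"))
```

Grouping words whose vectors are close then yields the syntactic subdivision of prepositions the paper studies, independent of any semantic co-occurrence signal.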
  • Jiri Stetina, Makoto Nagao
    1998 Volume 5 Issue 2 Pages 47-74
    Published: April 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper presents a new general supervised word sense disambiguation method based on a relatively small syntactically parsed and semantically tagged training corpus. The method exploits the full sentential context and all explicit semantic relations in a sentence to identify the senses of all of its content words. It solves the sparse-data problem of a small training corpus by substituting words with their semantic classes. Despite the very small training corpus, we report an overall accuracy of 80.3% (85.7%, 63.9%, 83.6%, and 86.5% for nouns, verbs, adjectives, and adverbs, respectively), which exceeds the accuracy of statistical sense-frequency-based semantic tagging, the only other generally applicable disambiguation technique. Because the method uses the sentential syntactic structure, it is particularly suitable for integration with a probabilistic syntactic analyzer.
    Download PDF (2762K)
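The class-substitution idea in the abstract above can be sketched as follows: when a specific word pair is unseen in the small training corpus, back off to counts over the words' semantic classes. The class assignments, sense labels, and training pairs below are illustrative assumptions, not the paper's actual data.

```python
from collections import Counter

# toy semantic classes (word -> class)
SEM_CLASS = {"drink": "CONSUME", "sip": "CONSUME", "eat": "CONSUME"}

# toy sense-tagged training data: (verb, object-sense) observations
pairs = [
    ("drink", "tea/beverage"),
    ("drink", "coffee/beverage"),
    ("eat", "coffee/beverage"),
]

# counts at the class level rather than the word level
class_counts = Counter((SEM_CLASS[v], sense) for v, sense in pairs)

def sense_score(verb, sense):
    """Score a candidate sense by class-level co-occurrence counts,
    so an unseen verb like 'sip' still gets evidence via CONSUME."""
    return class_counts[(SEM_CLASS[verb], sense)]
```

Although "sip tea" never occurs in the toy training data, its class pair (CONSUME, tea/beverage) does, so the beverage sense still outscores an unattested alternative; this is the sparse-data remedy the abstract describes.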
  • SHINSUKE MORI, MAKOTO NAGAO
    1998 Volume 5 Issue 2 Pages 75-103
    Published: April 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes improving a stochastic Japanese morphological analyzer through morpheme clustering and an improved unknown word model. For morpheme clustering, we propose a method that refines a morpheme-based n-gram model into a class-based n-gram model using a cross-entropy criterion. For the unknown word model, we propose a method to incorporate a given morpheme set, such as a dictionary, into the model. In experiments on the EDR corpus, we observed improvements in accuracy: the analyzer adopting both methods achieved higher accuracy than a previously reported part-of-speech-based tri-gram model, indicating that our morphological analyzer outperforms the previous one. In addition, we compared our analyzer with one based on a grammarian's intuition, and the experimental results showed that the error rate of the stochastic analyzer was significantly smaller than that of the heuristic analyzer. The stochastic approach to Japanese morphological analysis thus offers a clear advantage over the ad hoc method, both in accuracy and in the ease of further systematic improvements.
    Download PDF (2342K)
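The class-based n-gram form used in the abstract above factors a morpheme bigram as P(m_i | m_{i-1}) ≈ P(c_i | c_{i-1}) · P(m_i | c_i), where c is the class of morpheme m. A minimal sketch, with a toy segmented corpus and class assignments invented for illustration:

```python
from collections import Counter

# toy class assignment: N = noun, P = particle
CLASS = {"猫": "N", "犬": "N", "が": "P", "を": "P"}
morphemes = ["猫", "が", "犬", "を"]  # toy corpus, already segmented

classes = [CLASS[m] for m in morphemes]
class_bigrams = Counter(zip(classes, classes[1:]))
class_unigrams = Counter(classes)
word_counts = Counter(morphemes)

def p_class_bigram(prev_m, m):
    """P(class(m) | class(prev_m)) * P(m | class(m)),
    estimated by relative frequency from the toy corpus."""
    cp, c = CLASS[prev_m], CLASS[m]
    p_cc = class_bigrams[(cp, c)] / class_unigrams[cp]
    p_wc = word_counts[m] / class_unigrams[c]
    return p_cc * p_wc
```

Sharing bigram statistics across all members of a class is what makes the model robust on sparse data; the paper's contribution is choosing the classes automatically by a cross-entropy criterion rather than fixing them at part-of-speech granularity.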