Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 10, Issue 3
Displaying 1-8 of 8 articles from this issue
  • [in Japanese]
    2003 Volume 10 Issue 3 Pages 1-2
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (147K)
  • KIYOAKI SHIRAI
    2003 Volume 10 Issue 3 Pages 3-24
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
SENSEVAL is an evaluation exercise for word sense disambiguation programs. This paper describes the Japanese dictionary task in the second SENSEVAL (SENSEVAL-2). The task defined word senses according to a Japanese dictionary, the Iwanami Kokugo Jiten. Three data sets were distributed to the participants: the Iwanami Kokugo Jiten, the training data, and the evaluation data. The training data was a word-sense-tagged corpus made up of 3,000 newspaper articles, while the evaluation data consisted of newspaper articles containing words whose correct senses the participants' systems should determine. There were 100 target words, 50 nouns and 50 verbs. One hundred instances of each target word were provided, for a total of 10,000 instances. To construct the gold-standard data, two annotators independently chose the correct word sense for each of the 10,000 instances. The inter-tagger agreement of the two annotators was 0.863, while Cohen's κ was 0.657. When the senses selected by the two annotators did not agree, a third annotator chose the correct sense between them. Seven systems from three organizations participated in the Japanese dictionary task. The best score achieved by a participating system was 0.786, while the score of the baseline system was 0.726.
    Download PDF (5885K)
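The agreement statistics reported in the abstract above can be sketched in a few lines. This is a minimal illustration, not the task organizers' actual code; it computes raw inter-tagger agreement and Cohen's κ from two annotators' sense labels over the same instances.

```python
# Hypothetical sketch: inter-annotator agreement and Cohen's kappa.
from collections import Counter

def agreement(labels_a, labels_b):
    # Fraction of instances where the two annotators chose the same sense.
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    p_o = agreement(labels_a, labels_b)            # observed agreement
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal sense distribution.
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

κ discounts the agreement two annotators would reach by chance, which is why it (0.657) is lower than the raw agreement (0.863) quoted above.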
  • Sadao Kurohashi, Kiyotaka Uchimoto
    2003 Volume 10 Issue 3 Pages 25-37
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes the SENSEVAL-2 Japanese translation task. In this task, word senses are defined according to distinct translations in a given target language. A translation memory (TM) was constructed which contains, for each Japanese head word, a list of typical Japanese expressions and their English translations. For each test word instance, participants were required to submit the TM record best approximating that usage, or alternatively, actual target word translations. There were 9 system entries from a total of 7 organizations.
    Download PDF (2650K)
  • Using Vector Space Model
    TADASHI KUMANO, HIDEKI KASHIOKA, HIDEKI TANAKA
    2003 Volume 10 Issue 3 Pages 39-59
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In the SENSEVAL-2 Japanese translation task, senses of Japanese words are defined with respect to differences among their English translations. The Translation Memory (TM), which pairs Japanese expressions containing the target words with their English translations, can be treated as a sense categorization: each occurrence of a target word is categorized by selecting an appropriate Japanese expression from the TM. We can therefore regard the task as the monolingual problem of selecting, among the candidate Japanese expressions in the TM, the one with the most similar context. We developed a system that tackles the task by measuring the similarity of the contexts of words co-occurring with the target word. The system calculates the similarity between the input expression and the TM expressions using the vector space model, over "context feature vectors" whose dimensions characterize the context words co-occurring with the target word. Each context attribute represents a context word as the combination of its syntactic/distance relation to the target word and the morphological/semantic attributes of the context word itself; this lets various context characteristics be handled in a unified manner. The system that participated in SENSEVAL-2 achieved a precision and recall of 45.8%, using JUMAN+KNP as the morphological/syntactic analyzer and NIHONGO GOI TAIKEI as the thesaurus. The results show that the semantic attributes of context words make the greatest contribution to performance, while dependency relations make a limited contribution.
    Download PDF (1956K)
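The vector-space comparison described above can be sketched as follows. This is an illustrative assumption-laden toy, not the authors' system: the feature keys (relation-to-target paired with a context-word attribute) stand in for the paper's actual attribute inventory, and plain cosine similarity over sparse dictionaries stands in for its full weighting scheme.

```python
# Hypothetical sketch: pick the TM expression whose sparse "context
# feature vector" is closest (by cosine) to the input expression's vector.
import math

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_tm_match(input_vec, tm_vecs):
    # tm_vecs maps each candidate TM expression to its context vector.
    return max(tm_vecs, key=lambda name: cosine(input_vec, tm_vecs[name]))
```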
  • HIROYUKI SHINNOU
    2003 Volume 10 Issue 3 Pages 61-73
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In this paper, we apply an unsupervised learning method using the EM algorithm, which Nigam et al. proposed for text classification, to the noun sense disambiguation problems taken up in the Japanese translation task of SENSEVAL-2. The method uses the EM algorithm, treating the hidden labels of the unlabeled data as missing values of the observed data, with the Naive Bayes model as the generative model and the conditional probabilities p(f|c) (where f is a feature and c is a label) as the model parameters. As a result, the learned classifier is improved. In this study, we use only simple features for classification, namely words surrounding the target word. In the experiments, the precision of a Naive Bayes classifier learned from labeled data alone was 58.2%. The precision of a decision list learned from the same data was 58.9%, which was the score of our (Ibaraki University) entry in the translation task contest. Our unsupervised learning method improved the precision to 61.8% by using unlabeled data in addition to labeled data. Furthermore, by revising a small part of the labeled data, the precision of the Naive Bayes classifier and of our unsupervised learning method improved to 62.3% and 68.2%, respectively.
    Download PDF (1224K)
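The semi-supervised scheme described above can be sketched roughly as follows. This is a simplified illustration of the Nigam et al.-style recipe, not the author's implementation: initialize a Naive Bayes model from the labeled data, then alternate an E-step that soft-labels the unlabeled instances and an M-step that re-estimates p(c) and p(f|c) from both data sets together.

```python
# Hypothetical sketch: EM over a Naive Bayes model with unlabeled data.
import math
from collections import defaultdict

def train_nb(docs, dists, vocab, classes, smoothing=1.0):
    # docs: lists of features; dists: per-doc class distributions (soft labels).
    prior = {c: smoothing for c in classes}
    cond = {c: defaultdict(lambda: smoothing) for c in classes}
    for feats, dist in zip(docs, dists):
        for c, w in dist.items():
            prior[c] += w
            for f in feats:
                cond[c][f] += w            # fractional counts for soft labels
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_cond = {}
    for c in classes:
        denom = sum(cond[c][f] for f in vocab)
        log_cond[c] = {f: math.log(cond[c][f] / denom) for f in vocab}
    return log_prior, log_cond

def posterior(feats, log_prior, log_cond, classes):
    score = {c: log_prior[c] + sum(log_cond[c].get(f, 0.0) for f in feats)
             for c in classes}
    m = max(score.values())
    exp = {c: math.exp(s - m) for c, s in score.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def em_naive_bayes(labeled, unlabeled, classes, iterations=10):
    docs_l = [d for d, _ in labeled]
    hard = [{y: 1.0} for _, y in labeled]      # labeled data keeps fixed labels
    vocab = {f for d in docs_l + unlabeled for f in d}
    model = train_nb(docs_l, hard, vocab, classes)
    for _ in range(iterations):
        # E-step: soft-label the unlabeled data under the current model.
        soft = [posterior(d, *model, classes) for d in unlabeled]
        # M-step: re-estimate the parameters from all data together.
        model = train_nb(docs_l + unlabeled, hard + soft, vocab, classes)
    return model
```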
  • HIROYUKI SHINNOU, SHUYA ABE
    2003 Volume 10 Issue 3 Pages 75-85
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In this paper, we apply Inductive Logic Programming (ILP) to the Japanese translation task of SENSEVAL-2. The translation task can be regarded as a classification problem and solved by inductive learning methods. However, general statistical learning methods cannot be used for this task, because it has the serious problem that new training instances are hard to create. The problem is therefore how to learn a classifier from the instances in the Translation Memory, that is, from small training data. To overcome this problem, we use ILP, which can exploit background knowledge during learning; this is a big advantage over statistical learning methods. Background knowledge means domain-specific knowledge that is not explicitly described in the training data. Using background knowledge, we can learn rules from small training data. In this paper, we used Progol as the ILP system and 'Bunrui Goi Hyou' as the background knowledge, achieving a precision of 54.0% on the translation task. This precision is superior to that of the other systems in the contest that did not create new training instances.
    Download PDF (1056K)
  • KIYOTAKA UCHIMOTO, SATOSHI SEKINE, MASAKI MURATA, HITOSHI ISAHARA
    2003 Volume 10 Issue 3 Pages 87-114
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We describe a method for word selection in machine translation. Given an input sentence and a target word in the sentence, our system first estimates the similarity between the input sentence and the parallel example sets called the "Translation Memory." It then selects an appropriate translation of the target word by using the example set with the highest similarity. The similarity is calculated using an example-based method and a machine learning model; it is assessed from string similarity, the words to the left and right of the target word in the input sentence, and the frequencies of the input sentence's content words and of their translation candidates in bilingual and monolingual corpora in English and Japanese. Given an input sentence and a target word, the example-based method is applied first; then, if an appropriate example set is not found, the machine learning model is applied. The most appropriate machine learning model is selected for each target word from among several models, for example by cross-validation on the training data. In this paper, we show the advantages of our method and examine what kinds of information contributed to improving accuracy, based on the results of the second contest on word sense disambiguation, SENSEVAL-2, held in spring 2001.
    Download PDF (2842K)
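The two-stage cascade described above can be sketched as follows. This is a toy under stated assumptions, not the authors' system: Jaccard word overlap and a fixed threshold stand in for the paper's richer string/word/frequency similarity, and `fallback_clf` is a hypothetical placeholder for the per-word machine learning model.

```python
# Hypothetical sketch: try an example-based TM match first; if no example
# is close enough, fall back to a learned classifier for the target word.
def select_translation(sentence_words, tm_examples, fallback_clf,
                       threshold=0.5):
    # tm_examples: (example_words, translation) pairs for the target word.
    def overlap(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0  # Jaccard similarity
    words, translation = max(tm_examples,
                             key=lambda ex: overlap(sentence_words, ex[0]))
    if overlap(sentence_words, words) >= threshold:
        return translation                   # example-based match succeeded
    return fallback_clf(sentence_words)      # fall back to the learned model
```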
  • Comparison of various types of machine learning methods and features in Japanese word sense disambiguation
MASAKI MURATA, MASAO UTIYAMA, KIYOTAKA UCHIMOTO, QING MA, HITOSHI ISAHARA
    2003 Volume 10 Issue 3 Pages 115-133
    Published: April 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper describes our work on the Japanese dictionary-based lexical-sample task of SENSEVAL-2, in which we compared various machine learning methods and features. For the contest, we submitted four systems to the task: i) a support vector machine method, ii) a simple Bayes method, iii) a method combining a support vector machine and a simple Bayes method, and iv) a method combining two kinds of support vector machine method and two kinds of simple Bayes method. The combined methods produced the best precision (0.786) among all the systems submitted to the contest. After the contest, we tuned the parameter used in the simple Bayes method and obtained higher precision: the best-performing system was then the method combining the two simple Bayes methods, with a precision of 0.793. In this paper, we discuss the results of experiments varying the features used and investigate the effectiveness and characteristics of each feature. From these results, we draw the interesting conclusion that good precision can be obtained using only string features, namely the 1-gram to 3-gram strings just before and after the analyzed morpheme. We also survey related work that is useful for future research on word sense disambiguation.
    Download PDF (2011K)
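The "string features" highlighted in the conclusion above can be sketched as follows. This is an illustrative assumption: the sketch extracts character 1- to 3-grams from the text immediately before and after the analyzed morpheme, which is one plausible reading of the feature description, not necessarily the authors' exact definition.

```python
# Hypothetical sketch: 1- to 3-gram strings just before/after a morpheme.
def string_features(text, start, end):
    # text[start:end] is the analyzed morpheme within the sentence.
    before, after = text[:start], text[end:]
    feats = []
    for n in (1, 2, 3):
        if len(before) >= n:
            feats.append(("before", n, before[-n:]))  # n chars before
        if len(after) >= n:
            feats.append(("after", n, after[:n]))     # n chars after
    return feats
```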