Abstract
SENSEVAL is an evaluation exercise for word sense disambiguation programs. This paper describes the Japanese dictionary task in the second SENSEVAL (SENSEVAL-2). In this task, word senses were defined according to a Japanese dictionary, the Iwanami Kokugo Jiten. Three resources were distributed to the participants: the Iwanami Kokugo Jiten, the training data, and the evaluation data. The training data was a word-sense-tagged corpus of 3,000 newspaper articles. The evaluation data consisted of newspaper articles containing the target words whose senses the participants' systems had to determine. There were 100 target words: 50 nouns and 50 verbs. One hundred instances of each target word were provided, for a total of 10,000 instances. To construct the gold standard data, two annotators independently chose the correct word sense for each of the 10,000 instances. The inter-tagger agreement between the two annotators was 0.863, and Cohen's κ was 0.657. When the senses selected by the two annotators did not agree, a third annotator chose the correct sense from the two. Seven systems from three organizations participated in the Japanese dictionary task. The best score achieved by a participating system was 0.786, while the baseline system scored 0.726.
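The two agreement figures above measure different things. Raw inter-tagger agreement p_o is the fraction of instances on which both annotators chose the same sense. Cohen's κ discounts the agreement expected by chance, κ = (p_o - p_e) / (1 - p_e), where p_e is computed from each annotator's distribution over senses. The following is a minimal sketch of the standard two-rater computation. The abstract does not say whether κ was pooled over all 10,000 instances or averaged per target word, so this sketch assumes a single pooled pair of label lists. The sense labels ("s1", "s2", ...) and the function name `agreement_and_kappa` are hypothetical and do not correspond to actual Iwanami Kokugo Jiten sense IDs.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Return (observed agreement p_o, Cohen's kappa) for two annotators.

    Assumption: both lists are pooled sense labels for the same instances,
    in the same order. Labels are hypothetical sense identifiers.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of instances where both chose the same sense.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick sense s independently,
    # estimated from each annotator's own marginal distribution over senses.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)

    if p_e == 1.0:
        # Degenerate case (both annotators used one identical sense throughout):
        # kappa is undefined; returning 1.0 here is only a convention.
        return p_o, 1.0
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa


if __name__ == "__main__":
    # Toy data for six instances of one word (hypothetical sense labels).
    annotator_1 = ["s1", "s1", "s2", "s2", "s1", "s3"]
    annotator_2 = ["s1", "s2", "s2", "s2", "s1", "s3"]
    p_o, kappa = agreement_and_kappa(annotator_1, annotator_2)
    print(f"agreement={p_o:.3f} kappa={kappa:.3f}")
```

In this toy case the annotators agree on 5 of 6 instances (p_o ≈ 0.833). Because both favour a few senses, chance agreement is p_e = 13/36, so κ = 17/23 ≈ 0.739. This illustrates why κ in the abstract (0.657) is noticeably lower than raw agreement (0.863). When sense distributions are skewed, much of the raw agreement is attributable to chance.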