Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 13, Issue 1
Displaying 1-8 of 8 articles from this issue
  • [in Japanese]
    2006 Volume 13 Issue 1 Pages 1-2
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (323K)
  • MIHOKO KITAMURA, YUJI MATSUMOTO
    2006 Volume 13 Issue 1 Pages 3-25
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    High-quality MT systems and cross-lingual information retrieval systems need large-sized translation dictionaries. Automatic extraction of translation patterns from parallel corpora is an efficient and accurate way to develop translation dictionaries, and various approaches have been proposed to achieve this. This paper presents a practical translation pattern extraction method in which translation patterns based on the co-occurrence frequency of word sequences between English and Japanese are greedily extracted, and with which manual confirmation and extra linguistic resources, such as chunking information and translation dictionaries, can also be effectively combined. To reduce the time spent on pattern extraction, the paper examines a method of extracting probable translation patterns in incremental steps by gradually enlarging the unit of the segmented corpus. Our experiments using 8,000 sentences showed that the proposed method achieved an accuracy of 89% at a coverage of 85%, while the existing method achieved only an accuracy of 40% at a coverage of 79%; this was further improved to an accuracy of 96% at a coverage of 85% when combined with manual confirmation. Our experiments using 16,000 sentences showed that dividing the corpus into quarters reduced the extraction time to 9 hours, while the non-dividing method required 16 hours. (A toy sketch of co-occurrence-based pair extraction follows this entry.)
    Download PDF (2613K)
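A minimal sketch of the general idea behind co-occurrence-based greedy extraction, not the authors' exact algorithm: candidate English-Japanese pairs are scored by the Dice coefficient of their co-occurrence counts and accepted greedily. The toy tokenization and the threshold value are assumptions.

```python
from collections import Counter
from itertools import product

def extract_pairs(parallel_corpus, threshold=0.3):
    """parallel_corpus: iterable of (english_sentence, japanese_sentence)."""
    en_freq, ja_freq, co_freq = Counter(), Counter(), Counter()
    for en_sent, ja_sent in parallel_corpus:
        en_words, ja_words = set(en_sent.split()), set(ja_sent.split())
        en_freq.update(en_words)
        ja_freq.update(ja_words)
        co_freq.update(product(en_words, ja_words))

    # Score every candidate pair by the Dice coefficient of its counts.
    scored = sorted(
        ((2 * c / (en_freq[e] + ja_freq[j]), e, j)
         for (e, j), c in co_freq.items()),
        reverse=True)

    # Greedily accept the best-scoring pair whose words are still unused.
    pairs, used_en, used_ja = [], set(), set()
    for score, e, j in scored:
        if score < threshold:
            break
        if e not in used_en and j not in used_ja:
            pairs.append((e, j, score))
            used_en.add(e)
            used_ja.add(j)
    return pairs
```

The paper's method additionally works on word sequences rather than single words and enlarges the corpus unit incrementally; this sketch shows only the scoring-and-greedy-selection core.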
  • TOMOKO OHKUMA, HIROSHI MASUICHI, TAKESHI YOSHIOKA
    2006 Volume 13 Issue 1 Pages 27-52
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    The purpose of this paper is to disambiguate the meanings of the Japanese focus particles ‘made’, ‘nado’, and ‘dake’. We propose two-level rules over morphemes and grammatical functions. At the first level, morpheme rules are applied to the morphemes output by the ChaSen morphological analyzer. At the second level, grammatical function rules are applied to the f-structures output by the Xerox Linguistic Environment (XLE) parser, which is based on the Lexical Functional Grammar (LFG) formalism. The first experiment, using the EDR corpus, shows that the morpheme rules disambiguate the particles and that most of the rules are close to 100% accurate; however, two rules for ‘made’, two rules for ‘nado’, and one rule for ‘dake’ could not disambiguate their meanings correctly. The second experiment, using the Mainichi newspaper corpus, shows that the grammatical function rules successfully disambiguate the particles that the morpheme rules did not cover. The accuracies of ‘made’ disambiguation using morpheme rules are 69.6% and 58.4%, those of ‘nado’ disambiguation are 29.6% and 47.2%, and that of ‘dake’ disambiguation is 55.8%. The accuracies of ‘made’ disambiguation by grammatical function rules are 73.2% and 61.8%, those of ‘nado’ disambiguation are 72.5% and 60.3%, and that of ‘dake’ disambiguation is 76.1%. (A toy sketch of a morpheme-level rule follows this entry.)
    Download PDF (2202K)
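A toy sketch of what a first-level (morpheme) rule might look like for ‘made’. The rule conditions, part-of-speech tags, and sense labels here are invented for illustration; the paper's actual rules operate on ChaSen output and differ in detail.

```python
def disambiguate_made(morphemes, i):
    """morphemes: list of (surface, pos) pairs; i: index of 'made'."""
    prev_pos = morphemes[i - 1][1] if i > 0 else None
    next_surface = morphemes[i + 1][0] if i + 1 < len(morphemes) else None
    if prev_pos == "noun-time":      # hypothetical tag, e.g. "3-ji made"
        return "limit"               # endpoint ("until") reading
    if next_surface == "mo":         # "made mo" pattern
        return "extreme"             # focus ("even") reading
    return "ambiguous"               # defer to grammatical-function rules
```

Cases the morpheme rules leave as "ambiguous" would then be passed to the second-level rules over XLE f-structures, as the abstract describes.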
  • HIROKAZU WATABE, NORIYUKI OKUMURA, TSUKASA KAWAOKA
    2006 Volume 13 Issue 1 Pages 53-74
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Even when we receive a word with ambiguous information, we humans can interpret it properly, and so we can hold a conversation and take appropriate actions. This is possible because we have “common sense” about the word, which comes from knowledge accumulated through long experience. To realize an intelligent computer that can talk with human beings, we believe a system that understands word meaning must be constructed. An association mechanism, which associates one word (concept) with other similar concepts, is indispensable to such a system. This paper describes a method of measuring the degree of association, which evaluates the relevance between concepts based on a Concept-Base that defines the meanings of words. The conventional method, which evaluates the relevance between concepts using the degree of overlap of their attribute sets in the Concept-Base, has a known problem. This paper aims to solve that problem and proposes a method of measuring the degree of association using the coincidence information between concepts. (A toy sketch contrasting the two scoring schemes follows this entry.)
    Download PDF (4078K)
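A toy sketch contrasting simple attribute-set overlap with a matching scheme that pairs attributes between two concepts. The tiny Concept-Base stand-in and the exact-match similarity function are assumptions; this illustrates the contrast, not the paper's actual formula.

```python
def overlap_association(a, b):
    """Conventional method: weighted overlap of identical attributes."""
    shared = set(a) & set(b)
    return sum(min(a[x], b[x]) for x in shared)

def matched_association(a, b, similarity):
    """Pair each attribute of a with its most similar attribute of b."""
    score = 0.0
    for attr_a, w_a in a.items():
        best = max((similarity(attr_a, attr_b) * min(w_a, w_b)
                    for attr_b, w_b in b.items()), default=0.0)
        score += best
    return score

# Concepts represented as {attribute: weight} dictionaries.
cat = {"animal": 0.5, "pet": 0.3, "whiskers": 0.2}
dog = {"animal": 0.5, "pet": 0.4, "bark": 0.1}
print(overlap_association(cat, dog))  # counts only identical attributes
# A graded similarity lets non-identical but related attributes contribute;
# the exact-match lambda below degenerates to the overlap case.
print(matched_association(cat, dog, lambda x, y: float(x == y)))
```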
  • Anh-Cuong Le, Akira Shimazu, Van-Nam Huynh
    2006 Volume 13 Issue 1 Pages 75-95
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Word Sense Disambiguation (WSD) is the task of choosing the right sense of a polysemous word given a context. It is clearly essential for many natural language processing applications, such as human-computer communication, machine translation, and information retrieval. In recent years, much attention has been paid to improving the performance of WSD systems by combining classifiers. In (Kittler, Hatef, Duin, and Matas 1998), six combination rules, namely product, sum, max, min, median, and majority voting, were derived under a number of strong assumptions that are unrealistic in many situations, especially in text-related applications. This paper considers a framework of combination strategies based on different representations of context in WSD that also yields these combination rules, but without the unrealistic assumptions mentioned above. Experiments on the four words interest, line, hard, and serve from the DSO dataset showed high accuracies with the median and min combination rules. (A sketch of the six combination rules follows this entry.)
    Download PDF (2084K)
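A short sketch of the six combination rules of Kittler et al. (1998) applied to per-classifier posterior estimates P(sense | context); each row of `posteriors` is one classifier's distribution over senses. The toy inputs are assumptions.

```python
import statistics
from math import prod

def combine(posteriors, rule):
    n_senses = len(posteriors[0])
    columns = list(zip(*posteriors))  # per-sense scores across classifiers
    if rule == "product":
        scores = [prod(col) for col in columns]
    elif rule == "sum":
        scores = [sum(col) for col in columns]
    elif rule == "max":
        scores = [max(col) for col in columns]
    elif rule == "min":
        scores = [min(col) for col in columns]
    elif rule == "median":
        scores = [statistics.median(col) for col in columns]
    elif rule == "vote":
        winners = [max(range(n_senses), key=p.__getitem__) for p in posteriors]
        scores = [winners.count(s) for s in range(n_senses)]
    return max(range(n_senses), key=scores.__getitem__)

# Three classifiers voting over two senses of one ambiguous word:
print(combine([[0.6, 0.4], [0.2, 0.8], [0.7, 0.3]], "median"))  # -> 0
```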
  • TAKEHIKO YOSHIMI, TAKESHI KUTSUMI, KATSUNORI KOTANI, ICHIKO SATA, HITO ...
    2006 Volume 13 Issue 1 Pages 97-115
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper points out that constructing a bilingual dictionary from translation equivalents obtained from bilingual corpora requires not only correctly aligning two expressions but also judging the appropriateness of the aligned pair as a dictionary entry, and it addresses the latter task, which has received little attention. We present a method for selecting suitable entries using Support Vector Machines, and propose defining the features from the common parts and the differing parts between a current translation and a new translation. We examined how selection performance is affected by four ways of representing the common and differing parts: characters, morphemes, parts of speech, and semantic markers. Our experimental results show that representation by characters achieved the best performance, an F-measure of 0.837. (A sketch of such feature construction follows this entry.)
    Download PDF (2004K)
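A sketch of building character-level features from the common and differing parts of a current and a new translation, then training an SVM on them. The alignment via difflib, the feature encoding, and the scikit-learn setup are assumptions standing in for the paper's actual pipeline.

```python
import difflib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

def common_diff_features(current, new):
    """Encode common (C_) and differing (D_) characters as tokens."""
    matcher = difflib.SequenceMatcher(None, current, new)
    common, diff = [], []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            common.append(current[i1:i2])
        else:
            diff.append(current[i1:i2] + new[j1:j2])
    common_feats = " ".join(
        f"C_{c}" for c in "".join(common) if not c.isspace())
    diff_feats = " ".join(
        f"D_{c}" for c in "".join(diff) if not c.isspace())
    return common_feats + " " + diff_feats

# Hypothetical training data: (current translation, candidate, is_entry).
data = [("computer system", "computing system", 1),
        ("computer system", "machine framework", 0)]
texts = [common_diff_features(cur, new) for cur, new, _ in data]
X = CountVectorizer(token_pattern=r"\S+").fit_transform(texts)
clf = SVC(kernel="linear").fit(X, [label for _, _, label in data])
```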
  • ERIKO YOSHIMURA, SEIJI TSUCHIYA, HIROKAZU WATABE, TSUKASA KAWAOKA
    2006 Volume 13 Issue 1 Pages 117-141
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    When we humans converse with a person, we begin with a greeting. Likewise, if computers or robots greet first and then move on to the next subject, they can communicate with a person smoothly. So far, greeting processing has mostly consisted of applying templates, and little research has been devoted to it. However, using only prepared templates makes the responses uniform: only sentences written by the designer ever appear, and this is especially noticeable in greetings. The greeting processing system described in this paper therefore produces new greeting sentences, sentences that do not exist in the greeting knowledge base prepared by the designer. When talking, we humans draw on general knowledge and common sense about words; in the same way, our greeting process uses an association knowledge mechanism that provides such general knowledge and common sense to the computer. The method proposed in this paper expands and refines sentences by combining the association knowledge mechanism with the greeting knowledge base. (A toy sketch of such expansion follows this entry.)
    Download PDF (6503K)
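A toy sketch of expanding a small greeting knowledge base with an association mechanism: words in a stored greeting are replaced by associated words to produce greetings the designer never wrote. The knowledge base, the association table, and the naive tokenization are all assumptions; the paper's mechanism is far richer.

```python
import random

greeting_kb = ["Good morning, nice weather today."]
associations = {"weather": ["sunshine", "breeze"],
                "morning": ["day"]}

def generate_greeting():
    sentence = random.choice(greeting_kb)
    # Separate punctuation so words match the association table keys.
    words = sentence.replace(",", " ,").replace(".", " .").split()
    expanded = [random.choice(associations.get(w, [w])) for w in words]
    return " ".join(expanded).replace(" ,", ",").replace(" .", ".")

print(generate_greeting())  # e.g. "Good day, nice sunshine today."
```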
  • NOBUAKI TANAKA, MICHIHIKO MENRAI, TAKASHI NOGUCHI, TOMOKAZU YAGO, DONG ...
    2006 Volume 13 Issue 1 Pages 143-164
    Published: January 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We developed a summarizing system, ABISYS, based on the output of the semantic analysis system SAGE. ABISYS extracts important words from an article and generates summary sentences according to the word meanings and the deep cases among the words in the SAGE output. In this paper, we define five scores that evaluate the importance of a word with respect to repetition information, context information, position information, opinion-word information, and topic-focus information. We first calculate these scores for each substantive and map it into a five-dimensional feature space. The probability that each substantive is important is then estimated using an SVM. Finally, we complement the indispensable cases of the verbs and sahen nouns selected as important words, and use them as summary element words to generate legible Japanese sentences. We carried out a subjective evaluation of our system's output against summaries written by humans. Compared with subjective evaluations of other summarizing systems, we found that legibility was on a par with other systems, while content coverage was much better. Moreover, 95% of the summary sentences generated by ABISYS were judged to be correct Japanese sentences. (A sketch of the five-score SVM step follows this entry.)
    Download PDF (7516K)
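A sketch of the five-score SVM step: each candidate word becomes a five-dimensional vector (repetition, context, position, opinion-word, topic-focus) and an SVM scores its importance. The feature values and labels are invented for illustration, and the decision-function score stands in for the probability estimate the paper describes.

```python
from sklearn.svm import SVC

# Each row: [repetition, context, position, opinion_word, topic_focus].
X_train = [[0.9, 0.4, 0.8, 0.1, 0.7],   # labeled important
           [0.8, 0.5, 0.7, 0.2, 0.6],   # labeled important
           [0.1, 0.2, 0.3, 0.0, 0.1],   # labeled unimportant
           [0.2, 0.1, 0.2, 0.1, 0.2]]   # labeled unimportant
y_train = [1, 1, 0, 0]

clf = SVC(kernel="linear").fit(X_train, y_train)
candidate = [[0.7, 0.5, 0.6, 0.2, 0.5]]
score = clf.decision_function(candidate)[0]  # >0 suggests "important"
```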