Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 11, Issue 3
Displaying 1-9 of 9 articles from this issue
  • [in Japanese]
    2004 Volume 11 Issue 3 Pages 1-2
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (260K)
  • DAISUKE KAWAHARA, SADAO KUROHASHI
    2004 Volume 11 Issue 3 Pages 3-19
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper describes a method for detecting and resolving zero pronouns in Japanese text. We detect zero pronouns by case analysis based on automatically constructed case frames, and rank the candidate antecedents of a zero pronoun by their similarity to examples in the case frames. We also introduce an order of antecedent-location preference to precisely capture the tendency of a zero pronoun to have its antecedent nearby. Experimental results on 100 articles indicate a precision of 87.1% and a recall of 74.8% for zero pronoun detection, and an accuracy of 61.8% for antecedent estimation.
    Download PDF (1884K)
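The ranking scheme described in this abstract, scoring antecedent candidates by similarity to case-frame examples and preferring closer locations, could be sketched roughly as follows (the function names, the toy similarity, and the data are illustrative, not the authors' implementation):

```python
# Illustrative sketch: rank antecedent candidates for a zero pronoun by
# their best similarity to example nouns stored in the relevant case-frame
# slot, breaking ties by a location preference (smaller = closer).

def rank_antecedents(candidates, case_frame_examples, similarity):
    """candidates: list of (noun, location_rank) pairs;
    case_frame_examples: nouns observed in the case slot;
    similarity: function(noun_a, noun_b) -> float in [0, 1]."""
    def score(cand):
        noun, location_rank = cand
        best = max(similarity(noun, ex) for ex in case_frame_examples)
        # Sort by high similarity first, then by close location.
        return (-best, location_rank)
    return sorted(candidates, key=score)

# Toy similarity: 1.0 for identical nouns, 0.5 for a shared first
# character, 0 otherwise (a real system would use a thesaurus).
def toy_sim(a, b):
    return 1.0 if a == b else (0.5 if a[0] == b[0] else 0.0)

ranked = rank_antecedents(
    [("student", 2), ("school", 1), ("teacher", 3)],
    ["teacher", "parent"],
    toy_sim,
)
```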
  • Attribute Expansion by Logical Relations between Words
    KAZUHIDE KOJIMA, HIROKAZU WATABE, TSUKASA KAWAOKA
    2004 Volume 11 Issue 3 Pages 21-38
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We believe that for computers to understand natural language, an association system is needed that outputs words related to input words. The association system consists of a concept base and an algorithm that measures the degree of association between words. In the concept base, the meaning of a word is defined by a set of words (attributes) expressing its semantic features, with weights indicating the importance of each attribute to the word. This study aims to construct such a concept base. A first concept base is built automatically from several electronic dictionaries. Because of the automatic construction, however, it contains many unsuitable attributes, and the weights are unreliable. A refining method has been proposed that deletes unsuitable attributes and corrects unreliable weights using the degree of attribute reliability, but this refining method cannot increase the number of attributes. This paper proposes a method for expanding the attributes of the refined concept base using logical relations between words, and shows the effects of the proposed method through evaluation with test data and the degree of association.
    Download PDF (1779K)
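The concept-base structure described here, words defined by weighted attribute sets with a degree of association computed between them, could be sketched like this (the entries and the overlap formula are illustrative, not the authors' exact definitions):

```python
# Sketch: a concept base maps each word to weighted attributes; the degree
# of association between two words is approximated here by the weighted
# overlap of their attribute sets. All entries are invented for the example.

CONCEPT_BASE = {
    "piano":  {"music": 0.9, "keyboard": 0.7, "instrument": 0.8},
    "guitar": {"music": 0.9, "string": 0.7, "instrument": 0.8},
    "desk":   {"furniture": 0.9, "wood": 0.5, "office": 0.6},
}

def degree_of_association(word_a, word_b):
    attrs_a = CONCEPT_BASE[word_a]
    attrs_b = CONCEPT_BASE[word_b]
    shared = set(attrs_a) & set(attrs_b)
    total = sum(attrs_a.values()) + sum(attrs_b.values())
    # Weight shared attributes by their importance to both words.
    return sum(attrs_a[x] + attrs_b[x] for x in shared) / total
```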
  • TAKEHIKO MARUYAMA, HIDEKI KASHIOKA, TADASHI KUMANO, HIDEKI TANAKA
    2004 Volume 11 Issue 3 Pages 39-68
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Sentences in monologues generally tend to be long and complicated, which causes problems for parsing and translation; it is therefore desirable to define some shorter unit for processing monologues efficiently. We developed CBAP (Clause Boundaries Annotation Program), which detects and labels every clause boundary in Japanese text. CBAP accepts a sequence of morphemes with part-of-speech information and detects the final boundary of every clause with more than 97% accuracy. It also inserts 147 kinds of labels representing the types of the boundaries. Since clauses are syntactically and semantically sufficient constituents, the annotated labels can be used for effective and flexible sentence segmentation. In this paper, we describe the method for annotating Japanese clause boundaries and present experimental results examining the performance of CBAP.
    Download PDF (3331K)
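The pipeline described here, morphemes with part-of-speech tags in, labeled clause boundaries out, might be sketched as follows (the rules and labels below are invented toy examples, not CBAP's actual 147 label types):

```python
# Sketch: rule-based clause boundary labeling over a morpheme sequence,
# in the spirit of CBAP. Each rule maps a (surface, pos) pair to a
# boundary label inserted after that morpheme. Rules are toy examples.

BOUNDARY_RULES = {
    ("node", "conjunctive-particle"): "<causal-clause>",
    ("ga", "conjunctive-particle"): "<adversative-clause>",
    (".", "punctuation"): "<sentence-end>",
}

def annotate(morphemes):
    """morphemes: list of (surface, pos) pairs; returns the surfaces with
    boundary labels inserted after matching morphemes."""
    out = []
    for surface, pos in morphemes:
        out.append(surface)
        label = BOUNDARY_RULES.get((surface, pos))
        if label:
            out.append(label)
    return out
```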
  • SATORU IKEHARA, SATSUKI ABE, MASATO TOKUHISA, JIN'ICHI MURAKAMI
    2004 Volume 11 Issue 3 Pages 69-95
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
To break through the limitations of conventional methods based on compositional semantics, a new translation method is desired, based on sentence patterns in which the non-linear structures of linguistic expressions are represented as semantic units. This paper proposes a way to judge the linearity or non-linearity of linguistic expressions based on their definitions, and a way to generate sentence patterns from large bilingual corpora. With this method, three kinds of sentence patterns, at the word level, phrase level, and clause level, are generated in that order from a Japanese-to-English corpus. In the experiments, 150,000 sentence pairs of complex and compound sentences were extracted from a one-million-pair corpus, and 128,000, 105,000, and 13,000 patterns were generated for the three levels respectively. Because the decision process was clarified, the generation of the sentence patterns was mostly automated using the results of morphological analysis, and these 246,000 sentence patterns were obtained within a year.
    Download PDF (3150K)
  • ZHAOHUI Bu, TAKASHI IKEDA
    2004 Volume 11 Issue 3 Pages 97-122
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes a method for translating negative sentences in Japanese-Chinese machine translation. Japanese has a basic negative particle, nai, which always agglutinates to the predicate. In Chinese, by contrast, several adverbs such as bu, mei, and bie express negation, and they may appear in various positions in the sentence according to the negative focus and the syntax. The selection of the Chinese negative and its position thus become two problems to be solved in Japanese-Chinese MT. In jaw/Chinese, the Japanese-Chinese MT system we are developing, negative sentences are translated in two processes: the basic propositional content is translated by a pattern-transfer method, and the negative particle is translated using selection and position rules for Chinese negative particles. We construct these two kinds of rules using (1) the syntactic features of the Japanese and Chinese sentences, (2) the attributes of the predicate and its modifiers (adverbial modifiers and complements) in Chinese, and (3) the syntactic features of the negative focus in Japanese. The analysis of the relation between the negative focus and the position of Chinese negative particles is an important point in constructing the position rule. We evaluated our translation algorithm manually on 113 negative sentences extracted from about 1,000 sentences, and obtained a translation accuracy of about 94%.
    Download PDF (2649K)
  • KUMIKO OHMORI, HIROAKI SAITO
    2004 Volume 11 Issue 3 Pages 123-147
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes a new dialogue control method that can handle a huge number of words. When a conventional interface whose target words are not hierarchically classified fails to recognize a spoken word, the system asks the user to say it again, and this re-utterance cycle repeats until the correct word is recognized. If the cycle continues, the user becomes irritated and finally abandons the dialogue. Our proposed dialogue control method instead asks about attributes of the word to narrow down the candidates when recognition fails. We define the effectiveness of an attribute based on the difficulty of recognition and the rate of decrease in lexical entropy. For the domain of surnames, we adopt three attributes: the number of characters, the initial, and the phonemes of the initial kanji character. We implemented an interface system for recognizing 87,944 surnames and confirmed that our method avoids the irritating re-utterance cycles and provides dialogues nearly as stress-free as those with human operators.
    Download PDF (2855K)
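The entropy-based part of the attribute-effectiveness criterion in this abstract could be sketched as follows: the best attribute question is the one whose answer leaves the least expected lexical entropy. This sketch is illustrative only (the paper also weighs recognition difficulty, which is omitted here, and the names are invented):

```python
import math
from collections import Counter

def expected_remaining_entropy(lexicon, attribute):
    """Expected entropy (bits) remaining over the lexicon after learning
    the value of one attribute. lexicon: list of words;
    attribute: function(word) -> answer value."""
    groups = Counter(attribute(w) for w in lexicon)
    n = len(lexicon)
    # Each answer value occurs with probability k/n and leaves a uniform
    # set of k candidates, whose entropy is log2(k).
    return sum(k / n * math.log2(k) for k in groups.values())

# Toy surname lexicon (romanized): which question narrows it down more,
# "how many characters?" or "what is the initial?"
names = ["sato", "suzuki", "sasaki", "tanaka", "ito", "abe"]
by_length = expected_remaining_entropy(names, len)
by_initial = expected_remaining_entropy(names, lambda w: w[0])
```

Here asking for the initial leaves less residual entropy than asking for the length, so it would be the more effective question for this toy lexicon.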
  • MASATO KANADECHI, MASATO TOKUHISA, JIN'ICHI MURAKAMI, SATORU IKEHARA
    2004 Volume 11 Issue 3 Pages 149-164
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper experimentally clarifies the ability of valency grammar for the word selection of verbs and nouns in machine translation. Valency-grammar-based translation has long been expected to yield accurate results, because the valency patterns in the grammar restrict the relationships among the meanings of a verb and its nouns. However, this ability has not yet been clarified, because of the difficulty of developing this kind of knowledge base. Since the large-scale dictionary Nihongo Goi Taikei (14,800 valency patterns) has recently become available, in this paper Japanese-to-English translation experiments are conducted on several thousand example sentences related to the IPAL dictionaries, and the translations of fundamental nouns and verbs are compared with a human translator's to obtain the translation accuracy. Next, the translation processes behind the wrong translations are carefully analyzed to identify the reasons for the failures and the possibility of improving them. The results show that the word-selection accuracy is 89% for verbs and 91% for nouns. It is clarified that valency grammar is effective in the selection of verbs but, compared with a baseline chosen from a Japanese/English dictionary, not so effective in the selection of nouns. Further, the upper bounds on the accuracy for verbs and nouns under valency grammar, which depend on the assumptions of complete valency patterns, morphological analysis, and pattern matching, are estimated at 99% and 97% in the domain of sentences related to the IPAL dictionaries.
    Download PDF (1636K)
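The core mechanism here, valency patterns restricting which target-language verb is selected by the semantic classes of the case elements, could be sketched as follows (the patterns, semantic classes, and vocabulary are invented for the example, not taken from Nihongo Goi Taikei):

```python
# Illustrative valency-pattern lookup for verb selection. Each pattern
# pairs a Japanese verb with semantic-class constraints on its case slots
# and the English verb to select when the constraints match.

PATTERNS = [
    {"verb": "hiku", "slots": {"wo": "instrument"},     "en": "play"},
    {"verb": "hiku", "slots": {"wo": "reference_book"}, "en": "consult"},
    {"verb": "hiku", "slots": {"wo": "disease"},        "en": "catch"},
]

# Toy semantic lexicon: noun -> semantic class.
SEMANTIC_CLASS = {"piano": "instrument", "jisho": "reference_book", "kaze": "disease"}

def select_translation(verb, case_elements):
    """case_elements: dict of case marker -> noun, e.g. {"wo": "piano"}."""
    for p in PATTERNS:
        if p["verb"] != verb:
            continue
        if all(SEMANTIC_CLASS.get(case_elements.get(m)) == cls
               for m, cls in p["slots"].items()):
            return p["en"]
    return None  # fall back to a default dictionary translation

en = select_translation("hiku", {"wo": "piano"})
```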
  • MASAO UTIYAMA, KIYOMI CHUJO, EIKO YAMAMOTO, HITOSHI ISAHARA
    2004 Volume 11 Issue 3 Pages 165-197
    Published: July 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Mastery of domain-specific vocabulary in specialized English texts is essential. In order to identify a cost-effective and efficient means to extract domain-specific vocabulary, eight individual statistical measures, and combinations of those measures, were applied to corpora and the resulting lists were then compared to an existing specialized vocabulary control list. It was found that not only was it possible to efficiently produce a list of specialized vocabulary, but a combination of measures created the most comparable data. Due to the complexity of applying combinations of measures, individual measures were also found to be effective and useful for both English teachers and researchers. The complementary similarity measure was ranked as the most effective individual measure. Moreover, each measure created a unique type of word list which has specific pedagogical applications to student proficiency levels and lexicons.
    Download PDF (3423K)
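One of the standard statistical measures of the kind this study compares is a log-likelihood keyness score, which ranks a word as domain-specific when its frequency in a specialized corpus is unexpectedly high relative to a general corpus. The sketch below shows that measure, not the complementary similarity measure itself, and the corpora and counts are invented:

```python
import math

def log_likelihood(freq_spec, freq_gen, n_spec, n_gen):
    """Dunning-style log-likelihood (G2) keyness of a word, given its
    frequency in a specialized corpus of n_spec tokens and a general
    corpus of n_gen tokens."""
    total = freq_spec + freq_gen
    expected_spec = n_spec * total / (n_spec + n_gen)
    expected_gen = n_gen * total / (n_spec + n_gen)
    g2 = 0.0
    if freq_spec:
        g2 += freq_spec * math.log(freq_spec / expected_spec)
    if freq_gen:
        g2 += freq_gen * math.log(freq_gen / expected_gen)
    return 2 * g2

# "circuit" is frequent in a hypothetical engineering corpus but rare in
# the general corpus; "today" is proportionally common in both.
score_circuit = log_likelihood(120, 5, 10_000, 100_000)
score_today = log_likelihood(30, 310, 10_000, 100_000)
```

A word like "circuit" gets a large score and would surface near the top of the extracted specialized-vocabulary list, while "today" scores near zero.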