Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 4, Issue 1
Displaying 1-9 of 9 articles from this issue
  • [in Japanese]
    1997 Volume 4 Issue 1 Pages 1-2
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (202K)
  • TAKEHIKO YOSHIMI, JIRI JELINEK, OSAMU NISHIDA, NAOYUKI TAMURA, HARUO M ...
    1997 Volume 4 Issue 1 Pages 3-21
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    To create a machine translation system capable of efficiently selecting the best interpretation of a text from all possible interpretations, we have formulated a Text-Wide Grammar (TWG) as a set of constraints and preferences defining the optimum interpretation. The TWG is interpreted and executed by means of lazy evaluation of preferences on a packed shared forest. TWG is a grammar of text, defining constraints and priorities on morphological and syntactic structures, semantics and collocations, as well as coreferentiality. Constraints on coreferentiality are formulated on the basis of the paradigm of depredication (otherwise known as theme-packing). From all the interpretations of a text that share the optimum score for morphological well-formedness, the optimum interpretation of the text is selected as the one with the highest weighted aggregate of the separately obtained scores for syntactic goodness, semantic-collocational goodness, and coreferential density. The processing system carries out semantic-collocational and coreferential analysis on the packed shared forest received from syntactic analysis. At that stage, only the processing immediately required for identifying the optimum interpretation is actually carried out; all other processing is suspended until it is required. Other interpretations are obtainable by restarting the suspended processing.
    Download PDF (2056K)
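The selection criterion in the abstract above (filter to interpretations sharing the optimum morphological score, then rank by a weighted aggregate of the remaining scores) might be sketched as follows. The score fields and weights are illustrative assumptions, not values from the paper.

```python
# Illustrative weights for the aggregate; the paper does not publish these.
WEIGHTS = {"syntax": 1.0, "semantics": 2.0, "coreference": 0.5}

def select_optimum(interpretations):
    """Each interpretation is a dict with 'morph', 'syntax', 'semantics',
    and 'coreference' scores. Keep only those with the best morphological
    score, then return the one with the highest weighted aggregate."""
    best_morph = max(i["morph"] for i in interpretations)
    finalists = [i for i in interpretations if i["morph"] == best_morph]
    return max(finalists,
               key=lambda i: sum(WEIGHTS[k] * i[k] for k in WEIGHTS))

candidates = [
    {"id": "A", "morph": 10, "syntax": 3, "semantics": 2, "coreference": 1},
    {"id": "B", "morph": 10, "syntax": 2, "semantics": 4, "coreference": 2},
    {"id": "C", "morph": 9,  "syntax": 9, "semantics": 9, "coreference": 9},
]
print(select_optimum(candidates)["id"])  # prints "B" (C is filtered out)
```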
  • YASUHARU DEN
    1997 Volume 4 Issue 1 Pages 23-40
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Recent advances in speech processing technologies have made the analysis of spoken language one of the central issues in natural language processing. One major feature of spoken language that distinguishes it from written language is that it is ill-formed in various ways, containing hesitations, repairs, ellipses, and so on. This makes it difficult to apply traditional linguistics-based methods to spoken language analysis. This paper proposes a method for properly dealing with ill-formedness, such as hesitations and repairs, in the course of syntactic/semantic analysis of spoken Japanese, where sentences transcribed in Kanji-Kana characters are parsed and interpreted to obtain their frame representations. The method is based on a uniform model, which handles well- and ill-formed sentences in a uniform way, and is realized by extending traditional dependency analysis. We first show the necessity of a uniform model, illustrating it with motivating examples from our spoken dialogue corpus. Then, after providing the details of the method, we show its effectiveness both by analyzing real sentences from the corpus and by evaluating the performance of an experimental system.
    Download PDF (1822K)
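A toy illustration of the "uniform model" idea described above (not the paper's algorithm): well-formed and ill-formed inputs go through the same search, with repaired material handled by allowing words to be skipped at a cost. The well-formedness test used here is an invented placeholder.

```python
from itertools import combinations

def analyses(words, is_well_formed):
    """Yield (kept_words, cost) for every way of skipping words,
    largest kept subsequence first; cost = number of skipped words."""
    n = len(words)
    for r in range(n, 0, -1):
        for keep in combinations(range(n), r):
            kept = [words[i] for i in keep]
            if is_well_formed(kept):   # same test for well/ill-formed input
                yield kept, n - r

def best(words, is_well_formed):
    """Most preferred analysis = the one with the fewest skipped words."""
    return min(analyses(words, is_well_formed), key=lambda a: a[1])

# Placeholder constraint: no word may occur twice (self-repairs repeat material).
ok = lambda kept: len(set(kept)) == len(kept)
print(best(["kyoto", "e", "e", "iku"], ok))  # the repeated "e" is skipped
```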
  • YASUHARU DEN
    1997 Volume 4 Issue 1 Pages 41-56
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Recent advances in speech processing technologies have made the analysis of spoken language one of the central issues in natural language processing. However, it is difficult to apply traditional linguistics-based methods to spoken language analysis due to the ill-formedness of spoken language. We have proposed a spoken language analysis method based on a uniform model, which handles well- and ill-formed sentences in a uniform way. In this method, both the problem of finding the best interpretation of a sentence and that of detecting and recovering from ill-formedness are resolved by finding the most preferred dependency analysis of the sentence. This paper presents a preference decision method for our spoken language analysis method. The method is corpus-based: the preference of an analysis candidate is determined according to how frequently such an analysis is observed in the training data. To overcome the data-sparseness problem, not only instances exactly matching the candidate but also instances similar to it are taken into account. We first give an overview of our spoken language analysis method. Then, after providing the details of the preference decision method, we show its effectiveness by evaluating the performance of an experimental system.
    Download PDF (1694K)
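The smoothing idea in this abstract (exact matches in the training data count fully, while merely similar instances contribute partially) might be sketched as below. The similarity function and semantic classes are invented for illustration.

```python
def preference(candidate, training, similarity):
    """Preference score: 1 point per exact match in the training data,
    plus a fractional credit for each merely similar instance."""
    return sum(1.0 if inst == candidate else similarity(inst, candidate)
               for inst in training)

# Toy similarity over (dependent, head) pairs: heads sharing a (made-up)
# semantic class count half.
CLASS = {"eat": "ingest", "drink": "ingest", "read": "perceive"}
def sim(a, b):
    return 0.5 if CLASS.get(a[1]) == CLASS.get(b[1]) else 0.0

training = [("apple", "eat"), ("apple", "eat"), ("juice", "drink"), ("book", "read")]
print(preference(("apple", "eat"), training, sim))  # prints "2.5"
```

Two exact matches contribute 2.0, and ("juice", "drink") contributes 0.5 because "drink" shares the toy class of "eat".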
  • HAODONG WU, TEIJI FURUGORI
    1997 Volume 4 Issue 1 Pages 57-70
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Prepositional phrase (PP) attachment is a major cause of structural ambiguity in English. This paper proposes a method to resolve PP attachment ambiguities based on local and global preference rules. We first explain how conceptual information is used in the PP attachment process. We then describe how this information, drawn from a conceptual dictionary, is incorporated into the preference rules. When the attachment cannot be decided by the preference rules, we use probabilistic estimation to predict the right attachment. After putting the disambiguation process into an algorithm and tracing it with a few examples, we present a disambiguation experiment and compare its results with those of existing work: the success rate we attained is better than those of other methods by 2 to 5 percent.
    Download PDF (1282K)
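The two-stage decision described above (try the preference rules first, fall back to probabilistic estimation) could look roughly like this. The rule and the counts are made-up placeholders, and the fallback here is a simple smoothed association comparison rather than the paper's estimator.

```python
def attach(verb, noun1, prep, noun2, rules, counts):
    """Decide whether the PP attaches to the verb or to noun1."""
    for rule in rules:                       # local/global preference rules
        decision = rule(verb, noun1, prep, noun2)
        if decision:
            return decision
    # Fallback: compare the preposition's association with verb vs. noun,
    # with additive smoothing so unseen pairs still get a score.
    pv = counts.get((verb, prep), 0) + 0.5
    pn = counts.get((noun1, prep), 0) + 0.5
    return "verb" if pv >= pn else "noun"

# One illustrative rule: "of" overwhelmingly attaches to the noun.
rules = [lambda v, n1, p, n2: "noun" if p == "of" else None]
counts = {("ate", "with"): 8, ("pizza", "with"): 2}
print(attach("ate", "pizza", "with", "a fork", rules, counts))  # prints "verb"
print(attach("ate", "slice", "of", "pizza", rules, counts))     # prints "noun"
```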
  • CHUL-JAE PARK, KATSUHIKO KAKEHI
    1997 Volume 4 Issue 1 Pages 71-86
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes an inference method for acquiring morpheme information for unknown words from a large corpus. The method comprises three functions: inferring a morpheme's part-of-speech, conjugation type, and conjugation (we call these morpheme attributes in this paper); updating the inferred morpheme attributes with probability factors derived from a large corpus; and inferring Japanese morphemes. The conjunctive relationships between words in a sentence are used to infer the morpheme attributes of unknown words. Since a Japanese sentence is a sequence of characters without blank spaces to mark word boundaries, our system had to be able to identify word boundaries. To do this, it first follows character-type sequence rules to search for the cardinal points of a partition. It then infers morphemes from the partition using the morphemes in its dictionary. In the initial stage, the system has a complete dictionary of a few special parts of speech (particles and auxiliary verbs). Morphemes are then selected as the result of this attribute-inference process. Based on these concepts, we developed a Japanese morpheme information acquisition system. Our experiments were conducted on a large corpus of 240,000 morphemes, composed of ASAHI newspaper editorials over a six-month period. We obtained a morpheme inference accuracy of 90.5% for inflected words and 95.2% for other parts of speech; the overall average was 94.6%. A total of 15,523 unique headwords were automatically obtained from 228,450 inferred morphemes.
    Download PDF (1590K)
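A rough sketch of the character-type heuristic mentioned above: a change of script (kanji, hiragana, katakana, other) is taken as a candidate boundary (a "cardinal point of a partition") for the subsequent morpheme search. The Unicode ranges are standard block boundaries; everything else is an illustrative simplification of the paper's rules.

```python
def char_type(ch):
    """Classify a character by script using Unicode block ranges."""
    o = ord(ch)
    if 0x4E00 <= o <= 0x9FFF:
        return "kanji"      # CJK Unified Ideographs
    if 0x3040 <= o <= 0x309F:
        return "hiragana"
    if 0x30A0 <= o <= 0x30FF:
        return "katakana"
    return "other"

def candidate_partition(text):
    """Split the text wherever the character type changes."""
    chunks, cur = [], text[:1]
    for ch in text[1:]:
        if char_type(ch) == char_type(cur[-1]):
            cur += ch
        else:
            chunks.append(cur)
            cur = ch
    if cur:
        chunks.append(cur)
    return chunks

print(candidate_partition("東京へバスで行く"))
# prints "['東京', 'へ', 'バス', 'で', '行', 'く']"
```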
  • MASAKI MURATA, MAKOTO NAGAO
    1997 Volume 4 Issue 1 Pages 87-109
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    It is necessary to clarify the referents of pronouns in machine translation and conversational processing. We present a method of estimating the referents of demonstrative pronouns, personal pronouns, and zero pronouns in Japanese sentences using examples, surface expressions, topics, and focuses. In conventional work, semantic markers have been used for semantic constraints. We instead use examples for semantic constraints and show, through control experiments, that examples are as useful as semantic markers. We also propose many new methods for estimating the referents of pronouns; for example, we use examples of the form “A of B” for estimating the referents of demonstrative adjectives. The framework for estimating referents is as follows. We write rules encoding the information necessary for estimating the referents of pronouns. Using these rules, we list the possible referents of a pronoun and give each of them points; the possible referent with the highest score is estimated to be the referent. This framework has the advantage that rules can be written flexibly. In experiments with this framework, we obtained a precision of 87% in estimating the referents of demonstrative pronouns, personal pronouns, and zero pronouns on training sentences, and a precision of 78% on held-out test sentences.
    Download PDF (2092K)
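The scoring framework described in this abstract (rules propose candidate referents with points, and the top-scoring candidate wins) might be sketched as follows. The two rules and the context fields are invented placeholders, not the paper's rules.

```python
from collections import defaultdict

def estimate_referent(candidates, rules, pronoun, context):
    """Accumulate points from every rule; the highest-scoring candidate
    is estimated to be the referent."""
    scores = defaultdict(int)
    for rule in rules:
        for cand, pts in rule(pronoun, context, candidates):
            scores[cand] += pts
    return max(candidates, key=lambda c: scores[c])

# Placeholder rules: topics get a large bonus; recently mentioned
# candidates get a smaller one.
rules = [
    lambda p, ctx, cs: [(c, 20) for c in cs if c in ctx["topics"]],
    lambda p, ctx, cs: [(c, 10) for c in cs if c == ctx["last_mentioned"]],
]
context = {"topics": {"Taro"}, "last_mentioned": "Hanako"}
print(estimate_referent(["Taro", "Hanako"], rules, "he", context))  # prints "Taro"
```

Because each knowledge source is just another rule contributing points, new heuristics (surface expressions, “A of B” examples, and so on) can be added without restructuring the framework, which is the flexibility the abstract claims.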
  • TAKEHIKO YOSHIMI, JIRI JELINEK
    1997 Volume 4 Issue 1 Pages 111-123
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose a simple method of analysing coreferentiality between sentences and later-occurring noun phrases. Our method uses surface information and requires no complex data or processing mechanism. We represent a sentence and a later-occurring noun phrase as dependency structures and examine whether the two structures match. Where a match can be established, we assume that the two are coreferential. The rules for establishing structural matching are part of the paradigm of theme-packing, namely the predictable changes of adverbal particles into adnominal particles and the disappearance of some non-essential information. To ascertain to what degree anaphora can be correctly traced by such simple processing, we carried out an experiment centred on sentences governed by a verb of the SAHEN category and later-occurring noun phrases in which the head noun is formally identical with the invariable part of the SAHEN verb. Of the 178 pairs of such sentences and noun phrases selected from newspaper articles, 133 pairs (74.7%) were correctly identified as coreferential or otherwise, in accordance with human judgement. Furthermore, as a side effect, the number of dependency structures to be considered can be reduced by selecting only the pairs of dependency structures with the best affinity through structural matching; by this method, the average structural ambiguity was reduced from 3.4-fold to 1.8-fold.
    Download PDF (1368K)
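A hypothetical sketch of the surface matching described above: the sentence's SAHEN verb stem must equal the noun phrase's head noun, and each dependent of the noun phrase must correspond to a sentence dependent whose adverbal particle has changed into the adnominal particle "no" (dependents may also simply disappear). The romanized examples and the flat dependency representation are illustrative simplifications.

```python
def matches(sentence, noun_phrase):
    """Each structure is (head, [(word, particle), ...])."""
    verb, s_deps = sentence
    noun, n_deps = noun_phrase
    # The NP head must be the invariable part of the SAHEN verb,
    # e.g. "happyou" within "happyou-suru".
    if not verb.startswith(noun):
        return False
    # Every NP dependent must appear in the sentence, with its adverbal
    # particle changed to the adnominal "no"; sentence dependents may vanish.
    s_words = {w for w, _ in s_deps}
    return all(p == "no" and w in s_words for w, p in n_deps)

sentence = ("happyou-suru", [("kekka", "wo"), ("kaigi", "de")])
np_good = ("happyou", [("kekka", "no")])   # the "de" phrase disappeared
np_bad = ("happyou", [("ronbun", "no")])   # no matching sentence dependent
print(matches(sentence, np_good), matches(sentence, np_bad))  # prints "True False"
```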
  • KIYOAKI SHIRAI, TAKENOBU TOKUNAGA, HOZUMI TANAKA
    1997 Volume 4 Issue 1 Pages 125-146
    Published: January 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we describe a method to extract a probabilistic context-free grammar of Japanese from a bracketed corpus. To extract grammar rules, we assign appropriate non-terminal symbols to the intermediate nodes of the bracketed trees, taking account of the heads of phrases. We estimate the probabilities of the rules based on their frequency of occurrence. We also propose several improvements to the extracted grammar. The size of the grammar is reduced by removing redundant rules. The number of parse trees is reduced (1) by allowing only a right-linear binary branching tree for a constituent that consists of items of the same POS, (2) by subcategorizing the POSs “symbol” (“KIGOU”) and “postposition” (“JOSI”), and (3) by assigning a consistent structure to constructs representing clausal modality. Finally, we conducted an experiment to evaluate the proposed methods. 2,219 grammar rules were extracted from about 180,000 sentences. When we analyzed 20,000 test sentences with the extracted grammar, we obtained a 92% acceptance rate, showing that the grammar has broad coverage. For the 30 most probable parse trees, we obtained 62% bracket recall, 74% bracket precision, and 29% sentence accuracy.
    Download PDF (2234K)
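The core estimation step in this abstract is the standard maximum-likelihood recipe for a PCFG: count each rule in the labelled trees and divide by the frequency of its left-hand side. A minimal sketch (the symbol-assignment scheme for intermediate nodes, which is the paper's actual contribution, is assumed to have already produced labelled trees):

```python
from collections import Counter

def extract_rules(tree, counter):
    """tree is (label, [children]); a terminal is a plain string."""
    if isinstance(tree, str):
        return
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counter[(label, rhs)] += 1
    for c in children:
        extract_rules(c, counter)

def estimate_pcfg(trees):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    counts = Counter()
    for t in trees:
        extract_rules(t, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

trees = [
    ("S", [("NP", ["taro-ga"]), ("VP", ["hashiru"])]),
    ("S", [("VP", ["hashiru"])]),
]
pcfg = estimate_pcfg(trees)
print(pcfg[("S", ("NP", "VP"))])  # prints "0.5": one of the two S rules
```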