Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 3, Issue 4
Displaying 1-9 of 9 articles from this issue
  • [in Japanese]
    1996 Volume 3 Issue 4 Pages 1-2
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (157K)
  • HISASHI YASUNAGA
    1996 Volume 3 Issue 4 Pages 3-29
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes a study of text-data description rules for Japanese classical literature. We investigated the functions required for text-data description by analyzing research on Japanese literature, and found that three characteristics must be considered: the recognition and definition of the data structure of literary works or books, of the text structure, and of the various features of the Japanese writing style. We defined and developed a rule with these three functions, called the KOKIN-rule: a markup rule for encoding and composing electronic texts of Japanese literature. Many electronic texts have been defined based on the rule, including a series of short stories and 21 Tanka anthologies, and the rule has been widely evaluated for consistency and availability of data description. In particular, we confirmed that it is applicable to other research tasks, such as the organization of text databases, registration to CD-ROMs, and conversion to the SGML standard.
    Download PDF (3279K)
  • MASAKI MURATA, SADAO KUROHASHI, MAKOTO NAGAO
    1996 Volume 3 Issue 4 Pages 31-48
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    When translating Japanese nouns into English, we face the problem of articles and numbers, which the Japanese language lacks but which are necessary for English composition. To solve this difficult problem we classified the referential property and the number of noun phrases into three types each: generic, definite, and indefinite for the referential property, and singular, plural, and uncountable for the number. This paper shows that the referential property and the number of a noun phrase can be estimated fairly reliably from the words in the sentence. Many of the estimation rules were written in a form similar to the rewriting rules of expert systems, with scores attached. Since the method uses scores, it is well suited to vague problems such as referential properties and numbers. We obtained correct recognition rates of 85.5% for referential property and 89.0% for number on the sentences used to construct the rules; on other, held-out texts the rates were 68.9% and 85.6%, respectively. These referential properties and numbers of noun phrases can be used not only for determining articles and numbers but also for anaphora resolution and discourse analysis.
    Download PDF (1670K)
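The scored rewriting-rule approach in the abstract above can be sketched roughly as follows. This is a minimal illustration only: the cue words, rule conditions, and scores are invented assumptions, not the paper's actual rule set.

```python
# Hypothetical sketch of score-based estimation of a noun phrase's
# referential property (generic / definite / indefinite), in the spirit
# of rewriting rules with scores. All cues and weights are invented.

def estimate_referential_property(np_head, context_words):
    # Each rule that fires adds its score to one candidate property.
    scores = {"generic": 0, "definite": 0, "indefinite": 0}
    if "sono" in context_words:        # demonstrative "that" -> definite
        scores["definite"] += 3
    if "aru" in context_words:         # "a certain" -> indefinite
        scores["indefinite"] += 2
    if np_head.endswith("-toiumono"):  # "the thing called X" -> generic
        scores["generic"] += 2
    scores["indefinite"] += 1          # weak default toward indefinite
    # The property with the highest accumulated score wins.
    return max(scores, key=scores.get)

print(estimate_referential_property("inu", ["sono"]))  # -> definite
```

Because rules only add scores rather than decide outright, conflicting cues can coexist and the strongest evidence wins, which is what makes the approach suitable for vague phenomena.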
  • HIROMI NAKAIWA, SATORU IKEHARA
    1996 Volume 3 Issue 4 Pages 49-65
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method for resolving intrasentential references of Japanese zero pronouns, suitable for application in widely used, practical machine translation systems. The method uses semantic and pragmatic constraints such as conjunctions, verbal semantic attributes, and modal expressions to determine intrasentential antecedents of Japanese zero pronouns. It is highly effective because the volume of knowledge that must be prepared beforehand is not large and its resolution precision is good. The method was realized in the Japanese-to-English machine translation system ALT-J/E. To evaluate its performance, we conducted a windowed test on 139 zero pronouns with intrasentential antecedents in a sentence set for evaluating the performance of Japanese-to-English machine translation systems (3718 sentences). According to the evaluation, intrasentential antecedents could be resolved correctly for 98% of the zero pronouns examined, using rules consistent with intersentential and extrasentential resolution. This accuracy was higher than that of the centering algorithm, a conventional method for resolving zero pronouns. Further examination of the evaluation showed that the method achieves high accuracy with relatively simple rules.
    Download PDF (1707K)
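The constraint-based antecedent determination described above might look roughly like the following sketch. The specific rules, cue categories, and candidate representation are illustrative assumptions; the paper's actual constraints on conjunctions, verbal semantic attributes, and modal expressions are far richer.

```python
# Hypothetical sketch of antecedent selection for a Japanese zero subject
# using conjunction and modality cues. Rules and categories are invented.

def resolve_zero_subject(conjunction, modal, candidates):
    # candidates: list of (phrase, grammatical_role) pairs from the sentence
    if modal == "volitional":        # e.g. "-tai": subject is likely the speaker
        return "speaker"
    if conjunction == "node":        # causal clause: prefer the matrix subject
        for phrase, role in candidates:
            if role == "subject":
                return phrase
    return candidates[0][0]          # fallback: first (nearest) candidate

print(resolve_zero_subject("node", None,
                           [("John", "subject"), ("hon", "object")]))  # -> John
```

The appeal of this style of method, as the abstract notes, is that each rule consults only shallow, easily prepared knowledge, yet the rules combine to cover most intrasentential cases.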
  • TERUMASA EHARA, YEUN-BAE KIM
    1996 Volume 3 Issue 4 Pages 67-86
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    A probabilistic resolution method for Japanese zero subjects is described. It is designed to serve as the back-end processor of an automatic shortening system for long Japanese sentences in a Japanese-to-English machine translation system. The ordinary probabilistic resolution method uses (1) a normal distribution model in a continuous probability space. In this article, we propose three new models: (2) a quasi-normal distribution model in the continuous space, (3) a first-order log-linear distribution model in the discrete space, and (4) a second-order log-linear distribution model in the discrete space. For these four models, we conducted an experiment to measure resolution accuracy, with test samples drawn from television broadcasting news. The accuracies measured by cross-validation are 73%, 78%, 78%, and 81% for models (1), (2), (3), and (4), respectively. The unresolved examples show that semantic agreement between subject and predicate should be observed more closely.
    Download PDF (1686K)
  • HIROSHI YASUHARA
    1996 Volume 3 Issue 4 Pages 87-101
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Large-scale language resources are key materials for practical natural language processing. The most common language resource is the dictionary, which plays an important role in lexical processing. Many syntactic processing systems, on the other hand, are based on context-free phrase-structure grammars. CFG rules occupy a position complementary to lexical data resources; in general the rules are absolute, and it is difficult to obtain an exact picture of their effect in the analysis system. These properties make the total behavior of syntactic analysis difficult to understand as the rule set grows. In this paper, reduced cooccurrence relations are collected from real text as a unique language resource for Japanese kakari-uke dependency analysis. The data take the form of simple binary relations between phrases, extracted automatically by syntactic analysis. A prototype system with eight thousand reduced cooccurrence relations achieved eighty percent accuracy in kakari-uke dependency analysis of editorial articles. The system provides learning and incremental-update facilities for the cooccurrence relation database.
    Download PDF (1452K)
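A minimal sketch of how a database of binary phrase-dependency (kakari-uke) relations might be consulted during analysis, assuming a simple lookup table of (modifier, head) pairs; both the relation data and the back-off rule are invented for illustration.

```python
# Hypothetical sketch: choose the head of a modifier phrase using a
# database of binary kakari-uke cooccurrence relations.

cooccurrence = {              # (modifier phrase, head phrase) pairs seen in text
    ("hon-o", "yomu"),        # "book-ACC" depends on "read"
    ("tosyokan-de", "yomu"),  # "library-at" depends on "read"
}

def choose_head(modifier, candidate_heads):
    # Prefer a candidate supported by a stored cooccurrence relation;
    # otherwise fall back to the nearest (first) candidate.
    for head in candidate_heads:
        if (modifier, head) in cooccurrence:
            return head
    return candidate_heads[0]

print(choose_head("hon-o", ["kau", "yomu"]))      # relation found -> yomu
print(choose_head("inu-ga", ["hasiru", "yomu"]))  # no relation, falls back
```

Because the resource is just a set of binary relations, new pairs extracted from further text can be added incrementally without touching any grammar rules, which matches the learning facility the abstract describes.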
  • An Improvement of a Probabilistic Context-Free Grammar Using Cluster-Based Language Modeling
    Kenji Kita
    1996 Volume 3 Issue 4 Pages 103-113
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes an improved probabilistic CFG (Context-Free Grammar), called the mixture probabilistic CFG, based on the idea of cluster-based language modeling. This model assumes that the language model parameters have different probability distributions in different topics or domains. In order to perform topic- or domain-dependent language modeling, we first divide the training corpus into a number of subcorpora according to their topics or domains, and then estimate a separate probability distribution from each subcorpus. A mixture probabilistic CFG therefore has several different probability distributions for its CFG productions. The language model probability of a sentence is calculated as a mixture of these probability distributions. The mixture probabilistic CFG enables us to build a context- or topic-dependent language model, and thus more accurate language modeling becomes possible. The proposed model was evaluated by calculating test-set perplexity using the ADD (ATR Dialogue Database) corpus and a Japanese intra-phrase grammar. The mixture probabilistic CFG had a test-set perplexity of 2.47/phone, while the simple probabilistic CFG had a test-set perplexity of 2.77/phone. We also conducted speech recognition experiments using three language models: a pure CFG (without probabilities), a simple probabilistic CFG, and the mixture probabilistic CFG. In our experiments, the mixture probabilistic CFG attained the best performance. The proposed model was also evaluated using sentence-level clustering, with a dialogue corpus in which each utterance is annotated with an utterance type called the IFT (Illocutionary Force Type). Using these IFTs, we divided the corpus into 9 clusters and then estimated production probabilities from each cluster. Without IFT clustering, the perplexity was 2.18 per phone; with IFT clustering, it was reduced to 1.82 per phone.
    Download PDF (919K)
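The mixture computation can be illustrated with a toy sketch in which each cluster contributes its own probability estimate for the same sentence, combined by mixture weights. The probabilities, weights, and sentence length below are made-up numbers, not values from the paper.

```python
# Toy sketch of the mixture idea: each cluster k has its own estimate
# P_k(s) for a sentence s, and the model combines them as a weighted sum.

def mixture_probability(cluster_probs, weights):
    # P(s) = sum_k w_k * P_k(s), with the weights summing to 1.
    return sum(w * p for w, p in zip(weights, cluster_probs))

def perplexity_per_unit(prob, n_units):
    # Per-unit (e.g. per-phone) perplexity: P(s) ** (-1 / n_units).
    return prob ** (-1.0 / n_units)

# Three hypothetical clusters, each with its own probability for s.
p = mixture_probability([1e-6, 4e-6, 2e-6], [0.5, 0.3, 0.2])
print(perplexity_per_unit(p, 10))
```

The clusters sharpen the production probabilities within each topic, and the mixture keeps the overall model robust when the topic of a test sentence is unknown.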
  • KAZUAKI YOKOTA, HIROYUKI KAMEDA, HIROYA FUJISAKI
    1996 Volume 3 Issue 4 Pages 115-128
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    A method for automatic acquisition of a grammar of Japanese based on morphemes from a corpus has already been proposed by the authors. This paper proposes a new method based on cognitive units, which have been experimentally found to be the units of the human process of sentence analysis, and have been known to be larger than morphemes. While the use of cognitive units can reduce the number of search paths, it may increase the number of unknown units and may degrade system performance. In order to cope with this problem, a method has been further introduced for identifying an unknown cognitive unit from known cognitive units. The proposed method was applied to the analysis of the ‘weather-forecast’ corpus, and the acquired grammar was used for parsing. The results indicated a higher processing efficiency for systems using cognitive units than for those using morphemes.
    Download PDF (1150K)
  • KAZUAKI YOKOTA, HIROYA FUJISAKI
    1996 Volume 3 Issue 4 Pages 129-139
    Published: October 10, 1996
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    While most natural language processing systems adopt morphemes as units of processing, humans are known to use larger processing units, which we call cognitive units. The human process of sentence analysis can be considered as consisting of two stages: detection and selection of cognitive units. Based on this idea, this paper proposes a method for sentence analysis that first detects possible cognitive units using a state transition diagram, and then selects the correct cognitive units on the basis of their bigrams. The proposed method was applied to text error correction, and the experimental results confirmed that it can achieve a higher performance than can be attained using morpheme bigrams.
    Download PDF (1959K)
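The detect-then-select process described above can be sketched as choosing, among alternative segmentations into candidate units, the sequence with the highest bigram score. The candidate segmentations and bigram counts below are invented for illustration; the paper's detection stage uses a state transition diagram rather than a fixed candidate list.

```python
# Hypothetical sketch of the selection stage: score each candidate
# segmentation of a sentence into cognitive units by its unit bigrams,
# and keep the best-scoring sequence. All data here are invented.

bigram_score = {
    ("<s>", "kyou-wa"): 2, ("kyou-wa", "hareru-desyou"): 3,
    ("<s>", "kyou"): 1, ("kyou", "wa-hareru"): 1,
    ("wa-hareru", "desyou"): 1,
}

def score(units):
    # Sum bigram scores over the unit sequence, starting from <s>.
    total, prev = 0, "<s>"
    for u in units:
        total += bigram_score.get((prev, u), 0)
        prev = u
    return total

candidates = [
    ["kyou-wa", "hareru-desyou"],        # two larger cognitive units
    ["kyou", "wa-hareru", "desyou"],     # a finer, morpheme-like split
]
best = max(candidates, key=score)
print(best)  # -> ['kyou-wa', 'hareru-desyou']
```

Because cognitive units are larger than morphemes, each sequence has fewer bigram transitions to score, which is one source of the efficiency gain reported in the companion paper above.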