Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 8, Issue 3
Displaying 1-8 of 8 articles from this issue
  • [in Japanese]
    2001 Volume 8 Issue 3 Pages 1-2
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (254K)
  • REI OGURO, KAZUHIKO OZEKI, YUJIE ZHANG, KAZUYUKI TAKAGI
    2001 Volume 8 Issue 3 Pages 3-18
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Conventional methods for text summarization are mostly based on the idea of selecting important sentences from a given set of sentences, such as a paragraph or a whole text. These methods have the merit that each selected sentence remains unchanged and is thus correct. However, it is sometimes necessary to shorten individual sentences, either when a higher compaction rate is required or when paragraph-by-paragraph summarization is not adequate. In such sentence compaction, it is important that a shortened sentence be natural as a Japanese sentence. In this paper, the sentence compaction problem is formulated as “a problem of selecting a subsequence of phrases from a given sentence that maximizes the sum of phrase significance scores and inter-phrase dependency scores.” An efficient algorithm to solve this problem is then proposed. Since this method takes inter-phrase dependency into account, a shortened sentence is expected to be grammatically correct and natural. This paper focuses on the derivation, computational complexity, and implementation issues of the algorithm, and does not discuss how to define the phrase significance score and the inter-phrase dependency score, although that is a crucially important matter in practical applications.
    Download PDF (2428K)
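The formulation quoted in the abstract above can be illustrated with a small dynamic program. This is only a sketch under a simplifying assumption that is not taken from the paper: each selected phrase is scored against the selected phrase immediately before it, so the dependency structure is treated as a chain. The function name `compact_sentence`, the score tables, and the toy numbers are all hypothetical.

```python
def compact_sentence(significance, dependency, target_len):
    """Pick `target_len` phrases (in order) with the highest total score.

    Assumption for this sketch: the total score is the sum of the
    significance of each kept phrase plus dependency[i][j] for every pair
    of *adjacent* kept phrases i < j. The paper's own dependency model
    may differ.
    """
    n = len(significance)
    if not 1 <= target_len <= n:
        raise ValueError("target_len must be between 1 and the phrase count")

    neg_inf = float("-inf")
    # best[k][j]: best score of a k-phrase selection whose last phrase is j
    best = [[neg_inf] * n for _ in range(target_len + 1)]
    back = [[None] * n for _ in range(target_len + 1)]

    for j in range(n):
        best[1][j] = significance[j]

    for k in range(2, target_len + 1):
        for j in range(k - 1, n):
            for i in range(k - 2, j):
                cand = best[k - 1][i] + dependency[i][j] + significance[j]
                if cand > best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i

    # Choose the best final phrase, then walk the back-pointers.
    end = max(range(target_len - 1, n), key=lambda j: best[target_len][j])
    score = best[target_len][end]
    chosen, k, j = [], target_len, end
    while j is not None:
        chosen.append(j)
        j = back[k][j]
        k -= 1
    chosen.reverse()
    return score, chosen


# Toy scores for a four-phrase sentence (made up for illustration).
toy_significance = [1, 5, 2, 4]
toy_dependency = [
    [0, 1, 0, 2],
    [0, 0, 0, 1],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
print(compact_sentence(toy_significance, toy_dependency, 3))
```

The table has one row per selection size and one column per phrase, so this chain version runs in O(N·M²) time for M phrases and a target length of N.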
  • HISAHIRO ADACHI
    2001 Volume 8 Issue 3 Pages 19-37
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Since sign language is a kind of visual language, “iconicity” is a salient visual characteristic of its word formation. That is, iconicity in sign language refers to a visual resemblance between signs and the things they stand for (i.e., their meanings). The properties of a meaning can be divided into definition features and characteristic features. For example, the sign for “house” provides a direct representation in which both hands outline the shape of the roof of a house; there is a direct relation between the meaning of the sign and a visual characteristic of what it represents, namely its definition features. However, the sign for “bankruptcy” provides an indirect representation in which both hands touch each other after the ‘house’ sign, which derives from the causal relationship that the house is destroyed by bankruptcy, namely its characteristic features. Although these words do not resemble each other in meaning, their manual motion properties are similar; that is, ‘bankruptcy’ can be considered a derivation of ‘house’. Furthermore, through contact with Japanese, signs are often formed by borrowing some of the elements of word formation. For example, the sign “Ao-mori” is a compound of the signs “blue” and “forest”. Borrowing can also be considered symbolic iconicity in a broad sense. Clustering signs with similar manual motion properties can therefore provide an important clue for explicating the relationship between the meaning of manual motion properties and word formation. Furthermore, in an electronic sign dictionary system, the result of clustering can play a significant role as a knowledge database in the retrieval mechanism. This paper proposes a method for grouping signs into disjoint clusters with similar manual motion properties. The method is based on the similarity between manual motion descriptions (MMDs) that appear in an ordinary sign dictionary. By computing the similarity between the MMDs and translating it into an equivalence relation, the equivalence classes formed by the relation can be regarded as clusters of signs that are similar to each other. The results of evaluation experiments show the applicability of the proposed method.
    Download PDF (9273K)
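One way to turn pairwise similarity into disjoint clusters, as the abstract above describes, is to threshold the similarity and take the transitive closure, so that the connected groups become equivalence classes. The sketch below makes several assumptions that are not from the paper: similarity is Jaccard overlap of whitespace-separated tokens, the threshold is 0.5, and the MMD strings are toy text written for this example rather than entries from any sign dictionary.

```python
def jaccard(a, b):
    """Token-set overlap between two description strings (assumed measure)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cluster_signs(descriptions, threshold=0.5):
    """Group signs into equivalence classes of the 'similar enough' relation.

    Pairs with similarity >= threshold are linked; a union-find structure
    then yields the transitive closure, i.e. disjoint clusters.
    """
    names = list(descriptions)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(descriptions[a], descriptions[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical manual motion descriptions, invented for illustration only.
toy_mmds = {
    "house": "both hands outline roof",
    "bankruptcy": "both hands outline roof then touch",
    "blue": "one hand strokes cheek",
}
print(cluster_signs(toy_mmds))
```

Because the closure is transitive, two signs can end up in the same cluster through a chain of intermediate signs even when their direct similarity is below the threshold.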
  • SHIHO NOBESAWA, HIROAKI SAITO, MASAKAZU NAKANISHI
    2001 Volume 8 Issue 3 Pages 39-57
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Research based on statistical information has become increasingly significant in the field of natural language processing. The use of raw corpora is attractive, as it is easy to obtain a certain amount of non-tagged text. However, raw corpora often contain unknown words and phrases, which lowers the accuracy of experiments. Because of this problem, colloquial language has not been studied sufficiently, even though its processing is strongly required for emails and other tasks. In this paper we propose a simple method to obtain domain-specific sequences from unrestricted texts using statistical information only. Our method needs only a non-tagged training corpus. We use the statistical information drawn from the training corpus to extract semantic character sequences automatically. We conducted sequence extraction experiments on email texts and succeeded in extracting significant semantic sequences from the test corpus. The sequences our system salvaged contain casual terms, proper nouns, and sequences with representation changes such as pronunciation extension.
    Download PDF (1852K)
  • Michael Paul, Eiichiro Sumita
    2001 Volume 8 Issue 3 Pages 59-85
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose a corpus-based approach to anaphora resolution of Japanese pronouns combining a machine learning method and statistical information. First, a decision tree trained on an annotated corpus determines the coreference relation of a given anaphor and antecedent candidates and is utilized as a filter in order to reduce the number of potential candidates. In the second step, preference selection is achieved by taking into account the frequency information of coreferential and non-referential pairs tagged in the training corpus as well as distance and counting features within the current discourse.
    Download PDF (2477K)
  • TAKEHIKO YOSHIMI
    2001 Volume 8 Issue 3 Pages 87-106
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In order to translate English sentences containing pronouns into natural and suitable Japanese, it is frequently necessary either to eliminate pronouns or to turn them into other expressions. For eliminating unwanted pronouns, a set of manually written rules has already been presented. In this article we propose to 1) offer a way of substituting other expressions for unwanted pronouns as well as eliminating them, and 2) use a decision tree learning algorithm to learn rules automatically from a corpus, without requiring human intervention. The features used for learning are selected from the known linguistic constraints that apply to zero pronominalisation, and from the clues that have been used for anaphora resolution of zero pronouns in engineering studies. Having applied the proposed method to the translation results of our English-to-Japanese machine translation system Power E/J, we found that the accuracy of translation was 79.9% when only the judgement of whether zero pronominalisation should be applied was made, and 72.2% when the substitution of other expressions for pronouns was applied in addition to that judgement. These results are comparable with those obtained by hand-written rules. It also became clear that none of the selected features lowers the accuracy, which means that we can use as features for our purpose not only the linguistic constraints on zero pronominalisation but also the clues for restoring zero pronouns.
    Download PDF (2075K)
  • Dongli Han, Haodong Wu, Teiji Furugori
    2001 Volume 8 Issue 3 Pages 107-121
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose an effective method for resolving overlapping ambiguities found in sentential analyses of Chinese. It detects the ambiguities with an FBMM scanner, resolves them by using the relevancy value (RV), a statistical measure of word co-occurrence taken from textual data on the Internet, and selects the correct word sequence for the sentence being analyzed. We also use contextual information when RVs are considered insufficient for resolving the ambiguities and choosing the correct word sequence. An experiment on selecting the desired sequences shows a success rate of about 85%. This result is convincing and far better than those in other comparable studies.
    Download PDF (1303K)
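The detection step in the abstract above can be sketched as follows, assuming that FBMM denotes forward and backward maximum matching (the abstract does not expand the acronym). The two passes segment the same string greedily from opposite ends, and a disagreement between them flags a candidate overlapping ambiguity. The lexicon, function names, and example string are illustrative assumptions; the RV-based resolution step is not shown because the abstract does not define it.

```python
def forward_mm(text, lexicon, max_len):
    """Greedy longest-match segmentation scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or word in lexicon:
                words.append(word)
                i += size
                break
    return words


def backward_mm(text, lexicon, max_len):
    """Greedy longest-match segmentation scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            word = text[j - size:j]
            if size == 1 or word in lexicon:
                words.append(word)
                j -= size
                break
    words.reverse()
    return words


def has_overlap_ambiguity(text, lexicon, max_len=3):
    """Flag the string when the two scans disagree on the segmentation."""
    return forward_mm(text, lexicon, max_len) != backward_mm(text, lexicon, max_len)


# Toy lexicon (assumption): the string can be split as 研究生/命/起源 or 研究/生命/起源.
toy_lexicon = {"研究", "研究生", "生命", "起源"}
toy_text = "研究生命起源"
print(forward_mm(toy_text, toy_lexicon, 3))
print(backward_mm(toy_text, toy_lexicon, 3))
print(has_overlap_ambiguity(toy_text, toy_lexicon))
```

When the two segmentations differ, a scoring step such as the RV comparison described in the abstract would be needed to choose between them.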
  • MUHTAR MUHSUT, YASUHIRO OGAWA, YASUYOSHI INAGAKI
    2001 Volume 8 Issue 3 Pages 123-142
    Published: July 10, 2001
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Japanese and Uighur are both agglutinative languages and share many syntactic similarities. We can therefore translate Japanese into Uighur sequentially by replacing Japanese words with the corresponding Uighur words after morphological analysis of the Japanese sentences. However, case particles must be translated correctly in order to prevent misreading, because they play important roles in both languages. In this paper, we propose a new approach to the translation of case particles. For that purpose, we examined the verb dictionary compiled by IPA and classified the uses of Japanese case particles. Our approach selects the correct Uighur case particle using the combination pattern of the verb and case particles. We also present a performance evaluation of the system.
    Download PDF (2222K)