Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 7, Issue 4
Displaying 1-14 of 14 articles from this issue
  • [in Japanese]
    2000 Volume 7 Issue 4 Pages 1-2
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (251K)
  • SATOSHI TOJO
    2000 Volume 7 Issue 4 Pages 3-24
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
The objective of this paper is to give formal semantics to aspects, independent of individual natural languages. Recent aspect theories explain aspects as different ways of viewing a common event structure. However, these theories remain informal, are not expressed in precise logical formalisms, and use aspectual terms that differ from one theory to another. We first survey those theories and give a unified view of their terminology. We then propose adopting arrow logic for the representation of aspect, regarding arrows, rather than conventional time points or intervals, as the fundamental temporal extent. We show that the logic can tersely represent various aspects in terms of the relations between the event structure and its references. We also formalize, as logical inferences, the rules for aspectual shifts from temporally infinite forms to aspectual forms. Thus we provide not only a static representation of aspects but also the dynamic processes leading from an event ontology to aspects.
    Download PDF (1966K)
  • KAZUHIDE YAMAMOTO
    2000 Volume 7 Issue 4 Pages 25-62
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
A morpheme and part-of-speech system for Korean natural language processing, and machine translation in particular, is proposed in this paper. We designed this language system for easier computer processing. It is important to attain satisfactory performance when segmenting and tagging input Korean strings. Linguistic part-of-speech systems also suffer from under- and over-classification for machine translation purposes. We therefore defined an original part-of-speech system, which is demonstrated in this paper with some examples. Our morphological analysis is based on mixed n-gram statistics over both parts of speech and words, and the engine is tuned to the characteristics of the Korean language. Experiments show that the engine achieves 99.1% word recall, 98.9% word precision, and 92.6% sentence accuracy on unseen Korean strings. For language generation, spacing rules for Korean are proposed using our part-of-speech system. The appropriateness of our morpheme system has been demonstrated by machine translation performance in both the Japanese-Korean and Korean-Japanese directions, as shown in (Furuse, Yamamoto, and Yamada, 1999).
    Download PDF (3450K)
  • HAJIME MOCHIZUKI, MANABU OKUMURA
    2000 Volume 7 Issue 4 Pages 63-77
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
The importance of automatic summarization research is increasing with the growing availability of on-line documents. In information retrieval systems, summaries can be displayed with retrieval results so that users can quickly and accurately judge the relevance of the documents returned for their query. Here, rather than a generic summary, a summary that reflects the user's topic of interest as expressed in the query is considered more suitable. This type of summary is often called a ‘query-biased summary’. In this paper, we show that our previously proposed passage extraction method based on lexical chains can be used to produce better query-biased summaries for information retrieval systems. To evaluate the effectiveness of our method, a task-based evaluation scheme is adopted. The experimental results show that query-biased summaries produced by lexical chains outperform the others in the accuracy of subjects' relevance judgments. Furthermore, to establish a better evaluation methodology, we also investigate and describe the problems that arose from the experimental design.
    Download PDF (3972K)
  • AKIRA KATAOKA, SHIGERU MASUYAMA, KAZUHIDE YAMAMOTO
    2000 Volume 7 Issue 4 Pages 79-98
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
The purpose of this paper is to propose a method of paraphrasing a Japanese verbal noun phrase into a noun phrase of the form “N1 no N2”. The semantic structure of “N1 no N2” can be recognized by supplementing some abbreviated predicate. We define “deletable verbs” as these abbreviated predicates in two ways: 1. choose verbs equivalent to the semantic relations of “N1 no N2” using a thesaurus; 2. choose verbs associated with nouns, where a verb that frequently co-occurs with a noun in newspaper articles is concluded to be associated with that noun. By defining “deletable verbs” and exploiting the variety of semantic structures of “N1 no N2”, this paraphrasing is accomplished. A subjective evaluation of our paraphrasing method shows a precision of 63.8% and a recall of 61.4%. It is also shown that restricting the targets can increase the precision to 82.9%.
    Download PDF (1966K)
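The second way of choosing deletable verbs can be sketched as a simple co-occurrence count over (noun, verb) pairs; the Japanese words, counts, and threshold below are hypothetical, and a real system would of course count over newspaper articles rather than a toy list.

```python
# Sketch of "verbs associated with nouns": a verb that co-occurs with a
# noun frequently enough is treated as deletable when paraphrasing.
# The (noun, verb) pairs and the threshold are hypothetical.
from collections import Counter

def associated_verbs(pairs, threshold=2):
    # pairs: (noun, verb) co-occurrences observed in a corpus.
    counts = Counter(pairs)
    assoc = {}
    for (noun, verb), c in counts.items():
        if c >= threshold:
            assoc.setdefault(noun, set()).add(verb)
    return assoc

pairs = [("hon", "yomu"), ("hon", "yomu"), ("hon", "kau"),
         ("kuruma", "unten-suru"), ("kuruma", "unten-suru")]
assoc = associated_verbs(pairs)
```

With the toy counts above, only verbs seen at least twice with a noun survive the threshold, so "kau" is not associated with "hon".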
  • TAKEHIKO YOSHIMI, ICHIKO SATA, YOJI FUKUMOCHI
    2000 Volume 7 Issue 4 Pages 99-117
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
As a means of allowing robust processing of such linguistic phenomena as inversion, ellipsis, parenthesis and emphasis, which are liable to prevent a syntactic parser from generating appropriate syntactic structures, this paper presents a method of automatically preediting sentences based on information obtained by morpholexical and simple syntactic analysis. Adding a preediting module to the existing system makes it possible 1) to generate better translations, which would not otherwise be generated, with little or no change to the existing parts of the system, and 2) to reduce the load of syntactic analysis, thus enhancing the efficiency of the whole system. We have incorporated the proposed method into our English-to-Japanese machine translation system Power E/J and carried out an experiment with sentences from news wire articles. The incorporation of the preediting module 1) improved the quality of translations for 260 of the 330 rewritten sentences (78.8%), and 2) made the system 1.12 times as fast as the system without the module.
    Download PDF (1914K)
  • YUKO ISHIZAKO, AKIRA KATAOKA, SHIGERU MASUYAMA, KAZUHIDE YAMAMOTO, SEI ...
    2000 Volume 7 Issue 4 Pages 119-142
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Informative summaries are used as substitutes for original texts. Summaries used as captions in newscasting, in particular, must retain as much of the original information as possible. This paper discusses a summarization method that reduces overlaps to generate informative summaries. Deleting a part of the text that refers to the same content as some other part can reduce redundancy without losing information. To recognize overlaps, we utilize pairs of words in a dependency relation. However, deleting overlaps using only such word pairs sometimes makes a summary unnatural and difficult to read, so we also describe what must be considered to cope with these problems. For evaluation, we compared summaries of TV news texts generated independently by our method and by humans. The experimental results show a precision of 85.1% and a recall of 81.0%.
    Download PDF (2339K)
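The overlap-recognition step can be sketched as follows: a (modifier, head) dependency pair that repeats an earlier one marks a deletion candidate. The dependency pairs below are hypothetical, and a real system would apply the readability considerations the abstract mentions before actually deleting anything.

```python
# Sketch of overlap detection with dependent word pairs: if a later
# sentence repeats a (modifier, head) dependency pair already seen,
# the repeating part becomes a deletion candidate.
def deletion_candidates(sentences):
    # sentences: list of lists of (modifier, head) dependency pairs.
    seen, candidates = set(), []
    for i, pairs in enumerate(sentences):
        repeated = [p for p in pairs if p in seen]
        if repeated:
            candidates.append((i, repeated))
        seen.update(pairs)
    return candidates

doc = [
    [("shushou", "houbei"), ("asu", "houbei")],         # first mention
    [("shushou", "houbei"), ("daitouryou", "kaidan")],  # repeats a pair
]
cands = deletion_candidates(doc)
```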
  • TAKEHIKO YOSHIMI, JIRI JELINEK, OSAMU NISHIDA, NAOYUKI TAMURA, HARUO M ...
    2000 Volume 7 Issue 4 Pages 143-162
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In this treatise we describe the linguistic expertise (dictionary and rules) embodied in the Japanese-to-English MT system TWINTRAN, and the evaluation of its translation results. TWINTRAN is based on the following design policy: 1) The translation equivalents and the direction of the translation process are strictly monodirectional, from Japanese to English. The analysis of the Japanese input is not confined to Japanese grammar but also anticipates at every step the possible English translation. 2) Disambiguation is based on prioritisation: each rule carries a priority value, and the candidate with the highest aggregate priority is selected. 3) Verb complements are screened for acceptability not only in the input Japanese but also in the output English, and anaphora resolution is used to arrive at the optimum result.
In the window test we carried out on NTT's functional MT test set, applying our evaluation procedure, 73.1% of the corpus was acceptable and the corpus average score was above the point of acceptability.
    Download PDF (2014K)
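The priority-based disambiguation in design point 2) can be sketched as summing rule priorities per candidate analysis and keeping the maximum; the rule names, priority values, and glosses below are hypothetical, not TWINTRAN's actual rules.

```python
# Sketch of priority-based disambiguation: each rule used in a candidate
# analysis carries a priority value, and the candidate with the highest
# aggregate priority is selected.
def aggregate_priority(candidate):
    return sum(priority for _, priority in candidate["rules"])

def disambiguate(candidates):
    return max(candidates, key=aggregate_priority)

candidates = [
    {"gloss": "He reads a book.", "rules": [("topic-subj", 5), ("obj-o", 4)]},
    {"gloss": "A book reads him.", "rules": [("obj-subj", 1), ("subj-o", 1)]},
]
best = disambiguate(candidates)
```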
  • KIYOTAKA UCHIMOTO, MASAKI MURATA, QING MA, SATOSHI SEKINE, HITOSHI ISA ...
    2000 Volume 7 Issue 4 Pages 163-180
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In this paper we propose a method for acquiring word order from corpora. We define word order as the order of modifiers, i.e., the order of bunsetsus that depend on the same modifiee. The method uses a model that automatically discovers the tendencies of Japanese word order from various kinds of information in and around the target bunsetsus. It shows to what extent each piece of information contributes to deciding word order, and which order tends to be selected when several kinds of information conflict. The contribution of each piece of information is learned efficiently within a maximum entropy (ME) framework. The performance of the trained model can be evaluated by checking how many instances of word order selected by the model agree with those in the original text. A raw corpus, instead of a tagged corpus, can be used to train the model if it is first analyzed by a parser; this is possible because text in the corpus is already in the correct word order. In this paper, we show that this is indeed possible.
    Download PDF (1717K)
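The idea of learning order preferences for two bunsetsus sharing a modifiee can be sketched as a tiny log-linear (ME-style) classifier. Everything concrete here is hypothetical: the features, the romanized toy corpus, and the gradient-ascent loop, which stands in for a real ME trainer such as GIS.

```python
# Minimal sketch of the paper's idea: a log-linear model over order
# features, trained on pairs observed in their attested order.
import math

def features(a, b):
    # Binary features comparing the two candidate bunsetsus; how much
    # each cue (length, topic marker "wa", ...) matters is learned.
    return {
        "a_longer": 1.0 if len(a) > len(b) else 0.0,
        "a_has_wa": 1.0 if a.endswith("wa") else 0.0,
        "b_has_wa": 1.0 if b.endswith("wa") else 0.0,
    }

def prob_a_first(w, a, b):
    z = sum(w.get(f, 0.0) * v for f, v in features(a, b).items())
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, epochs=200, lr=0.5):
    # pairs: (first, second) bunsetsus in their attested order.
    w = {}
    for _ in range(epochs):
        for a, b in pairs:
            p = prob_a_first(w, a, b)
            for f, v in features(a, b).items():
                w[f] = w.get(f, 0.0) + lr * (1.0 - p) * v  # push p -> 1
    return w

# Toy corpus: the topic-marked ("...wa") bunsetsu tends to come first.
corpus = [("kare-wa", "hon-o"), ("inu-wa", "niku-o"), ("kanojo-wa", "e-o")]
w = train(corpus)
```

After training on the toy corpus, the model prefers placing an unseen topic-marked bunsetsu first, illustrating how the learned weights expose each feature's contribution.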
  • KAZUHIDE YAMAMOTO, EIICHIRO SUMITA
    2000 Volume 7 Issue 4 Pages 181-204
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In Japanese spoken language, the subject and other cases are often omitted. Several approaches to resolving such ellipses have been proposed, but none of them considers robustness against noisy input. Robustness is important in an ellipsis resolution module because, in spoken dialogue processing, its inputs are the results of a speech recognition module and may contain recognition errors. We thus propose a robust model of ellipsis resolution that utilizes a multiple decision tree (MDT) model. Experimental results demonstrate its robustness and also show that the model is task-independent and works more effectively if we provide the decision trees with a number of attributes corresponding to the amount of noise.
    Download PDF (2170K)
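One plausible reading of the MDT idea is a majority vote over several decision trees, each attending to different attributes, which is what makes the ensemble tolerant of a noisy attribute. The stub "trees" and feature names below are hypothetical stand-ins for trained trees.

```python
# Sketch of a multiple-decision-tree (MDT) style vote: several trees
# classify the omitted subject's referent, and the majority label wins.
from collections import Counter

def mdt_predict(trees, x):
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stub trees mapping utterance features to a referent label.
trees = [
    lambda x: "speaker" if x.get("polite") else "hearer",
    lambda x: "speaker" if x.get("verb") == "iku" else "hearer",
    lambda x: "speaker",
]
ref = mdt_predict(trees, {"polite": True, "verb": "iku"})
```

Even if one feature is misrecognized (say `polite` flips to False), the other trees can outvote the error, which is the robustness the abstract argues for.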
  • KAI ISHIKAWA, EIICHIRO SUMITA
    2000 Volume 7 Issue 4 Pages 205-227
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Recognition errors degrade the performance (quality) of speech translation. Previously, we proposed a method that uses only the reliable parts of the recognition result for translation. However, with that method, non-translated parts are omitted even if they contain useful information. To overcome this problem, we propose an error correction method composed of the following steps: (1) the necessity of correction is judged, and only utterances whose recognition results contain “potentially” recoverable erroneous parts are retained; (2) example utterances with parts phonetically similar to those retained in step (1) are retrieved from a text corpus, and correction hypotheses are created; (3) the reliability of the correction hypotheses is judged from both semantic and phonetic points of view, and the most reliable hypothesis is selected. The error correction method was incorporated into a speech translation system and evaluated on speech inputs from travel conversations. As a result, the word error rate was reduced by 2.3%, and the rate of acceptable translations was increased by 5.4%.
    Download PDF (3180K)
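The retrieval step (2) can be sketched with a crude proxy for phonetic similarity: rank corpus utterances by character-level edit distance to the suspect part and keep the close ones as hypotheses. The romanized utterances and the distance threshold are hypothetical; a real system would compare phone sequences, not characters.

```python
# Sketch of hypothesis retrieval: find corpus utterances close to a
# (possibly misrecognized) recognition result and propose them as
# correction hypotheses. Edit distance stands in for phonetic distance.
def edit_distance(a, b):
    # Rolling-array Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def correction_hypotheses(recognized, corpus, max_dist=2):
    scored = [(edit_distance(recognized, u), u) for u in corpus]
    return [u for d, u in sorted(scored) if d <= max_dist]

corpus = ["kippu o kudasai", "kitte o kudasai", "biiru o kudasai"]
# Simulate a misrecognition of "kippu" (ticket) as "kittu".
hyps = correction_hypotheses("kittu o kudasai", corpus)
```

The closest corpus utterances come back first, and a later stage (step 3 in the abstract) would pick among them on semantic and phonetic grounds.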
  • NOBUYUKI SHIRAKI, YOSHIYUKI UMEMURA, YOSHIHISA HARATA
    2000 Volume 7 Issue 4 Pages 229-246
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Today's information-oriented society increasingly needs Car-Multi-Media systems, which in turn require speech recognition and synthesis. We aim to improve bunsetsu identification, which is important for both. There are two traditional types of bunsetsu identification methods: those using handmade rules and those using machine learning. The former achieve a high accuracy rate, but pose problems for Car-Multi-Media systems: they are inflexible because they need fixed inputs, and maintaining the identification rules takes a great deal of effort because all rules are made by hand. The latter are robust against these problems, but their algorithms become much more complex as accuracy is improved, which again causes problems for Car-Multi-Media systems. We therefore propose a new method that applies plural decision lists sequentially. The decision list method is very simple but does not by itself achieve a very high accuracy rate, so we use not ‘one’ decision list but ‘plural’ decision lists ‘sequentially’. We carried out experiments using 10,000 sentences as a training corpus and 10,000 sentences as a test corpus from the Kyoto University Corpus. As a result, the accuracy rate was 99.38%.
    Download PDF (2720K)
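The sequential application of plural decision lists can be sketched as follows: each list is an ordered set of (condition, label) rules in which the first matching rule fires, and an instance the first list abstains on falls through to the next list. The POS tags and rules below are hypothetical illustrations, not the paper's learned lists.

```python
# Sketch of "plural decision lists applied sequentially" for bunsetsu
# boundary identification.
def apply_list(rules, x):
    for cond, label in rules:
        if cond(x):
            return label
    return None  # abstain

def apply_sequentially(lists, x, default=False):
    for rules in lists:
        label = apply_list(rules, x)
        if label is not None:
            return label
    return default

# x = (POS of current word, POS of next word); label True means a
# bunsetsu boundary falls after the current word.
list1 = [
    (lambda x: x[0] == "particle", True),   # boundary after a particle
    (lambda x: x[1] == "suffix", False),    # never split before a suffix
]
list2 = [
    (lambda x: x[0] == "noun" and x[1] == "noun", False),  # compound noun
]
lists = [list1, list2]
```

The appeal of the scheme is that each individual list stays simple while later lists mop up the cases earlier lists abstain on.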
  • HISAHIRO ADACHI
    2000 Volume 7 Issue 4 Pages 247-259
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Sign language has an interesting characteristic: a change in part of the manual motion properties of a sign, i.e., its hand-shape, location, or movement, often changes the sign's meaning. In particular, two signs whose properties differ in only one feature element are called a minimal pair. In building an electronic sign dictionary system, pairs of signs with similar manual motion properties play an important role in the retrieval, registration and synthesizing mechanisms. This paper proposes a method for extracting pairs of signs with similar manual motion properties from a given set of signs. The method is based on a similarity between two signs derived from the longest common subsequence (LCS) of their manual motion descriptions (MMDs). An MMD can be regarded as information extracted from the series of motions making up a sign. By computing feature vectors of n properties from the MMDs and plotting them in n-dimensional Euclidean space, the angle between two vectors can be taken as the similarity between two signs. When a feature vector is instead regarded as a string of the MMD, the similarity can be obtained by string matching between the two MMDs. The results of evaluation experiments show the applicability of the proposed method.
    Download PDF (1392K)
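The LCS-based similarity can be sketched directly: treat each MMD as a sequence of motion feature symbols, compute their longest common subsequence, and normalize by the longer sequence. The feature symbols below are hypothetical, as is the choice of normalizer.

```python
# Sketch of LCS-based similarity between two manual motion descriptions
# (MMDs), each treated as a sequence of motion feature symbols.
def lcs_length(a, b):
    # Classic dynamic-programming LCS table.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y \
                else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def similarity(mmd1, mmd2):
    # 1.0 = same motion features in the same order; 0.0 = none shared.
    return lcs_length(mmd1, mmd2) / max(len(mmd1), len(mmd2))

sign_a = ["flat-hand", "move-up", "close"]
sign_b = ["flat-hand", "move-down", "close"]  # differs in one element
```

A pair like `sign_a`/`sign_b`, differing in a single feature element, scores high under this measure, which is exactly the minimal-pair situation the dictionary system wants to surface.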
  • MASAO UTIYAMA, HITOSHI ISAHARA
    2000 Volume 7 Issue 4 Pages 261-270
    Published: October 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    The effectiveness of various statistical measures of sentence importance was compared for automatic text summarization done by extracting important sentences. We focused on comparing various measures of sentence similarity on the assumption that important sentences in an article are similar to the title. Two types of similarity measures were compared: one uses word co-occurrence statistics and the other does not. The former proved superior to the latter. Other automatic text summarization methods, such as extracting the leading part of an article, or extracting sentences with important words, proved inferior to the similarity-based method. These results show that similarity measurement using word co-occurrence statistics is effective for automatic text summarization.
    Download PDF (816K)
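The baseline similarity-based extraction can be sketched as ranking sentences by word-overlap cosine against the title; the title and sentences below are invented examples. The paper's stronger measures additionally use word co-occurrence statistics, so related but non-identical words also contribute, which this plain-overlap sketch does not capture.

```python
# Sketch of title-similarity summarization: score each sentence by its
# similarity to the title and extract the top-ranked sentences.
import math

def cosine(ws1, ws2):
    # Binary bag-of-words cosine over the two word lists.
    if not ws1 or not ws2:
        return 0.0
    common = set(ws1) & set(ws2)
    return len(common) / math.sqrt(len(set(ws1)) * len(set(ws2)))

def summarize(title, sentences, k=1):
    ranked = sorted(sentences,
                    key=lambda s: -cosine(title.split(), s.split()))
    return ranked[:k]

title = "court rules on patent dispute"
sents = ["the court issued its patent ruling today",
         "reporters gathered outside"]
picked = summarize(title, sents)
```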