Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 2, Issue 3
Displaying 1-5 of 5 articles from this issue
  • [in Japanese]
    1995 Volume 2 Issue 3 Pages 1
    Published: July 10, 1995
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • Masahiro Miyazaki, Satoshi Shirai, Satoru Ikehara
    1995 Volume 2 Issue 3 Pages 3-25
    Published: July 10, 1995
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Miura grammar is a Japanese grammar based on the Constructive Process Theory proposed by M. Tokieda and developed by T. Miura. In this theory, language is composed of three processes: object, recognition, and expression, which are combined by the law of causality. The state of an object is reflected in the speaker's recognition, and the way the speaker recognizes the object gives rise to an expression. This paper proposes a Japanese syntactic category system (part-of-speech system) based on Miura grammar and a formal description method for grammar rules for morphological processing, and discusses their use in Japanese morphological processing and syntactic analysis. Japanese words are classified into 400 hierarchical syntactic categories from the viewpoints of the class of the object itself and the manner of the speaker's recognition. The results of designing Japanese grammar rules for morphological processing with the proposed syntactic category system and formal description method show that grammar rules, including non-general rules, are easy to design and improve by the proposed method. The proposed syntactic category system can also be used to develop Japanese syntactic analysis, using nested structure models based on Miura grammar, without a gap between syntactic and semantic analysis.
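The benefit of a hierarchical category system for rule writing can be illustrated with a small sketch. The category names, the path notation, and the matching function below are all invented for illustration, not the paper's actual 400-category system: the point is that a rule written against a coarse category automatically covers every subcategory beneath it.

```python
# A minimal sketch (hypothetical categories, not the paper's system) of
# hierarchical syntactic categories encoded as slash-separated paths.
# A rule written at a coarse level also applies to finer subcategories.

CATEGORY_SEP = "/"

def is_subcategory(category: str, rule_category: str) -> bool:
    """True if `category` equals `rule_category` or lies below it."""
    cat_parts = category.split(CATEGORY_SEP)
    rule_parts = rule_category.split(CATEGORY_SEP)
    return cat_parts[:len(rule_parts)] == rule_parts

# Hypothetical paths mixing the class of the object and the manner of
# the speaker's recognition, loosely in the spirit of the abstract.
word_category = "noun/concrete/person"
rule = "noun/concrete"

assert is_subcategory(word_category, rule)        # coarse rule applies
assert not is_subcategory("verb/action", rule)    # different branch
```

Because a rule names only a prefix of the path, refining the hierarchy with new subcategories does not invalidate existing rules.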
  • Eiichiro Sumita, Kozo Oi, Osamu Furuse, Hitoshi Iida, Tetsuya Higuchi
    1995 Volume 2 Issue 3 Pages 27-48
    Published: July 10, 1995
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes an Example-Based Approach (EBA) using Associative Processors (APs) for machine translation, especially speech-to-speech translation, which requires (1) high accuracy and (2) a quick response. EBAs translate by mimicking the best-matching translation examples (hereafter, “examples”), which are derived from corpora. These approaches are known to perform structural disambiguation, target word selection, and whole translation accurately. Therefore, EBAs fulfill the first requirement. The second requirement is also fulfilled by an EBA using APs, as follows. The central mechanism of EBAs, Example-Retrieval (ER), retrieves the examples most similar to the input expression from an example database. ER becomes the dominant component as the size of the example database increases. We have parallelized ER using APs consisting of an Associative Memory and a Transputer. Experimental results show that ER can be drastically accelerated by our method. Moreover, a study of communication among APs and an extrapolation from the sustained performance of 10 APs demonstrate the scalability of our method with respect to the size of the example database. Consequently, the EBA using APs meets the critical requirements of machine translation.
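The Example-Retrieval step described above can be sketched sequentially. This is a simplified stand-in, not the paper's implementation: real EBA systems score examples with thesaurus-based semantic distances, whereas the similarity here is crude word overlap, and the example pairs are invented. What the paper parallelizes across associative processors is exactly this scan over the example database.

```python
# Sequential stand-in for Example-Retrieval (ER): score every stored
# example against the input and return the most similar one. The
# word-overlap similarity is a simplification of the thesaurus-based
# distance used in example-based translation; the data is invented.

def similarity(a: list[str], b: list[str]) -> float:
    """Fraction of shared words (crude stand-in for semantic distance)."""
    if not a or not b:
        return 0.0
    return len(set(a) & set(b)) / max(len(set(a)), len(set(b)))

def retrieve_best(example_db, query):
    """Return the (source, target) example most similar to the query."""
    return max(example_db, key=lambda ex: similarity(ex[0], query))

examples = [
    (["he", "runs", "a", "company"], "kaisha wo keiei suru"),
    (["he", "runs", "fast"], "hayaku hashiru"),
]
best = retrieve_best(examples, ["she", "runs", "a", "restaurant"])
assert best[1] == "kaisha wo keiei suru"
```

Since every example is scored independently, the scan partitions naturally across processors, which is why ER is a good fit for the associative hardware the paper describes.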
  • Hideki Tanaka
    1995 Volume 2 Issue 3 Pages 49-72
    Published: July 10, 1995
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Our English-to-Japanese machine translation system uses surface verbal case frames (case frames) to select a Japanese translation for an English verb. The need to acquire and accumulate case frames leads directly to two problems:
    How can transparency in the case frames be obtained? Case frames are sometimes changed after they are written, and it is hard to predict how translation selection is affected by these changes.
    How can consistency among the case elements and their restrictions be kept? These elements should be used consistently, because the matching calculation between case frames and syntactic structures (parser output) assumes consistent use of these elements.
    To solve these problems, we propose two methods.
    Use of a decision tree for case frame representation (a case frame tree).
    Use of a statistical inductive learning algorithm to derive a case frame tree from a bilingual corpus.
    The first method solves problem one: a change at any node in a case frame tree affects only the translations under the changed node. The second method solves problem two: the case elements and their restrictions are evaluated on the same basis, according to their ability to distinguish the verb translations in the corpus. We used the learning algorithm C4.5, devised by Quinlan. C4.5 takes as input a table listing attributes, values, and classes. To acquire a case frame tree, we replace attributes with case categories, values with restrictions on the case categories, and classes with Japanese translations of English verbs. We term such a table a Primitive Case Frame Table (PCFT). Before conducting acquisition experiments on seven English verbs (“come”, “get”, “give”, “go”, “make”, “run”, “take”), we constructed an English-Japanese bilingual corpus from AP (Associated Press) wire news texts, which turned out to contain about 6,000 translation pairs with syntactic tags. In the first experiment, we converted the corpus into a PCFT using all case categories appearing in the corpus, with word forms as their restrictions. The acquired case frame trees basically duplicated the human work, but were far more precise in discriminating the verb translations appearing in the corpus. Although these results indicate the basic effectiveness of our approach, the acquired case frame trees did not seem to have enough predictive power on open data, since many of the word forms could be unknown words. To solve this problem, we generalized the word forms in the PCFT using semantic codes (from Ruigo-Kokugo-Jiten, consisting of 4 digits) and then applied C4.5. Five-fold cross-validation was used to ensure the precision of the evaluation (error rate). The error rate on open data for each verb was between 2.4% and 32.2%. Comparison of these figures with the baseline errors (error rates obtained by simply outputting the most frequent translation of a verb) showed a gain of between 13.6% and 55.3%, which indicates the basic effectiveness of using semantic codes. To lower the error rate further, we are devising an algorithm that can integrate word forms and semantic codes in an acquired case frame tree.
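The PCFT-to-tree construction can be made concrete with a toy table and one information-gain computation, which is the core of what C4.5 does at each node. Everything below is invented illustration data, not the paper's corpus: the columns stand for hypothetical case categories ("subject", "object"), the values for semantic restrictions, and the class for the Japanese translation of the English verb "run".

```python
# A toy Primitive Case Frame Table (PCFT) and a single information-gain
# split, mimicking C4.5's node test selection. Data is invented.
import math
from collections import Counter

pcft = [
    # (subject, object, translation)
    ("human",  "company", "keiei-suru"),   # run a company
    ("human",  "company", "keiei-suru"),
    ("human",  "none",    "hashiru"),      # run (move fast)
    ("animal", "none",    "hashiru"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """Reduction in class entropy from splitting `rows` on column `col`."""
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# C4.5 picks the case category with the highest gain as the root test.
gains = {name: info_gain(pcft, col) for col, name in [(0, "subject"), (1, "object")]}
best = max(gains, key=gains.get)
assert best == "object"   # "object" separates the two translations perfectly
```

In this toy table the "object" restriction alone determines the translation, so it becomes the root of the case frame tree; the paper's trees are grown the same way, but from thousands of tagged translation pairs (C4.5 proper uses gain ratio rather than raw gain).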
  • Hiroyuki Shinnou, Hitoshi Isahara
    1995 Volume 2 Issue 3 Pages 73-86
    Published: July 10, 1995
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper presents an alternative method of measuring word association strength on predicative patterns, in order to automatically extract predicative frozen patterns and idioms from a corpus. Mutual information is traditionally used for this purpose; we improve on the mutual information method from a linguistic point of view. The proposed method is realized by the following steps. First, a verb (or noun) is fixed. Next, the set of nouns (or verbs) associated with the fixed verb (or noun) is built up. Last, nouns (or verbs) with peculiar frequencies are chosen from this set. The peculiarity is confirmed by two characteristics: the ratio of the word's frequency to the total frequency of the set, and the number of distinct words in the set. Predicative frozen patterns are constructed from the chosen words and the fixed word. The advantage of this method is that the patterns extracted by fixing a verb and those extracted by fixing a noun have few patterns in common, while each extraction achieves a correctness ratio equivalent to that of extraction by mutual information. Therefore, when the same number of patterns is extracted, this method obtains more correct patterns than mutual information does.
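The two measures being contrasted can be sketched side by side. The counts and thresholds below are invented, and the `peculiar_nouns` filter is only a guess at the shape of the paper's criterion (share of the fixed word's co-occurrences, gated by the number of distinct co-occurring words), not its actual formula.

```python
# Sketch of the contrast the abstract draws: classical pointwise mutual
# information (PMI) for a verb-noun pair versus a "peculiarity" filter
# applied after fixing a verb. Counts and thresholds are hypothetical.
import math

def pmi(pair_count, verb_count, noun_count, total):
    """Pointwise mutual information of a verb-noun co-occurrence."""
    return math.log2((pair_count / total) /
                     ((verb_count / total) * (noun_count / total)))

def peculiar_nouns(noun_counts, min_ratio=0.3, max_kinds=10):
    """Nouns taking a large share of the fixed verb's co-occurrences,
    provided the verb combines with only a few distinct nouns."""
    total = sum(noun_counts.values())
    if len(noun_counts) > max_kinds:      # too many kinds: not frozen
        return []
    return [n for n, c in noun_counts.items() if c / total >= min_ratio]

# Nouns observed with one fixed verb (invented counts): one noun
# dominates, suggesting a frozen pattern with that noun.
counts = {"shiri": 9, "ashi": 2, "te": 1}
assert peculiar_nouns(counts) == ["shiri"]
```

The filter looks only at the distribution around the fixed word, so fixing verbs and fixing nouns explore the corpus from different directions, which matches the abstract's observation that the two extractions overlap little.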