Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 14, Issue 2
  • [in Japanese]
    2007 Volume 14 Issue 2 Pages 1-2
    Published: April 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • TAKESHI SAKAKI, YUTAKA MATSUO, KOKI UCHIYAMA, MITSURU ISHIZUKA
    2007 Volume 14 Issue 2 Pages 3-31
    Published: April 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes a method for automatically constructing thesauri of related terms from Web information. We use a Web search engine to obtain word co-occurrence information and propose a new, efficient similarity metric based on the χ² value that addresses problems of existing methods. We also introduce a new method for identifying related terms by word clustering: we cluster the words of the resulting associative network using a recent clustering method, the “Newman method”. We evaluate the approach on sets of related terms extracted from a corpus and from an existing thesaurus, and show its effectiveness.
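    As a rough illustration of a χ²-based co-occurrence score, the sketch below computes the standard Pearson χ² statistic for a 2×2 contingency table built from hit counts. The abstract does not give the paper's exact formula, so this standard statistic, the function name `chi_square_cooccurrence`, and the example counts are all assumptions made for illustration.

```python
def chi_square_cooccurrence(n_xy, n_x, n_y, n_total):
    """Pearson chi-square for a 2x2 co-occurrence table (illustrative only).

    n_xy    : number of pages containing both words (hypothetical hit count)
    n_x     : number of pages containing word x
    n_y     : number of pages containing word y
    n_total : total number of pages considered
    """
    a = n_xy                          # x and y together
    b = n_x - n_xy                    # x without y
    c = n_y - n_xy                    # y without x
    d = n_total - n_x - n_y + n_xy    # neither word
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (a word occurs on no page or on every page).
        return 0.0
    return n_total * (a * d - b * c) ** 2 / denom


# Made-up counts: the first pair co-occurs far more often than chance,
# the second pair co-occurs exactly as often as independence predicts.
strong = chi_square_cooccurrence(n_xy=30, n_x=100, n_y=60, n_total=1000)
independent = chi_square_cooccurrence(n_xy=6, n_x=100, n_y=60, n_total=1000)
```

    A larger value indicates that the observed co-occurrence deviates more strongly from what independence would predict. The statistic alone does not say whether the deviation is positive or negative, so a practical similarity measure would also need to check the direction.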
  • MASATSUGU TONOIKE, TAKEHITO UTSURO, SATOSHI SATO
    2007 Volume 14 Issue 2 Pages 33-68
    Published: April 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper studies how to compile a bilingual lexicon for technical terms using the Web. When estimating bilingual term correspondences for technical terms, it is usually difficult to find an existing corpus for the relevant domain. We therefore adopt the approach of collecting a corpus for that domain from the Web. To estimate translations of technical terms, we employ a compositional translation estimation technique, in which translation candidates for a term are generated by concatenating the translations of its constituents. The generated candidates are then validated against the domain/topic-specific corpus collected from the Web. We further compare the proposed approach quantitatively with an alternative that validates translation candidates directly through a search engine. We show that the domain/topic-specific corpus collected from the Web contributes to higher precision in translation candidate validation.
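    The generate-then-validate idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy Japanese-to-English lexicon, and the tiny "domain corpus" string are all invented for the example, and validation is reduced to plain substring counting.

```python
from itertools import product


def generate_candidates(constituents, lexicon):
    """Concatenate every combination of constituent translations."""
    options = [lexicon.get(c, []) for c in constituents]
    if any(not opts for opts in options):
        # At least one constituent has no known translation.
        return []
    return [" ".join(combo) for combo in product(*options)]


def validate(candidates, corpus_text):
    """Keep candidates that occur in the corpus, most frequent first."""
    counts = {cand: corpus_text.count(cand) for cand in candidates}
    attested = [cand for cand in candidates if counts[cand] > 0]
    return sorted(attested, key=lambda cand: -counts[cand])


# Hypothetical lexicon and corpus, for illustration only.
toy_lexicon = {
    "情報": ["information", "data"],
    "検索": ["retrieval", "search"],
}
toy_corpus = (
    "information retrieval systems rank documents . "
    "evaluation of information retrieval uses test collections . "
    "data retrieval differs from ranking ."
)

candidates = generate_candidates(["情報", "検索"], toy_lexicon)
validated = validate(candidates, toy_corpus)
```

    With this toy data, four candidates are generated and only the two that are attested in the corpus survive validation, ordered by how often they occur.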
  • XIANGLI WANG, MASAHIRO MIYAZAKI
    2007 Volume 14 Issue 2 Pages 69-93
    Published: April 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In syntactic analysis, Chinese sentences are usually parsed with PSG (Phrase Structure Grammar) rules. Because grammatical rules based on PSG are not consistent, ambiguity is a major problem. In this paper, we propose a new kind of grammar, SSG (Sentence Structure Grammar), which describes all components of a sentence and centers on the predicative verb or adjective. We build a Chinese grammatical rule system based on SSG and implement it on the extended chart parser Schart. Experimental results show that syntactic analysis based on SSG, using only part-of-speech information and grammatical rules, is highly consistent and very effective at reducing syntactic ambiguity, and that it achieves higher precision than syntactic analysis based on PCFG (Probabilistic Context Free Grammar).
  • Ayu Purwarianti, Masatoshi Tsuchiya, Seiichi Nakagawa
    2007 Volume 14 Issue 2 Pages 95-123
    Published: April 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose a transitive query translation system for CLIR (Cross Language Information Retrieval) in which the source language has poor data resources. Our aim is to perform transitive translation with minimal data resources for the source language (Indonesian) while exploiting the data resources of the target language (Japanese). We performed two kinds of translation: a pure transitive translation, and a combination of direct and transitive translations. In the transitive translation, English is used as the pivot language. The translation consists of two main steps. The first is a keyword translation process, which attempts to produce translations from the available resources. This process draws on many target language resources, such as a Japanese proper name dictionary and an English-Japanese (pivot-target language) bilingual dictionary. The second step selects the best of the available translations. To select the best translation, we combined the mutual information score (computed from a target language corpus) with the TF × IDF score. Results on the NTCIR 3 (NII-NACSIS Test Collection for IR Systems) Web Retrieval Task showed that the translation method achieved a higher IR score than machine translation (using the Kataku (Indonesian-English) and Babelfish/Excite (English-Japanese) engines). The transitive translation achieved about 38% of the monolingual retrieval performance, and the combination of direct and transitive translation achieved about 49%, which is comparable to the English-Japanese IR task.
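    The first step, keyword translation through a pivot language, can be pictured as composing two bilingual dictionaries. The sketch below is only an illustration under that assumption: the function name and the tiny Indonesian-English and English-Japanese dictionaries are invented for the example, and the selection step (mutual information combined with TF × IDF) is not shown because the abstract does not specify how the scores are combined.

```python
def transitive_translate(term, src_to_pivot, pivot_to_tgt):
    """Translate a source term into the target language through a pivot.

    Returns target translations in first-seen order, without duplicates.
    """
    results = []
    for pivot_word in src_to_pivot.get(term, []):
        for target_word in pivot_to_tgt.get(pivot_word, []):
            if target_word not in results:
                results.append(target_word)
    return results


# Hypothetical toy dictionaries, for illustration only.
indonesian_to_english = {
    "buku": ["book"],
    "bisa": ["can", "poison"],
}
english_to_japanese = {
    "book": ["本", "書籍"],
    "can": ["缶", "できる"],
    "poison": ["毒"],
}

book_candidates = transitive_translate(
    "buku", indonesian_to_english, english_to_japanese
)
ambiguous_candidates = transitive_translate(
    "bisa", indonesian_to_english, english_to_japanese
)
```

    The second query shows why a selection step matters: the pivot word "can" is itself ambiguous in English, so the candidate list picks up 缶 (a tin can), which is not a sense of the Indonesian source word.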