Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 10, Issue 4
  • [in Japanese]
    2003 Volume 10 Issue 4 Pages 1-2
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • Fuyuki Yoshikane, Keita Tsuji, Kyo Kageura, Christian Jacquemin
    2003 Volume 10 Issue 4 Pages 3-32
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we describe a rule-based mechanism that detects Japanese term variants in text corpora. The system operates on meta-rules that map syntactic and morpho-syntactic variants of terms to their original forms. This framework has already been applied successfully to languages such as English and French, and we show that it also works well for detecting Japanese term variants once the specific characteristics of the Japanese language are properly taken into account. We also discuss the potential of this work for IR-related applications.
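The meta-rule idea in the abstract above can be illustrated with a toy matcher. This is a minimal sketch, not the authors' rule formalism: a single hypothetical insertion rule lets a two-word term "A B" also match "A X B" with up to `max_gap` inserted tokens. The romanized tokens (`joho no kensaku`, roughly "retrieval of information") are assumed inputs that a real system would obtain from a morphological analyzer.

```python
from typing import List, Tuple


def find_insertion_variants(term: Tuple[str, str],
                            tokens: List[str],
                            max_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, where a two-word
    term occurs verbatim (gap 0) or with 1..max_gap inserted tokens.

    Hypothetical insertion meta-rule: "A B" -> "A X B".
    """
    head, tail = term
    spans: List[Tuple[int, int]] = []
    for i, tok in enumerate(tokens):
        if tok != head:
            continue
        # Try the original form first, then progressively longer insertions.
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j < len(tokens) and tokens[j] == tail:
                spans.append((i, j + 1))
                break
    return spans


# Hypothetical romanized tokens: "joho no kensaku to joho kensaku"
text = ["joho", "no", "kensaku", "to", "joho", "kensaku"]
print(find_insertion_variants(("joho", "kensaku"), text))
# prints [(0, 3), (4, 6)]
```

The first span is an insertion variant and the second is the original form; a real variant detector would combine many such rules (insertion, permutation, coordination) and constrain them with part-of-speech information.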
  • Kotaro Funakoshi, Takenobu Tokunaga, Hozumi Tanaka
    2003 Volume 10 Issue 4 Pages 33-53
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Speech dialog systems need to deal with various kinds of ill-formed speech inputs that occur in natural human-human dialog. Self-correction (or repair) is a particularly problematic phenomenon. Although many methods for dealing with self-correction have been proposed, they have limitations in both detecting and correcting it. In this paper, we propose a new method for overcoming the ill-formedness of speech inputs. We evaluate the proposed method on a speech dialog corpus and discuss its effectiveness and limitations.
  • Wei-Bin Chang, Sachiko Morishita
    2003 Volume 10 Issue 4 Pages 55-63
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We address the problem of automatically transcribing Japanese orthographic words into symbols representing their pronunciations. This function is necessary for commercial continuous speech recognition systems, because new recognition lexica constantly need to be created for new applications or purposes. Simple look-up schemes are inadequate for Japanese, while methods based on morphological analysis require in-depth linguistic knowledge and development effort. In this paper, we propose a statistical approach based on an N-gram language model, which assumes that the pronunciation of a character depends only on the previous one or two characters and their pronunciations. Given an orthographic word, our method outputs the most likely phonetic transcription. Our approach outperforms the public-domain conversion tool KAKASI on ten of twelve test sets.
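A minimal sketch of the kind of N-gram decoding this abstract describes, under stated assumptions: it conditions on only one previous (character, reading) pair, i.e. a bigram rather than the one-to-two-character context the authors use, and the candidate readings and probabilities are hypothetical toy values, not estimates from any corpus. A Viterbi search then returns the most likely reading sequence.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (character, reading)
BOS: Pair = ("<s>", "<s>")  # word-start pseudo-pair


def transcribe(word: str,
               candidates: Dict[str, List[str]],
               bigram: Dict[Tuple[Pair, Pair], float],
               floor: float = 1e-6) -> List[str]:
    """Viterbi search for the most likely reading of each character under a
    bigram model P(cur_pair | prev_pair). Unseen bigrams get `floor`."""
    # best maps the last (char, reading) pair to (log-prob, readings so far)
    best: Dict[Pair, Tuple[float, List[str]]] = {BOS: (0.0, [])}
    for ch in word:
        new_best: Dict[Pair, Tuple[float, List[str]]] = {}
        for reading in candidates[ch]:
            cur = (ch, reading)
            # Keep only the best-scoring predecessor for this pair.
            score, path = max(
                ((s + math.log(bigram.get((prev, cur), floor)), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            new_best[cur] = (score, path + [reading])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


# Hypothetical toy model for the word 日本 (readings romanized for brevity).
CANDIDATES = {"日": ["ni", "hi", "nichi"], "本": ["hon", "pon"]}
BIGRAM = {
    (BOS, ("日", "ni")): 0.4,
    (BOS, ("日", "hi")): 0.4,
    (BOS, ("日", "nichi")): 0.2,
    (("日", "ni"), ("本", "hon")): 0.9,
    (("日", "hi"), ("本", "hon")): 0.1,
    (("日", "nichi"), ("本", "hon")): 0.3,
}

print("".join(transcribe("日本", CANDIDATES, BIGRAM)))  # prints nihon
```

Conditioning on the previous reading, not just the previous character, is what lets the model prefer "ni" before "hon" even though "ni" and "hi" are equally likely in isolation.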
  • Nobuhiro Kaji, Daisuke Kawahara, Sadao Kurohashi, Satoshi Sato
    2003 Volume 10 Issue 4 Pages 65-81
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method of predicate paraphrasing using an ordinary dictionary: a predicate is replaced with an equivalent word or phrase found in its dictionary definition. An ordinary dictionary does not contain sufficient information for the three sub-tasks of predicate paraphrasing: resolving predicate sense ambiguity, extracting the equivalent word or phrase from the definition, and transforming case markers properly. To compensate for this insufficiency, we align the case frames of two predicates (a headword and its equivalent predicate), which yields predicate paraphrasing patterns. Experimental results on paraphrasing 220 test sentences demonstrate the effectiveness of this method.
  • Uighur Dictionary and Its Evaluation
    Muhtar Mahsut, Yasuhiro Ogawa, Kazue Sugino, Yasuyoshi Inagaki
    2003 Volume 10 Issue 4 Pages 83-108
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    The authors have semi-automatically constructed a Japanese-Uighur dictionary consisting of about 20,000 items. This paper describes the process of generating this Japanese-Uighur dictionary from an existing Uighur-Japanese one. We examined the vocabulary of our dictionary and found that it includes about 80% of 2,000 high-priority Japanese words. We also investigated why each of the remaining 20% of these words is not included, and classified the missing words into five groups according to the reasons we found.
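The dictionary-generation step can be sketched, very roughly, as inverting the source dictionary and then measuring coverage against a word list. This is an assumption-laden illustration: the authors describe a semi-automatic process whose details are not given in the abstract, and the toy entries, the Latin transcription of Uighur, and the high-priority list below are all hypothetical.

```python
from typing import Dict, List


def invert(src: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Naively invert a Uighur->Japanese dictionary into Japanese->Uighur,
    keeping each Uighur word at most once per Japanese headword."""
    inv: Dict[str, List[str]] = {}
    for uighur, glosses in src.items():
        for ja in glosses:
            entries = inv.setdefault(ja, [])
            if uighur not in entries:
                entries.append(uighur)
    return inv


def coverage(dictionary: Dict[str, List[str]], words: List[str]) -> float:
    """Fraction of `words` that appear as headwords in `dictionary`."""
    return sum(w in dictionary for w in words) / len(words)


# Hypothetical toy entries (Uighur in Latin transcription).
UG_JA = {"kitab": ["本", "書物"], "mektep": ["学校"], "su": ["水"]}
JA_UG = invert(UG_JA)
print(JA_UG["本"])                                 # prints ['kitab']
print(coverage(JA_UG, ["本", "学校", "水", "山"]))  # prints 0.75
```

On the scale reported above, coverage of about 80% of 2,000 high-priority words corresponds to roughly 1,600 covered words. Words such as 山 here fall out of a naive inversion simply because no Uighur entry glosses them, which is one kind of reason the authors' classification of missing words would need to capture.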
  • Kengo Sato, Hiroaki Saito
    2003 Volume 10 Issue 4 Pages 109-124
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method for learning and extracting bilingual word sequence correspondences from aligned parallel corpora using Support Vector Machines (SVMs). SVMs are robust against data sparseness because of their high generalization ability, and they can learn dependencies among features by using a kernel function. Our method learns a translation model using features such as translation dictionaries, the number of words, parts of speech, constituent words, and neighboring words, and it extracts bilingual word sequence correspondences using a correspondence level based on the SVMs. Conventional methods, whose correspondence levels are based on word co-occurrence, cannot extract correspondences that appear infrequently because of data sparseness. Our method, however, can extract them with a model that has already been learned from training corpora.
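The remark that SVMs "can learn dependencies of features by using a kernel function" can be made concrete with a degree-2 polynomial kernel. The sketch below assumes binary feature vectors (for example, hypothetical indicators such as "found in a translation dictionary" or "same part of speech") and checks that the kernel value equals an inner product in a feature space that contains every pairwise feature conjunction. Which kernel the authors actually used is not stated in the abstract.

```python
import itertools
import math
from typing import List, Sequence


def poly_kernel(x: Sequence[float], z: Sequence[float],
                degree: int = 2, c: float = 1.0) -> float:
    """Polynomial kernel K(x, z) = (x . z + c) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree


def explicit_features(x: Sequence[float]) -> List[float]:
    """Explicit feature map for the degree-2 kernel with c = 1:
    a constant, sqrt(2)-scaled single features, and all ordered pairs
    x_i * x_j (the feature conjunctions)."""
    feats = [1.0]
    feats += [math.sqrt(2) * xi for xi in x]
    feats += [xi * xj for xi, xj in itertools.product(x, repeat=2)]
    return feats


# Hypothetical binary features of two candidate word-sequence pairs.
x = [1, 0, 1, 1]
z = [1, 1, 0, 1]
implicit = poly_kernel(x, z)
explicit = sum(a * b for a, b in zip(explicit_features(x), explicit_features(z)))
print(implicit, round(explicit, 6))  # prints 9.0 9.0
```

The kernel computes the conjunction-space inner product without ever building the 21-dimensional vectors, which is why an SVM can weigh feature combinations while storing only the original features.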
  • Kyo Kageura
    2003 Volume 10 Issue 4 Pages 125-143
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper quantitatively analyses the role of morphemes with respect to their types of origin. A static quantitative analysis of a given data set is not sufficient for this aim, because language data in general, and terminological data in particular, are “incomplete” in the sense that many unseen elements are expected in the theoretical population. The quantitative structure of morphemes in terminology should therefore be analysed dynamically, by observing the growth pattern of morphemes. To do so, we use binomial interpolation and extrapolation. We then present analyses of the terminologies of six different domains, which reveal interesting characteristics of the role of morphemes of different origin types that do not manifest themselves in static quantitative analysis.
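For the dynamic analysis mentioned in this abstract, the standard binomial interpolation estimate of the expected number of types in a subsample of M out of N tokens is E[V(M)] = sum over k of V(k, N) * (1 - (1 - M/N)^k), where V(k, N) is the number of types occurring exactly k times. The sketch below implements only this interpolation (M <= N); extrapolation beyond N needs a separate estimator and is not shown. The morpheme tokens are hypothetical; in the setting of the paper one would presumably compute a curve per type of origin and compare the curves.

```python
from collections import Counter
from typing import Hashable, Sequence


def binomial_interpolation(tokens: Sequence[Hashable], m: int) -> float:
    """Expected number of types in a subsample of m of the N tokens:
    E[V(m)] = sum_k V(k, N) * (1 - (1 - m/N) ** k),
    where V(k, N) is the number of types seen exactly k times."""
    n = len(tokens)
    if n == 0 or not 0 <= m <= n:
        raise ValueError("interpolation requires N > 0 and 0 <= m <= N")
    spectrum = Counter(Counter(tokens).values())  # k -> V(k, N)
    # Under the binomial model each token is kept with probability m/N,
    # so a type seen k times is missed with probability (1 - m/N) ** k.
    p_absent = 1 - m / n
    return sum(v_k * (1 - p_absent ** k) for k, v_k in spectrum.items())


# Hypothetical morpheme tokens from a tiny term list.
sample = ["den", "den", "ki", "ko"]
for size in range(len(sample) + 1):
    print(size, binomial_interpolation(sample, size))
# the last line printed is "4 3.0": the full sample has 3 types
```

Plotting E[V(M)] against M gives the growth curve the abstract refers to; a curve that is still rising steeply at M = N signals many unseen morphemes.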
  • A Question Answering System based on Large Text Knowledge Base
    Yoji Kiyota, Sadao Kurohashi, Fuyuko Kido
    2003 Volume 10 Issue 4 Pages 145-175
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes Dialog Navigator, a dialog-based QA system that answers questions using a large text knowledge base. The system is targeted at users of personal computers, and we released it on the WWW in April 2002. In real-world QA systems, the vagueness of questions is a major problem. Our system navigates users to the desired answers with two methods: asking users follow-up questions with dialog cards, and extracting a description from each retrieved text. Another feature of the system is that it retrieves relevant texts precisely, using question types, a synonymous expression dictionary, and modifier-head relations in Japanese sentences.
  • Jun Shieh, Zhaohui Bu, Takashi Ikeda
    2003 Volume 10 Issue 4 Pages 177-200
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a method for the machine translation of tense and aspect expressions from Japanese into Chinese. We deal with the expressions ‘/ta’, ‘/ru’, ‘/teiru’, and ‘/teita’, which play an important role in Japanese tense and aspect expressions. Based on syntactic characteristics and co-occurrence information in both Japanese and Chinese, together with the temporal features of Chinese predicates, the method translates these Japanese expressions into Chinese aspectual particles such as ‘/le’, ‘/zhe’, ‘/zai’, and ‘/guo’, or into an unmarked (null) form. We systematize previous research on the usage of tense and aspect in both languages. Our method determines the correspondence relationships between the expressions in order to resolve matching ambiguities. A manual evaluation of our algorithm shows an accuracy of more than 80%, which indicates that the method is effective and acceptable for use in machine translation.
  • Masao Utiyama, Hitoshi Isahara
    2003 Volume 10 Issue 4 Pages 201-220
    Published: July 10, 2003
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to build a large parallel corpus. We first used a method based on cross-language information retrieval to align the Japanese and English articles, and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the resulting article and sentence alignments included many incorrect ones. To remove these, we propose two measures that evaluate the validity of alignments. Using these measures, we extracted valid correspondences consisting of about 47,000 article pairs, 150,000 one-to-one sentence pairs, and 38,000 one-to-many sentence pairs. We were therefore able to build the largest Japanese-English parallel corpus available to the public.
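The DP matching step in the last abstract can be illustrated with a simplified, length-based sentence aligner. This is a swapped-in sketch, not Utiyama and Isahara's method: the cost function (a relative length difference, a fixed penalty for 1-2 and 2-1 beads, and a fixed `SKIP_COST` for unaligned sentences) is hypothetical, and sentence lengths are assumed to be given in a comparable unit. The two validity measures proposed in the paper are not reproduced.

```python
from typing import List, Optional, Tuple

Bead = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (ja indices, en indices)

# Allowed beads: (Japanese sentences consumed, English sentences consumed).
MOVES = [(1, 1), (1, 0), (0, 1), (1, 2), (2, 1)]
SKIP_COST = 3.0       # hypothetical penalty for an unaligned sentence
NON_1TO1_COST = 0.5   # hypothetical penalty for 1-2 and 2-1 beads


def bead_cost(ja_len: int, en_len: int, di: int, dj: int) -> float:
    """Hypothetical cost: relative length difference plus bead-type penalty."""
    if di == 0 or dj == 0:
        return SKIP_COST
    diff = abs(ja_len - en_len) / max(ja_len, en_len, 1)
    return diff + (0.0 if (di, dj) == (1, 1) else NON_1TO1_COST)


def align(ja: List[int], en: List[int]) -> List[Bead]:
    """DP matching over sentence lengths; returns the minimum-cost beads."""
    n, m = len(ja), len(en)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Every move increases i or j, so row-major order reaches each cell
    # only after all of its predecessors have been finalized.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(sum(ja[i:ni]), sum(en[j:nj]), di, dj)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the bead sequence.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev = back[i][j]
        assert prev is not None
        pi, pj = prev
        beads.append((tuple(range(pi, i)), tuple(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


# Hypothetical sentence lengths (in a comparable unit) for one article pair.
print(align([10, 20, 30], [10, 9, 11, 30]))
# prints [((0,), (0,)), ((1,), (1, 2)), ((2,), (3,))]
```

The middle bead is a one-to-many alignment of the kind counted separately in the abstract; a validity measure would then be applied to such beads to filter out incorrect alignments.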