Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 15, Issue 5
Displaying 1-9 of 9 articles from this issue
  • [in Japanese]
    2008 Volume 15 Issue 5 Pages 1-2
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (240K)
  • Idomucogiin Dawa, Satoshi Nakamura
    2008 Volume 15 Issue 5 Pages 3-21
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper discusses a segmentation approach for Mongolian Cyrillic text for machine translation. With this method, processing one-to-one word permutation between the variants of Mongolian and other languages, especially Altaic languages such as Japanese, becomes easier. Furthermore, it can be used for two-way conversion between Mongolian texts used in different regions and countries, such as Mongolia and China. Our system is implemented with DP (dynamic programming) matching supported by knowledge-based sequence matching, drawing on a multilingual dictionary and linguistic rule bank (LRB), and a data-driven approach using a target language corpus (TLC). For convenience, NM (New Mongolian) is treated as the source language, and TM (Traditional Mongolian) and Todo as the target languages in this test. Our application was tested on manually transcribed texts of 5,000 sentences parallel across NM, TM, and Todo. We found that our method achieves a transformation accuracy of 91.9% for “NM” to “TM” and 94.3% for “NM” to “Todo”. A minimal sketch of the DP matching step follows this entry.
    Download PDF (18744K)
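A minimal sketch, in Python, of DP (edit-distance) matching against a bilingual word list, as a stand-in for the knowledge-based conversion step described above. The dictionary contents and the nearest-neighbour fallback are illustrative assumptions, not the authors' actual LRB/TLC resources.

    # Minimal sketch: DP matching of a New Mongolian word against dictionary keys.
    def dp_distance(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cost = 0 if ca == cb else 1
                cur.append(min(prev[j] + 1,          # deletion
                               cur[j - 1] + 1,       # insertion
                               prev[j - 1] + cost))  # substitution
            prev = cur
        return prev[-1]

    def convert_word(nm_word: str, nm_to_tm: dict) -> str:
        """Return the TM form whose NM key is closest to the input under DP matching.

        Assumes a non-empty dictionary mapping NM spellings to TM spellings.
        """
        if nm_word in nm_to_tm:                      # exact dictionary hit
            return nm_to_tm[nm_word]
        best_key = min(nm_to_tm, key=lambda k: dp_distance(nm_word, k))
        return nm_to_tm[best_key]                    # nearest-neighbour fallback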
  • MASATOSHI TSUCHIYA, TOSHIYUKI WAKITA, AYU PURWARIANTI, SEIICHI NAKAGAWA
    2008 Volume 15 Issue 5 Pages 23-43
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Cross-lingual language resources are necessary to realize cross-lingual natural language processing. A large translation dictionary is an especially important such resource; however, large dictionaries are available for only a few language pairs, and only small ones are available for most language pairs. We propose a novel method to expand a small existing translation dictionary into a large translation dictionary using a pivot language. Co-occurrence vectors in the source language and in the destination language are compared based on the small existing translation dictionary, and this comparison provides information for selecting appropriate translations among the translation candidates obtained from transitive translation through two translation dictionaries. Experiments that expand an Indonesian-Japanese dictionary using English as the pivot language show that the proposed method can improve the performance of a real CLIR system. A minimal sketch of the candidate-ranking step follows this entry.
    Download PDF (7294K)
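An illustrative Python sketch of pivot-based dictionary expansion: candidates are collected by transitive translation through a pivot language and ranked by co-occurrence-vector similarity. The data structures and the projection via the small seed dictionary are simplified assumptions, not the authors' exact formulation.

    import math
    from collections import defaultdict

    def cosine(u: dict, v: dict) -> float:
        """Cosine similarity between two sparse co-occurrence vectors."""
        dot = sum(w * v.get(c, 0.0) for c, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def map_to_target_space(src_vec: dict, seed_dict: dict) -> dict:
        """Project a source-language vector into target-language dimensions
        using the small seed translation dictionary (word -> list of translations)."""
        mapped = defaultdict(float)
        for src_ctx, weight in src_vec.items():
            for tgt_ctx in seed_dict.get(src_ctx, []):
                mapped[tgt_ctx] += weight
        return mapped

    def expand_entry(src_word, src_vecs, tgt_vecs, src_to_pivot, pivot_to_tgt, seed_dict):
        """Collect candidates via the pivot, then rank them by vector similarity."""
        candidates = {t for p in src_to_pivot.get(src_word, [])
                        for t in pivot_to_tgt.get(p, [])}
        mapped = map_to_target_space(src_vecs[src_word], seed_dict)
        return sorted(candidates,
                      key=lambda t: cosine(mapped, tgt_vecs.get(t, {})),
                      reverse=True)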
  • HIKARU YOKONO
    2008 Volume 15 Issue 5 Pages 45-71
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Since a story consists of several scenes and topics, it is essential to grasp the relations between topics when summarizing a story. This means that producing a coherent summary is a key issue for an informative summary of a story. Against this background, this paper proposes a method to produce a coherent summary of a story by extracting (1) topic blocks, which consist of sentences likely written on the same topic, and (2) complement sentences, which are likely to express changes of scene. They are extracted on the basis of automatic topic recognition and identification of characters. Experimental results on summarizing 9 stories show that the proposed method produces summaries that are easier to follow than those of a tf·idf-based model; a sketch of such a baseline follows this entry.
    Download PDF (2624K)
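An illustrative Python sketch of a tf·idf-style extractive baseline of the kind the paper compares against: each sentence is scored by the summed tf·idf weight of its words and the top-ranked sentences are kept. Whitespace tokenization is an assumption here; this is not the author's proposed topic-block method.

    import math
    from collections import Counter

    def tfidf_summary(sentences: list, num_keep: int) -> list:
        """Return the num_keep highest-scoring sentences in their original order."""
        docs = [s.split() for s in sentences]
        df = Counter(w for d in docs for w in set(d))        # document frequency
        n = len(docs)
        def score(d):
            tf = Counter(d)
            return sum(tf[w] * math.log(n / df[w]) for w in tf)
        ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
        keep = sorted(ranked[:num_keep])                     # restore story order
        return [sentences[i] for i in keep]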
  • CHIKARA HASHIMOTO, SADAO KUROHASHI
    2008 Volume 15 Issue 5 Pages 73-97
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
For natural language understanding, it is essential to reveal semantic relations between words. To date, only the IS-A relation has been publicly available in the form of thesauri. Toward deeper natural language understanding, we semi-automatically constructed a domain dictionary that represents the domain relation of Japanese fundamental words. Our method does not require a document collection. As a task-based evaluation of the domain dictionary, we performed blog categorization, in which we assigned a domain to each word in a blog article and categorized the article under its most dominant domain; a minimal sketch of this step follows this entry. In doing so, we dynamically estimated the domains of unknown words, i.e., those not listed in the domain dictionary. As a result, our blog categorization achieved an accuracy of 94.0% (564/600), and the domain estimation technique for unknown words achieved an accuracy of 76.6% (383/500).
    Download PDF (2304K)
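A minimal Python sketch of the domain-based categorization step: each word in the article receives a domain from a domain dictionary, and the article is labeled with the most dominant domain. Tokenization, the dictionary contents, and the UNKNOWN fallback are placeholder assumptions.

    from collections import Counter

    def categorize_blog(words: list, domain_dict: dict) -> str:
        """Return the most frequent domain among words found in the dictionary."""
        counts = Counter(domain_dict[w] for w in words if w in domain_dict)
        return counts.most_common(1)[0][0] if counts else "UNKNOWN"

    # Hypothetical usage with a toy dictionary:
    # domain_dict = {"recipe": "COOKING", "oven": "COOKING", "goal": "SPORTS"}
    # categorize_blog(["recipe", "oven", "goal"], domain_dict)  # -> "COOKING"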
  • RYOHEI SASANO, SADAO KUROHASHI
    2008 Volume 15 Issue 5 Pages 99-118
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We present a knowledge-rich approach to Japanese coreference resolution. In Japanese, noun phrase coreference occupies a central position among coreference relations. To improve coreference resolution for such a language, wide-coverage knowledge of synonyms is required. We first acquire knowledge of synonyms from a large raw corpus and dictionary definition sentences, and then resolve coreference relations based on this knowledge; a schematic sketch of the synonym-based matching follows this entry. Furthermore, to boost the performance of coreference resolution, we integrate a bridging reference resolution system that uses automatically constructed nominal case frames into the coreference resolver. We evaluated our approach on newspaper articles and a Web corpus and confirmed that the performance of coreference resolution is improved by using automatically acquired synonyms and bridging reference resolution.
    Download PDF (2152K)
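A schematic Python sketch of synonym-based mention matching for noun phrase coreference: two mentions are linked when their head words match or are listed as synonyms. The `synonyms` dictionary is a placeholder for the knowledge the authors acquire from raw corpora and dictionary definitions; the greedy chaining is an illustrative simplification.

    def corefer(head_a: str, head_b: str, synonyms: dict) -> bool:
        """Return True if two mention heads should be placed in one chain."""
        if head_a == head_b:                                  # string match
            return True
        return (head_b in synonyms.get(head_a, set())
                or head_a in synonyms.get(head_b, set()))     # synonym match

    def build_chains(mention_heads: list, synonyms: dict) -> list:
        """Greedily attach each mention to the most recent compatible chain."""
        chains = []
        for head in mention_heads:
            for chain in reversed(chains):                    # prefer recent chains
                if corefer(chain[-1], head, synonyms):
                    chain.append(head)
                    break
            else:
                chains.append([head])
        return chains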
  • Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama
    2008 Volume 15 Issue 5 Pages 119-150
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Distributional similarity is a widely adopted concept for capturing the semantic relatedness of words based on their contexts in various NLP tasks. While accurate similarity calculation requires a huge number of context types and co-occurrences, the contribution to the similarity calculation differs among individual context types, and some of them even act as noise. To select well-performing contexts and alleviate the high computational cost, we propose and investigate the effectiveness of three context selection schemes: category-based, type-based, and co-occurrence-based selection. Category-based selection is the conventional and simplest selection method, which limits the context types based on their syntactic category. Finer-grained, type-based selection assigns an importance score to each context type, which we make possible by proposing a novel formalization of distributional similarity as a classification problem and applying feature selection techniques; a sketch of this pruning follows this entry. The finest-grained, co-occurrence-based selection assigns importance scores to each co-occurrence of a word and a context type. We evaluate the effectiveness and the trade-off between co-occurrence data size and synonym acquisition performance. Our experiments show that, on the whole, the finest-grained, co-occurrence-based selection achieves better performance, although some of the simple category-based selections show a comparable performance/cost trade-off.
    Download PDF (9940K)
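A Python sketch of type-based context selection for distributional similarity: only the highest-scoring context types are kept, and words are then compared by the cosine of their pruned co-occurrence vectors. The importance scores are assumed to come from some feature-selection step; the paper derives them from a classification formulation.

    import math

    def select_contexts(importance: dict, k: int) -> set:
        """Keep the k context types with the highest importance scores."""
        return set(sorted(importance, key=importance.get, reverse=True)[:k])

    def pruned_cosine(vec_a: dict, vec_b: dict, keep: set) -> float:
        """Cosine similarity restricted to the selected context types."""
        a = {c: w for c, w in vec_a.items() if c in keep}
        b = {c: w for c, w in vec_b.items() if c in keep}
        dot = sum(w * b.get(c, 0.0) for c, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0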
  • Yo EHARA, KUMIKO TANAKA-ISHII
    2008 Volume 15 Issue 5 Pages 151-167
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Computer users increasingly need to produce text written in multiple languages. However, typical computer systems require the user to change the text entry software each time a different language is used. This is cumbersome, especially when the languages change frequently. To solve this problem, we propose TypeAny, a novel multilingual text entry system that identifies the language of the user's key entries and automatically dispatches the input to the appropriate text entry system. Language identification is modeled as a hidden Markov model whose probabilities are estimated using the PPM method; a simplified sketch follows this entry. In evaluating this method, we obtained a language identification accuracy of 96.7% when an appropriate language had to be chosen from among three languages. The number of control actions needed to switch languages was decreased by 93% when using TypeAny rather than a conventional method.
    Download PDF (4349K)
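A simplified Python sketch of keystroke-level language identification as an HMM: hidden states are languages, emissions are characters scored by a per-language character model, and Viterbi decoding recovers the language sequence. The `emit_logprob` callback stands in for the PPM-estimated probabilities used in the paper, and the uniform switch/stay penalties are illustrative assumptions.

    def viterbi(chars, langs, emit_logprob, switch_logprob, stay_logprob):
        """Return the most likely language label for each input character."""
        best = {l: emit_logprob(l, chars[0]) for l in langs}   # initial step
        backptrs = []
        for c in chars[1:]:
            step_best, step_back = {}, {}
            for l in langs:
                prev, score = max(
                    ((p, best[p] + (stay_logprob if p == l else switch_logprob))
                     for p in langs),
                    key=lambda x: x[1])
                step_best[l] = score + emit_logprob(l, c)
                step_back[l] = prev
            best = step_best
            backptrs.append(step_back)
        lang = max(best, key=best.get)           # best final state
        path = [lang]
        for step_back in reversed(backptrs):     # follow back pointers
            lang = step_back[lang]
            path.append(lang)
        return list(reversed(path))

    # Hypothetical usage: emit_logprob(lang, ch) would query a per-language
    # character model (PPM in the paper); here any function with that
    # signature returning log probabilities will do.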
  • MASAKAZU IWATATE, MASAYUKI ASAHARA, YUJI MATSUMOTO
    2008 Volume 15 Issue 5 Pages 169-185
    Published: October 10, 2008
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In Japanese dependency parsing, Kudo's relative preference-based method outperforms both deterministic and probabilistic CFG-based parsing methods. In the relative preference-based method, a log-linear model estimates selectional preferences over all candidate heads, which cannot be considered in deterministic parsing methods. We propose an algorithm based on a tournament model, in which the selectional preferences are modeled directly by one-on-one games in a step-ladder tournament; a conceptual sketch follows this entry. In evaluation experiments with the Kyoto Text Corpus Version 4.0, the proposed method outperforms previous work, including the relative preference-based method.
    Download PDF (1786K)
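A conceptual Python sketch of the step-ladder tournament for selecting a dependency head: candidate heads play one-on-one games, and the survivor becomes the predicted head. The `prefers_later` callback stands in for the paper's trained binary classifier and is a placeholder assumption here.

    def choose_head(modifier, candidates, prefers_later):
        """Run a step-ladder tournament over candidate heads (nearest first).

        prefers_later(modifier, current, challenger) should return True when
        the challenger beats the current champion for this modifier.
        """
        champion = candidates[0]
        for challenger in candidates[1:]:
            if prefers_later(modifier, champion, challenger):
                champion = challenger          # challenger wins this game
        return champion

    # Hypothetical usage with a toy preference (always keep the nearer head):
    # choose_head("word", ["cand1", "cand2", "cand3"],
    #             lambda m, cur, chal: False)  # -> "cand1"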