Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 21, Issue 6
Preface
Paper
  • Katsuhito Sudoh, Shinsuke Mori, Masaaki Nagata
    2014 Volume 21 Issue 6 Pages 1107-1131
    Published: September 16, 2014
    Released on J-STAGE: March 15, 2015
    JOURNAL FREE ACCESS
    This paper proposes a novel noise-aware character alignment method for automatically extracting transliteration fragments in phrase pairs that are extracted from parallel corpora. The proposed method extends a many-to-many Bayesian character alignment method by distinguishing transliteration (signal) parts from non-transliteration (noise) parts. The model can be trained efficiently by a state-based blocked Gibbs sampling algorithm with signal and noise states. The proposed method bootstraps statistical machine transliteration using the extracted transliteration fragments to train transliteration models. In experiments using Japanese-English patent data, the proposed method was able to extract transliteration fragments with much less noise than an IBM-model-based baseline, and achieved better transliteration performance than sample-wise extraction in transliteration bootstrapping.
  • Wakako Kashino, Manabu Okumura
    2014 Volume 21 Issue 6 Pages 1133-1161
    Published: December 15, 2014
    Released on J-STAGE: March 15, 2015
    JOURNAL FREE ACCESS
    A fundamental issue in compiling a Japanese dictionary is selecting lexical entries and describing the meaning(s) and use cases for the selected entries. Because some old-fashioned Japanese words continue to be used even now, modern Japanese dictionaries usually include certain old-fashioned words. However, up to now, no systematic study has investigated the selection and usage description of old-fashioned words. Therefore, here we first review five already-published modern Japanese dictionaries and clarify the characteristics and variations among them. Subsequently, we propose four categories of old-fashioned Japanese words in terms of the nature and chronological features of the text where those words appear. According to the categorization, we analyze the use cases of the old-fashioned words in the “Balanced Corpus of Contemporary Written Japanese.” Finally, we discuss a systematic methodology of lexical description for such entries, with a typical example.
  • Gongye Jin, Daisuke Kawahara, Sadao Kurohashi
    2014 Volume 21 Issue 6 Pages 1163-1182
    Published: December 15, 2014
    Released on J-STAGE: March 15, 2015
    JOURNAL FREE ACCESS
    Many knowledge acquisition tasks depend heavily on fundamental analysis technologies, such as part-of-speech (POS) tagging and parsing. Dependency parsing, in particular, has been widely employed for the acquisition of knowledge related to predicate-argument structures. For such tasks, dependency parsing performance can determine the quality of the acquired knowledge, regardless of the target language. Therefore, reducing dependency parsing errors and selecting high-quality dependencies are of primary importance. In this study, we present a language-independent approach for automatically selecting high-quality dependencies from automatic parses. By considering several aspects that affect the accuracy of dependency parsing, we created a set of features for supervised classification of reliable dependencies. Experimental results on seven languages show that our approach can effectively select high-quality dependencies from dependency parses.
  • Ryohei Sasano, Sadao Kurohashi, Manabu Okumura
    2014 Volume 21 Issue 6 Pages 1183-1205
    Published: December 15, 2014
    Released on J-STAGE: March 15, 2015
    JOURNAL FREE ACCESS
    This paper presents a simple but effective approach to unknown word processing in Japanese morphological analysis, which handles 1) unknown words that are derived from words in a pre-defined lexicon and 2) unknown onomatopoeias. Our approach leverages derivation rules and onomatopoeia patterns, and correctly recognizes certain types of unknown words. Experiments revealed that our approach recognized about 4,500 unknown words in 100,000 Web sentences with only roughly 80 harmful side effects and a 6% loss in speed.
  • Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi, Manabu Okumura
    2014 Volume 21 Issue 6 Pages 1207-1233
    Published: December 15, 2014
    Released on J-STAGE: March 15, 2015
    JOURNAL FREE ACCESS
    We propose a method for automatically acquiring knowledge about case alternations between the passive/causative and active voices. Our method leverages large lexical case frames obtained from a large Web corpus, together with several alternation patterns. We then apply the acquired knowledge to a case alternation task and show its usefulness.