Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 25, Issue 4
Displaying 1-7 of 7 articles from this issue
Preface
Paper
  • Masayuki Asahara, Yuji Matsumoto
    2018 Volume 25 Issue 4 Pages 331-356
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    This article presents syntactic annotation for the ‘Balanced Corpus of Contemporary Written Japanese’. We propose a syntactic annotation schema in which bunsetsu dependencies and coordinate structures are annotated separately. In addition, we propose an annotation standard for determining attachments beyond clause boundaries. We discuss issues with our annotation schema and standard that are associated with the hierarchical annotation processes. Furthermore, we present basic statistics of the annotated data.
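    The two-layer scheme the abstract describes (bunsetsu dependency arcs kept separate from coordinate structures) can be sketched as a small data model. All class names, fields, and the example sentence below are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Bunsetsu:
    index: int
    surface: str

@dataclass
class Coordination:
    # each conjunct is a (start, end) span of bunsetsu indices,
    # stored in its own layer rather than as dependency arcs
    conjuncts: List[Tuple[int, int]]

@dataclass
class AnnotatedSentence:
    bunsetsu: List[Bunsetsu]
    heads: Dict[int, int]                     # dependency layer: dependent -> head
    coordinations: List[Coordination] = field(default_factory=list)

    def dependents_of(self, head: int) -> List[int]:
        # query the dependency layer independently of the coordination layer
        return sorted(d for d, h in self.heads.items() if h == head)

# toy sentence: "taroo-ga | hon-to | zasshi-o | katta"
sent = AnnotatedSentence(
    bunsetsu=[Bunsetsu(0, "taroo-ga"), Bunsetsu(1, "hon-to"),
              Bunsetsu(2, "zasshi-o"), Bunsetsu(3, "katta")],
    heads={0: 3, 1: 2, 2: 3},
    coordinations=[Coordination(conjuncts=[(1, 1), (2, 2)])],
)
```

Separating the two layers lets a coordinate structure span bunsetsu without forcing an artificial head choice inside the dependency tree.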

  • Yuki Tagawa, Kazutaka Shimada
    2018 Volume 25 Issue 4 Pages 357-391
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    In this study, we propose inning summarization methods that generate simple yet sophisticated summaries of baseball games from play-by-play data. We focus on two information sources: inning reports and game summaries. First, we generate a basic sentence from an inning report; this basic sentence is then integrated with an explanatory phrase so that the resulting inning summary contains expressions found in game summaries, such as “the long-awaited first score”. We refer to these phrases as game-changing phrases (GPs). GPs help readers easily understand the situation of a game. We investigate both template-based and neural methods of summary generation, evaluate the two methods, and discuss their advantages and disadvantages.
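    The template-based variant can be illustrated with a minimal sketch: a basic sentence is filled from play-by-play fields, and a game-changing phrase (GP) is spliced in when one is available. The function name, template wording, and field names are assumptions for illustration only:

```python
def summarize_inning(inning, batter, runs, gp=None):
    """Template-based inning summary; a game-changing phrase (GP)
    taken from the game summary is integrated when available."""
    if gp:
        # integrate the explanatory phrase with the basic sentence
        return f"In the {inning} inning, {batter} drove in {runs} run(s), {gp}."
    return f"In the {inning} inning, {batter} drove in {runs} run(s)."

print(summarize_inning("3rd", "Tanaka", 1, gp="the long-awaited first score"))
# In the 3rd inning, Tanaka drove in 1 run(s), the long-awaited first score.
```

A neural alternative would learn when and which GP to attach, at the cost of requiring aligned inning-report and game-summary training data.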

  • Takahiro Yamakoshi, Tomohiro Ohno, Yasuhiro Ogawa, Makoto Nakamura, Ka ...
    2018 Volume 25 Issue 4 Pages 393-419
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    We propose a method for analyzing the hierarchical coordinate structure of Japanese statutory sentences using neural language models (NLMs). Our method deterministically identifies hierarchical coordinate structures according to their rigorously defined descriptive rules. In addition, it identifies all conjuncts in each coordinate structure using NLM-based scoring, and it does not rely on any training data labeled with coordinate structures. An experiment demonstrates that our method substantially outperforms an existing method on Japanese statutory sentences.
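    The core idea of NLM-based conjunct scoring can be sketched with a toy substitution test: replace each candidate left conjunct with the right conjunct and keep the start position whose rewritten sentence the language model scores highest. The bigram table below stands in for a neural LM, and all scores and the example sentence are invented for illustration:

```python
# toy bigram log-probabilities standing in for an NLM (values are made up)
BIGRAM = {
    ("<s>", "the"): -0.5, ("the", "mayor"): -1.0, ("mayor", "shall"): -1.0,
    ("shall", "publish"): -1.0, ("publish", "laws"): -1.2,
    ("publish", "regulations"): -1.3, ("laws", "and"): -0.8,
    ("and", "regulations"): -0.9,
}

def lm_score(tokens):
    """Average bigram log-probability; unseen pairs get a penalty floor."""
    pairs = zip(["<s>"] + tokens, tokens)
    scores = [BIGRAM.get(p, -8.0) for p in pairs]
    return sum(scores) / len(scores)

def left_conjunct_start(tokens, cc_index):
    """Substitute the right conjunct for each candidate left conjunct and
    keep the start whose rewritten sentence scores best under the LM."""
    right = tokens[cc_index + 1:]
    return max(range(cc_index), key=lambda s: lm_score(tokens[:s] + right))

tokens = ["the", "mayor", "shall", "publish", "laws", "and", "regulations"]
# coordinator "and" is at index 5; the left conjunct "laws" starts at index 4
```

Because the scorer only ranks rewritten sentences, no coordination-labeled training data is needed, which mirrors the unsupervised setting the abstract describes.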

  • Masahiro Kaneko, Yuya Sakaizawa, Mamoru Komachi
    2018 Volume 25 Issue 4 Pages 421-439
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    In this study, we improve grammatical error detection by learning word embeddings that consider grammaticality and error patterns. Most existing algorithms for learning word embeddings model only the syntactic context of words and do not consider grammatical errors specific to language learners. We therefore propose methods to learn word embeddings specialized for grammatical errors by considering grammaticality and grammatical error patterns. We determine the grammaticality of n-gram sequences from annotated error tags and extract grammatical error patterns for word embeddings from large-scale learner corpora. Experimental results show that a bidirectional long short-term memory model initialized with our word embeddings achieved state-of-the-art accuracy by a large margin in an English grammatical error detection task on the First Certificate in English dataset.
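    The grammaticality signal derived from error tags can be sketched as follows: each n-gram window is labeled ungrammatical if it overlaps an annotated error span, and grammatical otherwise. The windowing scheme, function name, and example are illustrative assumptions, not the paper's exact procedure:

```python
def ngram_grammaticality(tokens, error_spans, n=3):
    """Label each n-gram window 0 (ungrammatical) if it overlaps an
    annotated error span, else 1 (grammatical)."""
    bad = {i for start, end in error_spans for i in range(start, end)}
    labels = []
    for i in range(len(tokens) - n + 1):
        window = range(i, i + n)
        labels.append(0 if any(j in bad for j in window) else 1)
    return labels

# learner sentence with "go" tagged as an error span [1, 2)
tokens = ["he", "go", "to", "school", "every", "day"]
print(ngram_grammaticality(tokens, [(1, 2)]))  # [0, 0, 1, 1]
```

Labels of this kind can then serve as auxiliary supervision when training the embeddings, so that words occurring in ungrammatical contexts are pushed apart from their grammatical counterparts.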

  • Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto
    2018 Volume 25 Issue 4 Pages 441-462
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    The task of coordinate structure analysis is to identify coordinated phrases called conjuncts. Although coordination conveys a large amount of syntactic and semantic information, it remains difficult for state-of-the-art parsers. Some existing approaches are based only on the similarity of conjuncts, while others rely heavily on syntactic information obtained from external parsers. Here, we propose a neural network model for identifying coordination boundaries. The model is composed of recurrent neural networks, which are widely used in natural language processing. Our method exploits two properties of conjuncts, similarity and replaceability, and predicts the spans of coordinate structures without using syntactic parsers. We further demonstrate that the proposed model outperforms the existing state-of-the-art methods on the Penn Treebank and the GENIA corpus.
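    The similarity property can be sketched without any neural machinery: enumerate candidate (left, right) conjunct spans around the coordinator and keep the pair whose representations are most similar. Here a toy part-of-speech lexicon stands in for learned RNN span representations; the lexicon, example, and cosine scoring are illustrative assumptions:

```python
import math
from collections import Counter

# toy POS lexicon standing in for learned span representations
POS = {"ate": "V", "fresh": "ADJ", "apples": "N", "and": "CC", "oranges": "N"}

def span_vector(tokens):
    # crude span representation: bag of POS tags
    return Counter(POS[t] for t in tokens)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_conjunct_pair(tokens, cc_index):
    """Enumerate candidate (left, right) conjunct spans around the
    coordinator and keep the most similar pair."""
    best, best_sim = None, -1.0
    for ls in range(cc_index):
        for re in range(cc_index + 2, len(tokens) + 1):
            sim = cosine(span_vector(tokens[ls:cc_index]),
                         span_vector(tokens[cc_index + 1:re]))
            if sim > best_sim:
                best, best_sim = ((ls, cc_index), (cc_index + 1, re)), sim
    return best

# "ate [fresh apples] and [fresh oranges]": spans (1,3) and (4,6)
```

The replaceability property would add a second score, checking that substituting one conjunct for the other still yields a fluent sentence; the paper combines both signals in a single neural model.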

  • Kanako Komiya, Minoru Sasaki, Hiroyuki Shinnou, Manabu Okumura
    2018 Volume 25 Issue 4 Pages 463-480
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    In this paper, we propose domain adaptation using word embeddings for word sense disambiguation (WSD). The validity for WSD of word embeddings derived from a huge corpus such as Wikipedia has already been shown, but their validity in a domain adaptation framework has not been previously discussed. Moreover, even if word embeddings are valid in this new context, the impact of the document type of the corpora on WSD is still unknown. We therefore investigate the performance of domain adaptation for WSD using word embeddings derived from source, target, and general corpora, and examine (1) whether word embeddings are valid for domain adaptation in WSD and, (2) if they are, the effect of the document type of the corpora from which the word embeddings are derived. We used three corpora of distinct document types and performed domain adaptation experiments using the document types as domains. The experiments, conducted on Japanese corpora, revealed that WSD accuracy was highest when we used word embeddings obtained from the target corpora together with a general corpus.
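    How embeddings feed into a WSD decision can be sketched with a nearest-centroid classifier: a context is represented as the average of its word vectors, and the sense whose labeled example contexts lie closest in embedding space wins. The two-dimensional toy vectors, the English "bank" example, and all function names are assumptions for illustration; the paper's embeddings come from Japanese source, target, and general corpora:

```python
import math

# toy word vectors standing in for embeddings learned from a corpus
VEC = {
    "river": [0.9, 0.1], "water": [0.8, 0.2],
    "money": [0.1, 0.9], "loan":  [0.2, 0.8],
}

def context_vector(words):
    # average the embeddings of the known context words
    vs = [VEC[w] for w in words if w in VEC]
    return [sum(dim) / len(vs) for dim in zip(*vs)]

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def disambiguate(context, sense_examples):
    """Nearest-centroid WSD: pick the sense whose labeled example
    contexts lie closest to the new context in embedding space."""
    cv = context_vector(context)
    return max(sense_examples,
               key=lambda s: cos(cv, context_vector(sense_examples[s])))

senses = {"shore": ["river", "water"], "finance": ["money", "loan"]}
```

Swapping in vectors trained on the target domain versus a general corpus changes only the `VEC` table, which is what makes this setup convenient for comparing corpus document types.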
