Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 25, Issue 4
Displaying 1-7 of 7 articles from this issue
Preface
Paper
  • Masayuki Asahara, Yuji Matsumoto
    2018 Volume 25 Issue 4 Pages 331-356
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    This article presents syntactic annotation for the ‘Balanced Corpus of Contemporary Written Japanese’. We propose a syntactic annotation schema in which bunsetsu dependencies and coordinate structures are annotated separately. In addition, we propose an annotation standard for determining attachments beyond clause boundaries. We discuss issues with our annotation schema and standard that are associated with the hierarchical annotation processes. Furthermore, we present basic statistics of the annotated data.
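    The two-layer scheme the abstract describes (bunsetsu dependency arcs kept separate from coordinate structures) can be sketched as a small data model. All class names, fields, and the example sentence below are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Bunsetsu:
    index: int
    surface: str

@dataclass
class Coordination:
    # each conjunct is a (start, end) span of bunsetsu indices,
    # stored in its own layer rather than as dependency arcs
    conjuncts: List[Tuple[int, int]]

@dataclass
class AnnotatedSentence:
    bunsetsu: List[Bunsetsu]
    heads: Dict[int, int]                     # dependency layer: dependent -> head
    coordinations: List[Coordination] = field(default_factory=list)

    def dependents_of(self, head: int) -> List[int]:
        # query the dependency layer independently of the coordination layer
        return sorted(d for d, h in self.heads.items() if h == head)

# toy sentence: "taroo-ga | hon-to | zasshi-o | katta"
sent = AnnotatedSentence(
    bunsetsu=[Bunsetsu(0, "taroo-ga"), Bunsetsu(1, "hon-to"),
              Bunsetsu(2, "zasshi-o"), Bunsetsu(3, "katta")],
    heads={0: 3, 1: 2, 2: 3},
    coordinations=[Coordination(conjuncts=[(1, 1), (2, 2)])],
)
```

Separating the two layers lets a coordinate structure span bunsetsu without forcing an artificial head choice inside the dependency tree.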

  • Yuki Tagawa, Kazutaka Shimada
    2018 Volume 25 Issue 4 Pages 357-391
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    In this study, we propose inning summarization methods that generate simple yet sophisticated summaries of baseball games from play-by-play data. We focus on two information sources: inning reports and game summaries. First, we generate a basic sentence from an inning report; this basic sentence is then integrated with an explanatory phrase so that the resulting inning summary contains expressions found in game summaries, such as “the long-awaited first score”. We refer to these phrases as game-changing phrases (GPs). GPs help readers easily understand the situation of a game. We investigate both template-based and neural methods of summary generation, evaluate the two methods, and discuss their advantages and disadvantages.
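    The template-based variant can be illustrated with a minimal sketch: a basic sentence is filled from play-by-play fields, and a game-changing phrase (GP) is spliced in when one is available. The function name, template wording, and field names are assumptions for illustration only:

```python
def summarize_inning(inning, batter, runs, gp=None):
    """Template-based inning summary; a game-changing phrase (GP)
    taken from the game summary is integrated when available."""
    if gp:
        # integrate the explanatory phrase with the basic sentence
        return f"In the {inning} inning, {batter} drove in {runs} run(s), {gp}."
    return f"In the {inning} inning, {batter} drove in {runs} run(s)."

print(summarize_inning("3rd", "Tanaka", 1, gp="the long-awaited first score"))
# In the 3rd inning, Tanaka drove in 1 run(s), the long-awaited first score.
```

A neural alternative would learn when and which GP to attach, at the cost of requiring aligned inning-report and game-summary training data.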

  • Takahiro Yamakoshi, Tomohiro Ohno, Yasuhiro Ogawa, Makoto Nakamura, Ka ...
    2018 Volume 25 Issue 4 Pages 393-419
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    We propose a method for analyzing the hierarchical coordinate structure of Japanese statutory sentences using neural language models (NLMs). Our method deterministically identifies hierarchical coordinate structures according to their rigorously defined descriptive rules. In addition, it identifies all conjuncts in each coordinate structure using NLM-based scoring, and it does not rely on any training data labeled with coordinate structures. An experiment demonstrates that our method substantially outperforms an existing method on Japanese statutory sentences.
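    The core idea of NLM-based conjunct scoring can be sketched with a toy substitution test: replace each candidate left conjunct with the right conjunct and keep the start position whose rewritten sentence the language model scores highest. The bigram table below stands in for a neural LM, and all scores and the example sentence are invented for illustration:

```python
# toy bigram log-probabilities standing in for an NLM (values are made up)
BIGRAM = {
    ("<s>", "the"): -0.5, ("the", "mayor"): -1.0, ("mayor", "shall"): -1.0,
    ("shall", "publish"): -1.0, ("publish", "laws"): -1.2,
    ("publish", "regulations"): -1.3, ("laws", "and"): -0.8,
    ("and", "regulations"): -0.9,
}

def lm_score(tokens):
    """Average bigram log-probability; unseen pairs get a penalty floor."""
    pairs = zip(["<s>"] + tokens, tokens)
    scores = [BIGRAM.get(p, -8.0) for p in pairs]
    return sum(scores) / len(scores)

def left_conjunct_start(tokens, cc_index):
    """Substitute the right conjunct for each candidate left conjunct and
    keep the start whose rewritten sentence scores best under the LM."""
    right = tokens[cc_index + 1:]
    return max(range(cc_index), key=lambda s: lm_score(tokens[:s] + right))

tokens = ["the", "mayor", "shall", "publish", "laws", "and", "regulations"]
# coordinator "and" is at index 5; the left conjunct "laws" starts at index 4
```

Because the scorer only ranks rewritten sentences, no coordination-labeled training data is needed, which mirrors the unsupervised setting the abstract describes.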

  • Masahiro Kaneko, Yuya Sakaizawa, Mamoru Komachi
    2018 Volume 25 Issue 4 Pages 421-439
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    In this study, we improve grammatical error detection by learning word embeddings that consider grammaticality and error patterns. Most existing algorithms for learning word embeddings model only the syntactic context of words and do not consider grammatical errors specific to language learners. We therefore propose methods to learn word embeddings specialized for grammatical errors by considering grammaticality and grammatical error patterns. We determine the grammaticality of n-gram sequences from annotated error tags and extract grammatical error patterns for word embeddings from large-scale learner corpora. Experimental results show that a bidirectional long short-term memory model initialized with our word embeddings achieved state-of-the-art accuracy by a large margin in an English grammatical error detection task on the First Certificate in English dataset.
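    The grammaticality signal derived from error tags can be sketched as follows: each n-gram window is labeled ungrammatical if it overlaps an annotated error span, and grammatical otherwise. The windowing scheme, function name, and example are illustrative assumptions, not the paper's exact procedure:

```python
def ngram_grammaticality(tokens, error_spans, n=3):
    """Label each n-gram window 0 (ungrammatical) if it overlaps an
    annotated error span, else 1 (grammatical)."""
    bad = {i for start, end in error_spans for i in range(start, end)}
    labels = []
    for i in range(len(tokens) - n + 1):
        window = range(i, i + n)
        labels.append(0 if any(j in bad for j in window) else 1)
    return labels

# learner sentence with "go" tagged as an error span [1, 2)
tokens = ["he", "go", "to", "school", "every", "day"]
print(ngram_grammaticality(tokens, [(1, 2)]))  # [0, 0, 1, 1]
```

Labels of this kind can then serve as auxiliary supervision when training the embeddings, so that words occurring in ungrammatical contexts are pushed apart from their grammatical counterparts.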

  • Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto
    2018 Volume 25 Issue 4 Pages 441-462
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    The task of coordinate structure analysis is to identify coordinated phrases called conjuncts. Although coordination conveys a large amount of syntactic and semantic information, it remains difficult for state-of-the-art parsers. Some existing approaches are based only on the similarity of conjuncts, while others rely heavily on syntactic information obtained from external parsers. Here, we propose a neural network model for identifying coordination boundaries. The model is composed of recurrent neural networks, which are widely used in natural language processing. Our method exploits two properties of conjuncts, similarity and replaceability, and predicts the spans of coordinate structures without using syntactic parsers. We further demonstrate that the proposed model outperforms the existing state-of-the-art methods on the Penn Treebank and the GENIA corpus.
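    The similarity property can be sketched without any neural machinery: enumerate candidate (left, right) conjunct spans around the coordinator and keep the pair whose representations are most similar. Here a toy part-of-speech lexicon stands in for learned RNN span representations; the lexicon, example, and cosine scoring are illustrative assumptions:

```python
import math
from collections import Counter

# toy POS lexicon standing in for learned span representations
POS = {"ate": "V", "fresh": "ADJ", "apples": "N", "and": "CC", "oranges": "N"}

def span_vector(tokens):
    # crude span representation: bag of POS tags
    return Counter(POS[t] for t in tokens)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_conjunct_pair(tokens, cc_index):
    """Enumerate candidate (left, right) conjunct spans around the
    coordinator and keep the most similar pair."""
    best, best_sim = None, -1.0
    for ls in range(cc_index):
        for re in range(cc_index + 2, len(tokens) + 1):
            sim = cosine(span_vector(tokens[ls:cc_index]),
                         span_vector(tokens[cc_index + 1:re]))
            if sim > best_sim:
                best, best_sim = ((ls, cc_index), (cc_index + 1, re)), sim
    return best

# "ate [fresh apples] and [fresh oranges]": spans (1,3) and (4,6)
```

The replaceability property would add a second score, checking that substituting one conjunct for the other still yields a fluent sentence; the paper combines both signals in a single neural model.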

  • Kanako Komiya, Minoru Sasaki, Hiroyuki Shinnou, Manabu Okumura
    2018 Volume 25 Issue 4 Pages 463-480
    Published: September 15, 2018
    Released on J-STAGE: December 15, 2018
    JOURNAL FREE ACCESS

    In this paper, we propose domain adaptation using word embeddings for word sense disambiguation (WSD). The validity for WSD of word embeddings derived from a huge corpus such as Wikipedia has already been shown, but their validity in a domain adaptation framework has not been previously discussed. Moreover, even if word embeddings are valid in this new context, the impact of the document type of the corpora on WSD is still unknown. We therefore investigate the performance of domain adaptation for WSD using word embeddings derived from source, target, and general corpora, and examine (1) whether word embeddings are valid for domain adaptation in WSD and, (2) if they are, the effect of the document type of the corpora from which the word embeddings are derived. We used three corpora of distinct document types and performed domain adaptation experiments using the document types as domains. The experiments, conducted on Japanese corpora, revealed that WSD accuracy was highest when we used word embeddings obtained from the target corpora together with a general corpus.
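    How embeddings feed into a WSD decision can be sketched with a nearest-centroid classifier: a context is represented as the average of its word vectors, and the sense whose labeled example contexts lie closest in embedding space wins. The two-dimensional toy vectors, the English "bank" example, and all function names are assumptions for illustration; the paper's embeddings come from Japanese source, target, and general corpora:

```python
import math

# toy word vectors standing in for embeddings learned from a corpus
VEC = {
    "river": [0.9, 0.1], "water": [0.8, 0.2],
    "money": [0.1, 0.9], "loan":  [0.2, 0.8],
}

def context_vector(words):
    # average the embeddings of the known context words
    vs = [VEC[w] for w in words if w in VEC]
    return [sum(dim) / len(vs) for dim in zip(*vs)]

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def disambiguate(context, sense_examples):
    """Nearest-centroid WSD: pick the sense whose labeled example
    contexts lie closest to the new context in embedding space."""
    cv = context_vector(context)
    return max(sense_examples,
               key=lambda s: cos(cv, context_vector(sense_examples[s])))

senses = {"shore": ["river", "water"], "finance": ["money", "loan"]}
```

Swapping in vectors trained on the target domain versus a general corpus changes only the `VEC` table, which is what makes this setup convenient for comparing corpus document types.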
