Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 27, Issue 4
Preface
General Paper
  • Hiroki Teranishi, Hiroyuki Shindo, Taro Watanabe, Yuji Matsumoto
    2020 Volume 27 Issue 4 Pages 719-752
    Published: December 15, 2020
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    The task of coordinate structure analysis is to identify coordinated phrases linked by a coordinator. Coordination is a major source of ambiguity in natural language and confuses even state-of-the-art syntactic parsers. In this paper, we generalize a scoring function that takes a pair of spans with a coordinator and returns a higher score when the two spans are likely to be coordinated. During inference, our system employs this function with the CKY parsing algorithm and produces coordinate structures for a given sentence. To obtain such a function, we decompose the task into three independent subtasks and build the function based on three different neural networks for the tasks. The experimental results on English corpora demonstrate that our model achieves state-of-the-art results, ensuring that the global structure of coordination is consistent.

  • Tatsuki Kuribayashi, Hiroki Ouchi, Naoya Inoue, Jun Suzuki, Paul Reis ...
    2020 Volume 27 Issue 4 Pages 753-779
    Published: December 15, 2020
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Argumentation Structure Parsing (ASP) is the task of predicting the roles of argumentative units (e.g., claim, premise) and the relations between the units (e.g., support, attack) in an argumentative text. ASP has received a great deal of attention due to its usefulness for applications such as automatic assessment of argumentative texts. As textual spans (i.e., argumentative units) are basic units of ASP, it is important to explore an effective design for representing them. Inspired by the current span representation design in other natural language processing tasks, we propose a method to obtain effective span representations of argumentative units in ASP. Our proposed method leverages multiple levels of global contextual information, such as argumentative markers in surrounding contexts, for obtaining each span representation. We show that using our span representation improves performance on several benchmark datasets—especially when parsing complex argumentative texts, which have been difficult to parse with existing methods. Furthermore, we report the effectiveness of our span representations when using word representations obtained from existing, powerful language models such as BERT.

  • Kaori Abe, Yuichiroh Matsubayashi, Naoaki Okazaki, Kentaro Inui
    2020 Volume 27 Issue 4 Pages 781-800
    Published: December 15, 2020
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    We present a multi-dialect neural machine translation (NMT) model tailored to Japanese. Although the surface forms of Japanese dialects differ from those of standard Japanese, most of the dialects share fundamental properties, such as word order, and some also share many of the same phonetic correspondence rules. To take advantage of these properties, we integrate multilingual, syllable-level, and fixed-order translation techniques into a general NMT model. Our experimental results demonstrate that this model can outperform a baseline dialect translation model. In addition, we show that visualizing the dialect embeddings learned by the model can facilitate the geographical and typological analyses of the dialects.

  • Tomoyuki Kajiwara, Daiki Nishihara, Tomonori Kodaira, Mamoru Komachi
    2020 Volume 27 Issue 4 Pages 801-824
    Published: December 15, 2020
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    This study introduces three language resources for Japanese lexical simplification: 1) an evaluation dataset, 2) lexica, and 3) a toolkit that can be used to develop and benchmark Japanese lexical simplification systems. The word complexity lexicon adopted in this study was automatically expanded using a classifier trained on a small word complexity lexicon created by Japanese language teachers. Based on this word complexity estimator, simplified word pairs were extracted from a large-scale synonym lexicon, and a simplified synonym lexicon that is useful for lexical simplification was developed. In addition, a Python library, which implements automatic evaluation and key methods in each subtask to ease the construction of a lexical simplification pipeline, was developed. The experimental results on the developed evaluation dataset revealed that the proposed method, which is based on the developed lexicon, achieves the highest performance in Japanese lexical simplification.

  • Tatsuya ISHIGAKI, Kazuya Machida, Hayato Kobayashi, Hiroya TAKAMURA, M ...
    2020 Volume 27 Issue 4 Pages 825-852
    Published: December 15, 2020
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    We address extractive summarization of questions. Neural extractive summarizers often require large amounts of labeled training data. Obtaining such labels is difficult, especially for user-generated content, such as questions posted on community question answering services. In this paper, we propose semi-supervised extractive summarizers for such questions that exploit question-answer pairs to alleviate the problem of insufficient labeled data. To this end, we propose several learning methods, namely pretraining, multi-task learning, distant supervision, and sampling methods, to examine how to effectively use such unlabeled paired data. Experimental results show that multi-task training performs well with an appropriate sampling method or distant supervision, especially when little labeled data is available.

  • Sachi Kato, Rei Kikuchi, Masayuki Asahara
    2020 Volume 27 Issue 4 Pages 853-887
    Published: December 15, 2020
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    A figurative expression database was constructed based on the Balanced Corpus of Contemporary Written Japanese (BCCWJ), with the goal of understanding the actual usage of figurative expressions in Japanese. Using the 359 types of figurative expression indicators listed in ‘A Stylistic Study of the Figurative’ (Hiyuhyogen-no Riron-to Bunrui) as clues for metaphor indicator elements, candidates were selected based on synonym examples confirmed in the ‘Word List by Semantic Principles’, and a total of 822 expressions were manually extracted from the 1,290,060 words found in six registers of core data (Yahoo! Answers, white papers, Yahoo! Blog, books, magazines, and newspapers). In addition to the vehicle, topic, and Word List by Semantic Principles label of each metaphor example, type categories such as personification, objectification, biomimicry, and substantiation were defined. Examples were also classified into categories such as synecdoche, metonymy, contextual metaphor, and idiomatic expression. Although the work above was carried out by linguists, ratings were also assigned to each example for five aspects (figurativeness, novelty, comprehensibility, personification, and substantiation) based on evaluations by 22 to 77 non-experts (average: 33) to evaluate how these figurative expressions were perceived. The usage trends for each of these figurative expression indicators in contemporary Japanese were determined based on their relative frequency in each register and the distribution of their rating values.

  • Yudai Kishimoto, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi
    2020 Volume 27 Issue 4 Pages 889-931
    Published: December 15, 2020
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Although discourse parsing is fundamental to natural language processing, limited research has been conducted on corpus-based discourse parsing in Japanese. Herein, we construct a Japanese corpus annotated with discourse units, discourse connectives, and discourse relations. We propose four strategies for developing a corpus easily and rapidly: (1) selecting web documents with their first three sentences as the target documents, (2) automatically annotating discourse units and connectives, (3) designing a discourse relation tagset consisting of seven classes organized into a two-level hierarchy, and (4) annotating discourse relations through two types of annotators, namely experts and crowd workers. We report that there is significant room for improvement in data annotation performed by crowd workers. Based on this corpus, we develop a Japanese discourse parser. Experimental results show that the proposed parser outperforms previously developed models. We also demonstrate that the automatic recognizer of discourse connectives can be used as a high-quality parser for explicit discourse relations. We implement a recognizer of discourse units and discourse connectives in KNP. We also make the corpus publicly available.

Society Column