Journal of Natural Language Processing

Preface (Non Peer-Reviewed)

[title in Japanese]

[in Japanese]

2022 Volume 29 Issue 4 Pages 1050-1051
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1050

General Paper (Peer-Reviewed)

Combining Input Augmentation and Constrained Decoding for Lexically-Constrained Neural Machine Translation

Katsuki Chousa, Makoto Morishita, Masaaki Nagata

2022 Volume 29 Issue 4 Pages 1052-1081
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1052

Lexically constrained machine translation is a task in which the translation model is required to output translated sentences that contain all specified phrase constraints. In this paper, we propose a method for improving the efficiency of lexically constrained decoding by extending the input sequence of the model. The results of experiments on En↔Ja translation indicate that the proposed method achieves higher translation accuracy at lower computational cost than conventional methods. Furthermore, we propose a method for automatically extracting noisy lexical constraints by using the lexically constrained machine translation method. Experiments on Ja→En show that the proposed method can achieve higher accuracy than general machine translation methods.
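
The idea of extending the model's input can be illustrated with a minimal sketch. It assumes the constraint phrases are simply appended to the source tokens, each preceded by a separator token; the token name `<sep>` and the helper names are hypothetical and are not taken from the paper. A check that every constraint appears in the output is included, since that is the requirement the task imposes.

```python
from typing import List

SEP = "<sep>"  # hypothetical separator token; the paper's actual format may differ


def augment_input(source: List[str], constraints: List[List[str]]) -> List[str]:
    """Append each target-side constraint phrase to the source tokens,
    preceded by a separator, so the encoder can see the constraints."""
    augmented = list(source)
    for phrase in constraints:
        augmented.append(SEP)
        augmented.extend(phrase)
    return augmented


def contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """Return True if `phrase` occurs as a contiguous subsequence of `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def satisfies_all(output: List[str], constraints: List[List[str]]) -> bool:
    """A lexically constrained translation must contain every constraint phrase."""
    return all(contains_phrase(output, phrase) for phrase in constraints)


src = "彼 は 機械 翻訳 を 研究 して いる".split()
cons = [["machine", "translation"]]
print(augment_input(src, cons))
print(satisfies_all("he studies machine translation".split(), cons))
print(satisfies_all("he studies automatic translation".split(), cons))
```

Input augmentation only makes the constraints visible to the model; it does not guarantee they appear in the output, which is why a constrained decoder (or at least a check like `satisfies_all`) is still needed.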

Lexically Constrained Knowledge Distillation for Neural Machine Translation

Hideya Mino, Kazutaka Kinugawa, Hitoshi Ito, Isao Goto, Ichiro Yamada, ...

2022 Volume 29 Issue 4 Pages 1082-1105
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1082

Knowledge distillation is a representative approach in neural machine translation (NMT) for compressing a large model into a lightweight one. This approach first trains a strong teacher model and then trains a more compact student model to imitate the teacher. Although the key to successful knowledge distillation is constructing a stronger teacher model, even a teacher model built with state-of-the-art NMT may remain inadequate owing to translation errors. Using such an inadequate teacher model severely degrades the student model through error propagation, especially for words that are important to sentence meaning. To mitigate this degradation, we propose a knowledge distillation method for NMT that uses a lexical constraint as privileged information. The proposed method trains the teacher model with a lexical constraint, a list of words automatically extracted from the target sentence in the training data. We configure the lexical constraint according to the importance of words and the fallibility of NMT. Models trained with our proposed method achieve improved translation compared with those trained with a baseline method on English↔German and English↔Japanese translation tasks when neither ensemble decoding nor beam-search decoding is used.
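
The phrase "forces a more compact student model to imitate the teacher" can be made concrete with one common formulation, word-level distillation, in which the student is trained to minimize the cross-entropy between the teacher's and the student's output distributions at each target position. This is a generic sketch in plain Python under that assumption; the abstract does not state which distillation objective the paper uses (sequence-level distillation on teacher outputs is another common choice), and the function names are illustrative.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def word_level_kd_loss(teacher_logits: List[List[float]],
                       student_logits: List[List[float]]) -> float:
    """Average over target positions of -sum_v p_teacher(v) * log p_student(v)."""
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p_t = log_softmax(t_row)
        log_p_s = log_softmax(s_row)
        total -= sum(math.exp(lt) * ls for lt, ls in zip(log_p_t, log_p_s))
    return total / len(teacher_logits)


teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
# When the student matches the teacher exactly, the loss equals the teacher's
# own entropy, which is the smallest value this cross-entropy can take.
print(word_level_kd_loss(teacher, teacher))
```

The sketch also shows why a weak teacher is a problem: the student is pulled toward whatever distribution the teacher produces, including its errors on important words.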

BioVL2: An Egocentric Biochemical Video-and-Language Dataset

Taichi Nishimura, Kojiro Sakoda, Atsushi Ushiku, Atsushi Hashimoto, Na ...

2022 Volume 29 Issue 4 Pages 1106-1137
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1106

In this study, we propose an egocentric biochemical video-and-language dataset called BioVL2, comprising eight videos for each of four experiments (32 videos in total, with a total duration of 2.5 hours). Each video corresponds to a protocol, and two types of linguistic annotations are provided: (1) video-and-text alignment and (2) bounding boxes linked to objects in the protocol. As an application of the BioVL2 dataset, we consider the task of generating a protocol from an experimental video. Our experimental results show that the proposed system can generate better protocols than a weak baseline designed to output objects appearing in the video frames. The BioVL2 dataset will be released for research purposes only.

Universal Graph based Distantly Supervised Relation Extraction

Qin Dai, Benjamin Heinzerling, Naoya Inoue, Kentaro Inui

2022 Volume 29 Issue 4 Pages 1138-1164
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1138

This paper explores how distantly supervised relation extraction (DS-RE) can benefit from the use of a Universal Graph (UG), the combination of a Knowledge Graph (KG) and a large-scale text collection. A straightforward extension of a current state-of-the-art neural model for DS-RE with a UG may lead to degradation in performance. We first report that this degradation is associated with the difficulty of learning a UG and then propose three training strategies: (1) Path Type Adaptive Pretraining, which sequentially trains the model with different types of UG paths; (2) Path Type-wise Local Loss, an alternative to Path Type Adaptive Pretraining that generates UG path type-wise local error signals to prevent reliance on a single type of UG path; and (3) a Complexity Ranking Guided Attention mechanism, which restricts the attention span according to the complexity of UG paths to force the model to extract features not only from simple UG paths but also from complex ones. Experimental results on both biomedical and NYT10 datasets demonstrate the robustness of our methods, which achieve a new state-of-the-art result on the commonly used NYT10 dataset. The code and datasets used in this paper are available at https://github.com/baodaiqin/UGDSRE. In addition, a DS-RE toolkit developed based on this work is available at https://github.com/baodaiqin/UKG-RE.

Versatile Annotation Guidelines for Clinical-Medical Text with an Application to Critical Lung Diseases

Shuntaro Yada, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, Sadao Kurohashi

2022 Volume 29 Issue 4 Pages 1165-1197
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1165

Natural language processing for medical applications (medical NLP) requires high-quality annotated corpora. In this study, we designed a versatile annotation scheme for clinical-medical text and a set of associated guidelines, which address two common subtasks in medical NLP: named entity recognition (NER) and relation extraction (RE). The annotation scheme integrates similar existing schemes and defines clinical-medical entities and relations that encode useful information for many medical NLP applications. The guidelines aim to make annotation more feasible by reducing the need for judgments based on medical knowledge, thereby enabling non-medical professionals to annotate the text. We adopted a recursive discussion procedure involving NLP researchers, medical professionals, and annotators to develop the scheme and guidelines from real annotation examples while increasing the corpus size. Further, we obtained annotated corpora comprising 3,769 medical records and radiology reports of patients with serious lung diseases. For improved efficiency, preliminary NER and RE models were created after the first half of the corpus was annotated; they were then applied to the second half, which was subsequently corrected manually. This two-step annotation also increased the inter-coder agreement. Finally, a joint NER + RE model trained on our corpora showed performance promising enough to suggest its practical use.

ITeM: Image-to-Text Matching for Multimodal Documents

Masayasu Muraoka, Naoaki Okazaki, Ryosuke Kohita, Etsuko Ishii

2022 Volume 29 Issue 4 Pages 1198-1232
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1198

We propose a new task called image-to-text matching (ITeM) to facilitate multimodal document understanding. ITeM requires a system to learn a plausible assignment of images to texts in a multimodal document. To study this task, we systematically construct a dataset comprising 66,947 documents with 320,200 images from Wikipedia. We evaluate two existing state-of-the-art multimodal systems on the task to assess its validity and difficulty. Experimental results show that the systems greatly outperform simple baselines, although their performance is still far from that of humans. Although the proposed task does not contribute significantly to existing multimodal tasks, detailed analysis suggests that the task becomes more complex when more images are present in a document and that it can offer new capabilities for image-to-text understanding that existing tasks do not, such as considering multiple images or image abstraction.

Technical Report (Peer-Reviewed)

Sentiment Dictionary for Business Cycle Analysis and its Applications

Keiichi Goshima, Mototsugu Shintani, Hiroya Takamura

2022 Volume 29 Issue 4 Pages 1233-1253
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1233

In this study, we construct a sentiment dictionary for the macroeconomic domain and present its applications. Our dictionary contains words selected by several economists from a corpus of newspaper articles on economy-related topics, supplemented with additional words obtained through supervised learning. We use our sentiment dictionary to construct a daily business cycle index designed to capture the current state of the economy in a timely manner.
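
How a sentiment dictionary can turn a stream of newspaper text into a daily index is sketched below. The sketch assumes each dictionary entry carries a polarity of +1 or -1 and scores each day as (positive hits - negative hits) / (positive hits + negative hits); both the scoring formula and the English example entries are illustrative assumptions, not the paper's actual dictionary or index construction.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical entries for illustration only: +1 = positive, -1 = negative.
SENTIMENT: Dict[str, int] = {"recovery": 1, "growth": 1, "recession": -1, "slump": -1}


def daily_index(articles: List[Tuple[str, List[str]]]) -> Dict[str, float]:
    """Score each date as (pos - neg) / (pos + neg) over dictionary hits in
    that day's tokenized articles; days with no hits score 0.0."""
    pos: Dict[str, int] = defaultdict(int)
    neg: Dict[str, int] = defaultdict(int)
    dates = set()
    for date, tokens in articles:
        dates.add(date)
        for token in tokens:
            polarity = SENTIMENT.get(token, 0)
            if polarity > 0:
                pos[date] += 1
            elif polarity < 0:
                neg[date] += 1
    index: Dict[str, float] = {}
    for date in sorted(dates):
        hits = pos[date] + neg[date]
        index[date] = (pos[date] - neg[date]) / hits if hits else 0.0
    return index


print(daily_index([
    ("2022-01-04", ["growth", "recovery", "slump"]),
    ("2022-01-05", ["recession"]),
    ("2022-01-06", ["stocks"]),
]))
```

Normalizing by the number of dictionary hits keeps the index in [-1, 1] regardless of how many articles appear on a given day, which matters for a daily series where article volume varies.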

Inputting Writing Systems with Medium Complexity: A Generalized Input Method Editor AKKHARA and Case Study on Myanmar Script

Chenchen Ding, Masao Utiyama, Eiichiro Sumita

2022 Volume 29 Issue 4 Pages 1254-1271
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1254

In this study, an input method editor called AKKHARA is developed to accommodate writing systems comprising several tens to hundreds of symbols. As an engineering realization, AKKHARA accepts a set of prioritized rewrite rules and applies them so that character strings are alternated, substituted, and normalized alongside the keystrokes. Compared with general key-to-character editors, AKKHARA provides greater flexibility for editing Romanization-based rules. Compared with the input methods developed for Chinese and Japanese, AKKHARA is lightweight and easy to maintain. As an application case of AKKHARA, this study illustrates the realization of a Romanization-based Myanmar input method using the Unicode standard. A version of AKKHARA for Microsoft Windows has been released; it supports Unicode characters and provides customizable functions for editing rewrite rules.
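
Prioritized rewrite rules applied alongside keystrokes can be sketched as follows. After each key, the highest-priority rule whose pattern matches the end of the buffer is applied, and matching repeats until no rule fires. The `Rule` structure, the suffix-matching policy, and the tiny Romanization-to-Myanmar rule set are hypothetical illustrations, not AKKHARA's actual rule format or its Myanmar rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    pattern: str   # text that must match the end of the buffer
    output: str    # replacement text
    priority: int  # higher value wins when several rules match


def type_keys(keys: str, rules: List[Rule], max_steps: int = 100) -> str:
    """Feed keys one at a time, rewriting the buffer after every keystroke."""
    # sorted() is stable, so rules with equal priority keep their list order.
    ordered = sorted(rules, key=lambda r: r.priority, reverse=True)
    buffer = ""
    for key in keys:
        buffer += key
        for _ in range(max_steps):  # guard against rule sets that loop forever
            for rule in ordered:
                if buffer.endswith(rule.pattern):
                    buffer = buffer[: len(buffer) - len(rule.pattern)] + rule.output
                    break
            else:
                break  # no rule matched; wait for the next keystroke
    return buffer


# Hypothetical rules. The first one alternates an already-emitted letter:
# KA (U+1000) followed by "h" becomes KHA (U+1001).
RULES = [
    Rule("\u1000h", "\u1001", 2),
    Rule("k", "\u1000", 1),   # MYANMAR LETTER KA
    Rule("g", "\u1002", 1),   # MYANMAR LETTER GA
    Rule("i", "\u102d", 1),   # MYANMAR VOWEL SIGN I
]

print(type_keys("khi", RULES))
```

Because rules may match text produced by earlier rules, a single keystroke ("h" here) can revise characters that are already on screen, which is what distinguishes this approach from a fixed key-to-character mapping.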

Society Column (Non Peer-Reviewed)

Issues Related to the Management and Use of Materials and the Publication of Research Results in the Context of Collaborative Research

Kyo Kageura

2022 Volume 29 Issue 4 Pages 1272-1278
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1272

ACL 2022 and NAACL 2022 Participation Reports

Masahiro Kaneko

2022 Volume 29 Issue 4 Pages 1279-1283
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1279

Constructing Timeline Summarization as a Heterogeneous Graph Attention Network

Jingyi You

2022 Volume 29 Issue 4 Pages 1284-1289
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1284

EASE: Entity-Aware Contrastive Learning of Sentence Embedding

Sosuke Nishikawa

2022 Volume 29 Issue 4 Pages 1290-1296
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1290

Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem

Ryoma Sato

2022 Volume 29 Issue 4 Pages 1297-1301
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1297

Generating Repetitions with Appropriate Repeated Words

Toshiki Kawamoto

2022 Volume 29 Issue 4 Pages 1302-1307
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1302

NTCIR-16: NII Testbeds and Community for Information access Research

Makoto P. Kato, Takehiro Yamamoto, Noriko Kando

2022 Volume 29 Issue 4 Pages 1308-1315
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1308

Language Resources Workshop 2022 (LRW2022)

Makoto Yamazaki, Mai Omura, Takayuki Kagomiya, Yoshiko Kawabata

2022 Volume 29 Issue 4 Pages 1316-1321
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1316

Multimodal Dialogue Corpus Hazumi

Kazunori Komatani, Shogo Okada

2022 Volume 29 Issue 4 Pages 1322-1329
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1322

Information (Non Peer-Reviewed)

[title in Japanese]

2022 Volume 29 Issue 4 Pages 1330-1335
Published: 2022
Released on J-STAGE: December 15, 2022

DOI: https://doi.org/10.5715/jnlp.29.1330
