Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 30, Issue 1
Displaying 1-15 of 15 articles from this issue
Preface (Non Peer-Reviewed)
General Paper (Peer-Reviewed)
  • Mai Omura, Aya Wakasa, Masayuki Asahara
    2023 Volume 30 Issue 1 Pages 4-29
    Published: 2023
    Released on J-STAGE: March 15, 2023
    JOURNAL FREE ACCESS

    Universal Dependencies (UD) is an international project that aims to construct cross-linguistically consistent dependency treebanks. Annotation standards for grammar (parts of speech, morphological features, and syntactic dependencies) are defined consistently across languages, and treebanks have been compiled for more than 100 languages. For languages written without word delimiters, the units that serve as syntactic words must be defined under the UD guidelines. Previous UD Japanese resources are based on NINJAL's short-unit words, which are defined by lexicon-based morphology. This study introduces the UD Japanese resources UD_Japanese-GSDLUW, UD_Japanese-PUDLUW, and UD_Japanese-BCCWJLUW, which are based on NINJAL's long-unit words; these are more suitable as syntactic words in Japanese than the short-unit words. (A toy illustration of the treebank file format follows this entry.)

    Download PDF (1057K)
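
    The LUW-based resources above are distributed as UD treebanks in the standard CoNLL-U format. Below is a minimal, self-contained illustration of that format; the three-token sentence and its analysis are a toy example of my own, not actual treebank content or a faithful LUW segmentation.

# Toy CoNLL-U reader: each token line has 10 tab-separated columns
# (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
SAMPLE = (
    "# text = toy example, not actual treebank content\n"
    "1\t研究所\t研究所\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tが\tが\tADP\t_\t_\t1\tcase\t_\t_\n"
    "3\t発表した\t発表する\tVERB\t_\t_\t0\troot\t_\t_\n"
)

def read_conllu(text):
    """Yield sentences as lists of (id, form, lemma, upos, head, deprel)."""
    sentence = []
    for line in text.splitlines() + [""]:      # trailing "" flushes the last sentence
        if line.startswith("#"):
            continue                            # skip comment lines
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append((cols[0], cols[1], cols[2], cols[3], cols[6], cols[7]))

for sent in read_conllu(SAMPLE):
    for token in sent:
        print(token)
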
  • Ken Yano, Akira Utsumi
    2023 Volume 30 Issue 1 Pages 30-62
    Published: 2023
    Released on J-STAGE: March 15, 2023
    JOURNAL FREE ACCESS

    We propose a novel pipeline method for translating signed Japanese sentences into written Japanese. Sign languages often suppress functional words such as particles, and most words are not morphologically inflected as they are in spoken languages. Our method explicitly compares and contrasts the two languages and divides the translation process into two tasks: it first translates glosses into lemmatized Japanese words or phrases, and then complements particles and conjugates predicates such as verbs, auxiliary verbs, and adjectives. Our method is especially effective when the parallel corpus is very limited and costly to obtain but plenty of monolingual data is available for the target language. Specifically, our method first uses phrase-based statistical machine translation (PBSMT) to map sign glosses to corresponding Japanese words or phrases, and then employs a transformer-based neural machine translation (NMT) model trained on a monolingual corpus to refine the first-stage output. Experimental results show that our pipeline method outperforms direct PBSMT and competitive NMT models with data augmentation, including back-translation and transfer learning, in a low-resource setting with a corpus size on the order of 10^4 words. (A structural sketch of the pipeline follows this entry.)

    Download PDF (860K)
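
    As a purely structural sketch of the two-stage pipeline described above: the gloss-to-lemma table and the refinement rule below are hypothetical toy stand-ins; in the paper itself the first stage is PBSMT and the second stage is a transformer NMT model trained with a monolingual written-Japanese corpus.

# Stage 1 (stand-in for PBSMT): map sign glosses to lemmatized Japanese words.
# Stage 2 (stand-in for the monolingual-corpus NMT model): complement particles
# and conjugate predicates to produce written Japanese.

GLOSS_TO_LEMMA = {"WATASHI": "私", "GAKKOU": "学校", "IKU": "行く"}  # toy phrase table

def stage1_gloss_to_lemmas(glosses):
    """PBSMT stand-in: gloss sequence -> lemmatized Japanese words."""
    return [GLOSS_TO_LEMMA.get(g, g) for g in glosses]

def stage2_refine(lemmas):
    """NMT stand-in: insert particles and conjugate predicates."""
    toy_rules = {("私", "学校", "行く"): "私は学校に行く。"}
    return toy_rules.get(tuple(lemmas), " ".join(lemmas))

glosses = ["WATASHI", "GAKKOU", "IKU"]
print(stage2_refine(stage1_gloss_to_lemmas(glosses)))  # -> 私は学校に行く。
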
  • Kentaro Kurihara, Daisuke Kawahara, Tomohide Shibata
    2023 Volume 30 Issue 1 Pages 63-87
    Published: 2023
    Released on J-STAGE: March 15, 2023
    JOURNAL FREE ACCESS

    To develop high-performance natural language understanding (NLU) models, it is necessary to have a benchmark to evaluate and analyze NLU ability from various perspectives. The English NLU benchmark, GLUE (Wang et al. 2018), has been the forerunner, and benchmarks for languages other than English have been constructed, such as CLUE (Xu et al. 2020) for Chinese and FLUE (Le et al. 2020) for French. However, there is no such benchmark for Japanese, and this is a serious problem in Japanese NLP. We build a Japanese NLU benchmark, JGLUE, from scratch without translation to measure the general NLU ability in Japanese. JGLUE consists of three kinds of tasks: text classification, sentence pair classification, and QA. We hope that JGLUE will facilitate NLU research in Japanese.

    Download PDF (854K)
  • Masato Mimura, Tatsuya Kawahara
    2023 Volume 30 Issue 1 Pages 88-124
    Published: 2023
    Released on J-STAGE: March 15, 2023
    JOURNAL FREE ACCESS

    Because conventional automatic speech recognition (ASR) systems are designed to faithfully reproduce utterances word by word, their outputs are not necessarily easy to read even when they contain few recognition errors. To address this issue, we propose a novel ASR approach that outputs readable and clean text directly from speech by removing fillers and disfluent regions, substituting colloquial expressions with formal ones, inserting punctuation, recovering omitted particles, and performing other appropriate corrections. We formalize this approach as end-to-end generation of written-style text from speech using a single neural network. We also propose a method to guide the training of this end-to-end model using automatically generated faithful transcripts, as well as a novel speech segmentation strategy based on online punctuation detection. An evaluation using 700 hours of Japanese parliamentary speech demonstrates that the proposed direct approach generates clean transcripts suitable for human consumption more accurately, and at a faster decoding speed, than the conventional cascade approach. We also provide an in-depth analysis of the types of edits professional human editors perform to create the official written records of Japanese parliamentary meetings, and evaluate how well the proposed system handles each edit type. (A shape-level sketch of the single-network formulation follows this entry.)

    Download PDF (1327K)
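
    A shape-level sketch of the single-network formulation above: acoustic frames go in, written-style token logits come out, and fillers or disfluent regions are simply absent from the target text. The architecture, sizes, and the omission of positional encoding and decoder masking are simplifications of mine, not the authors' configuration.

import torch
import torch.nn as nn

class Speech2WrittenText(nn.Module):
    """Toy encoder-decoder mapping acoustic frames directly to written-style text."""
    def __init__(self, n_mels=80, vocab=8000, d_model=256):
        super().__init__()
        self.frame_proj = nn.Linear(n_mels, d_model)   # project log-mel frames
        self.tok_emb = nn.Embedding(vocab, d_model)    # embed previous output tokens
        self.seq2seq = nn.Transformer(d_model=d_model, nhead=4,
                                      num_encoder_layers=2, num_decoder_layers=2,
                                      batch_first=True)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, frames, tokens):
        # frames: (B, T_frames, n_mels); tokens: (B, T_text) previous written tokens
        hidden = self.seq2seq(self.frame_proj(frames), self.tok_emb(tokens))
        return self.out(hidden)                        # (B, T_text, vocab) logits

model = Speech2WrittenText()
logits = model(torch.randn(2, 300, 80), torch.randint(0, 8000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 8000])
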
  • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda
    2023 Volume 30 Issue 1 Pages 125-155
    Published: 2023
    Released on J-STAGE: March 15, 2023
    JOURNAL FREE ACCESS

    Sentence embeddings, which represent sentences as dense vectors, have been actively studied as a fundamental technique for natural language processing with deep learning. In particular, sentence embedding methods based on Natural Language Inference (NLI) tasks have been successful. However, these methods rely heavily on large NLI datasets and thus cannot be expected to produce adequate sentence embeddings for languages for which large NLI datasets are not available. In this paper, we propose a sentence embedding method that uses definition sentences from a word dictionary, a resource available in many languages. Experimental results on standard benchmarks demonstrate that our method performs comparably to NLI-based methods. Furthermore, we show that performance differs depending on the properties of the evaluation task and data, and that even higher performance can be achieved by combining the two methods. (A toy sketch of the idea follows this entry.)

    Download PDF (2322K)
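
    One way to picture the approach, under my own assumption about the training signal (the abstract does not spell it out): an encoder is trained so that the pooled embedding of a dictionary definition predicts the word it defines, and the pooled vector is then used as a general-purpose sentence embedding. The tiny GRU encoder and single toy definition below are placeholders for the pretrained transformer and real dictionary used in the paper.

import torch
import torch.nn as nn

vocab = {"<pad>": 0, "dog": 1, "a": 2, "domesticated": 3, "animal": 4,
         "that": 5, "barks": 6}

class ToyDefinitionEncoder(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.predict_word = nn.Linear(dim, vocab_size)

    def embed(self, ids):                     # mean-pooled sentence embedding
        hidden, _ = self.enc(self.emb(ids))
        return hidden.mean(dim=1)

    def forward(self, ids):                   # predict the defined word
        return self.predict_word(self.embed(ids))

definition = torch.tensor([[vocab[w] for w in
                            ["a", "domesticated", "animal", "that", "barks"]]])
target = torch.tensor([vocab["dog"]])

model = ToyDefinitionEncoder(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(50):                           # fit the single toy example
    loss = nn.functional.cross_entropy(model(definition), target)
    opt.zero_grad(); loss.backward(); opt.step()

print(model.embed(definition).shape)          # the resulting sentence embedding
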
  • Yukun Feng, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
    2023 Volume 30 Issue 1 Pages 156-183
    Published: 2023
    Released on J-STAGE: March 15, 2023
    JOURNAL FREE ACCESS

    In this study, we propose a simple and effective method to inject word-level information into character-aware neural language models. Unlike previous approaches, which typically inject word-level information as input to a long short-term memory (LSTM) network, we inject such information into the softmax function. The resultant model can be considered a combination of a character-aware language model and a simple word-level language model. Experimental results on 14 typologically diverse languages show empirically that our injection method performs better than previous methods that inject word-level information at the input, including a gating mechanism, averaging, and concatenation of word vectors. Our method can also be used together with these previous injection methods. Finally, we provide a comprehensive comparison with previous injection methods and analyze in detail the effectiveness of word-level information in character-aware language models and the properties of our injection method. (A schematic sketch follows this entry.)

    Download PDF (277K)
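
    A schematic reading of "injecting word-level information into the softmax" (this particular combination is my illustration, not necessarily the paper's exact formulation): the character-aware model produces a hidden state as usual, but the output logit for each word is computed against the sum of a character-side output embedding and a plain word-level embedding.

import torch
import torch.nn as nn

class SoftmaxInjectionLM(nn.Module):
    """Toy character-aware LM whose softmax also sees word-level embeddings.
    The char-CNN and the combination scheme are simplified placeholders."""
    def __init__(self, n_chars=60, n_words=1000, char_dim=16, dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, dim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.char_out_emb = nn.Embedding(n_words, dim)  # character-side output embeddings
        self.word_out_emb = nn.Embedding(n_words, dim)  # injected word-level embeddings

    def forward(self, char_ids):
        # char_ids: (B, T_words, T_chars) character ids of each input word
        b, t, c = char_ids.shape
        x = self.char_emb(char_ids).view(b * t, c, -1).transpose(1, 2)
        word_repr = self.char_cnn(x).max(dim=2).values.view(b, t, -1)
        hidden, _ = self.lstm(word_repr)                # (B, T_words, dim)
        out_emb = self.char_out_emb.weight + self.word_out_emb.weight
        return hidden @ out_emb.T                       # logits over the word vocabulary

model = SoftmaxInjectionLM()
logits = model(torch.randint(0, 60, (2, 7, 10)))
print(logits.shape)  # torch.Size([2, 7, 1000])
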
  • Jingyi You, Dongyuan Li, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura
    2023 Volume 30 Issue 1 Pages 184-214
    Published: 2023
    Released on J-STAGE: March 15, 2023
    JOURNAL FREE ACCESS

    Timeline summarization (TLS) is the task of summarizing events in chronological order, giving readers a comprehensive understanding of how a story evolves. Previous studies on TLS ignored the information interaction between sentences and dates and adopted pre-defined, unlearnable representations for them, which significantly degrades performance. They also treated date selection and event detection as two independent tasks, which makes it impossible to integrate their advantages and obtain a globally optimal summary. In this paper, we present a joint learning-based heterogeneous graph attention network for TLS (HeterTls), in which date selection and event detection are combined into a unified framework to improve extraction accuracy and remove redundant sentences simultaneously. Our heterogeneous graph involves multiple types of nodes, whose representations are iteratively learned across the heterogeneous graph attention layer. We evaluated our model on four datasets and found that it significantly outperforms current state-of-the-art baselines with regard to ROUGE scores and date selection metrics. (A stripped-down sketch of heterogeneous graph attention follows this entry.)

    Download PDF (4851K)
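
    A stripped-down sketch of attention over a heterogeneous graph with sentence and date nodes: each node type gets its own projection, and every node updates its representation by attending to its neighbors. The node types, projections, and toy adjacency below are my simplification of the general idea, not the HeterTls architecture itself.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHeteroGraphAttention(nn.Module):
    """One attention layer over a graph whose nodes are sentences and dates."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.ModuleDict({"sent": nn.Linear(dim, dim),
                                   "date": nn.Linear(dim, dim)})
        self.att = nn.Linear(2 * dim, 1)

    def forward(self, feats, types, adj):
        # feats: (N, dim) node features; types: list of "sent"/"date"; adj: (N, N) 0/1 mask
        h = torch.stack([self.proj[t](feats[i]) for i, t in enumerate(types)])
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = self.att(pairs).squeeze(-1).masked_fill(adj == 0, float("-inf"))
        alpha = F.softmax(scores, dim=-1)      # attention weights over neighbors
        return alpha @ h                       # updated node representations

feats = torch.randn(4, 64)                     # 3 sentence nodes + 1 date node
types = ["sent", "sent", "sent", "date"]
adj = torch.tensor([[1, 1, 0, 1],              # toy adjacency (with self-loops)
                    [1, 1, 1, 1],
                    [0, 1, 1, 1],
                    [1, 1, 1, 1]])
print(ToyHeteroGraphAttention()(feats, types, adj).shape)  # torch.Size([4, 64])
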
Society Column (Non Peer-Reviewed)
Information (Non Peer-Reviewed)