Journal of Natural Language Processing (自然言語処理)

Foreword (not peer-reviewed)

Don't Run Away from Meaning

賀沢秀人

2023, Vol. 30, No. 1, pp. 1-3
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.1

Free access

Regular Papers (peer-reviewed)

Japanese Universal Dependencies Based on NINJAL Long Unit Words

大村舞, 若狭絢, 浅原正幸

2023, Vol. 30, No. 1, pp. 4-29
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.4

Free access

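Universal Dependencies (UD) is a cross-linguistic project that builds treebanks based on word-level dependency structure. Using criteria unified across all languages, part-of-speech and dependency annotation data are being constructed for more than 100 languages. For languages that do not separate words with spaces, the basic unit, the syntactic word, must be defined. Earlier Japanese UD data adopted the NINJAL Short Unit Word, a morphologically based unit. We report the construction of new Japanese UD resources, UD_Japanese-GSDLUW, UD_Japanese-PUDLUW, and UD_Japanese-BCCWJLUW, based on the NINJAL Long Unit Word, a word unit closer to the syntactic word.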
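
UD treebanks, including the Japanese ones described above, are distributed in the CoNLL-U format: one word per line, ten tab-separated columns, comment lines beginning with "#", and a blank line after each sentence. Because each row is one word, the choice between Short Unit Words and Long Unit Words determines how a sentence is split into rows. The following minimal Python sketch reads such data. The `Token` class and `parse_conllu` function are hypothetical helpers written for illustration, not part of any UD tooling, and the sample sentence is a hand-made toy, not an excerpt from UD_Japanese-GSDLUW or the other corpora.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    id: int       # 1-based word index within the sentence (CoNLL-U column ID)
    form: str     # surface form (FORM)
    lemma: str    # lemma (LEMMA)
    upos: str     # universal part-of-speech tag (UPOS)
    head: int     # ID of the syntactic head, 0 for the root (HEAD)
    deprel: str   # dependency relation to the head (DEPREL)


def parse_conllu(text: str) -> List[List[Token]]:
    """Hypothetical minimal CoNLL-U reader: returns one list of Tokens per sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue                      # skip multiword-token ranges and empty nodes
        current.append(Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7]))
    if current:                           # the input may not end with a blank line
        sentences.append(current)
    return sentences


# Hand-made toy example: 猫が寝る ("the cat sleeps")
sample = (
    "# text = 猫が寝る\n"
    "1\t猫\t猫\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tが\tが\tADP\t_\t_\t1\tcase\t_\t_\n"
    "3\t寝る\t寝る\tVERB\t_\t_\t0\troot\t_\t_\n"
)
for tok in parse_conllu(sample)[0]:
    print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

Only six of the ten columns are kept here for brevity; a real reader would also retain XPOS, FEATS, DEPS, and MISC.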

Pipeline Signed Japanese Translation Using PBSMT and Transformer in a Low-Resource Setting

Ken Yano, Akira Utsumi

2023, Vol. 30, No. 1, pp. 30-62
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.30

Free access

We propose a novel pipeline method for translating signed Japanese sentences into written Japanese. Sign languages often omit function words such as particles, and most words are not morphologically inflected as they are in spoken languages. Our method explicitly compares and contrasts the two languages and divides translation into two tasks: first, translating glosses into lemmatized Japanese words or phrases, and second, supplying particles and conjugating predicates such as verbs, auxiliary verbs, and adjectives. The method is especially effective when the parallel corpus is very small and costly to obtain but monolingual corpora in the target language are plentiful. Specifically, it first uses phrase-based statistical machine translation (PBSMT) to map sign glosses to the corresponding Japanese words or phrases, and then employs a Transformer-based neural machine translation (NMT) model trained on a monolingual corpus to refine the first-stage output. Experimental results show that our pipeline outperforms direct PBSMT and competitive NMT models with data augmentation, including back-translation and transfer learning, in a low-resource setting with a corpus size on the order of 10⁴ words.

JGLUE: A Japanese Language Understanding Benchmark

栗原健太郎, 河原大輔, 柴田知秀

2023, Vol. 30, No. 1, pp. 63-87
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.63

Free access

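Developing high-performance language understanding models requires benchmarks that evaluate and analyze language understanding ability from various perspectives. For English, GLUE (Wang et al. 2018) was the pioneering benchmark, and benchmarks have since been built for other languages, such as the Chinese CLUE (Xu et al. 2020) and the French FLUE (Le et al. 2020). For Japanese, however, no GLUE-like benchmark exists, which is a major problem for Japanese natural language processing. In this study, with the aim of measuring general Japanese language understanding ability, we build the language understanding benchmark JGLUE from scratch in Japanese, without relying on translation. JGLUE consists of three types of tasks: text classification, sentence-pair classification, and QA. We hope this benchmark will stimulate research on language understanding in Japanese.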

End-to-End Conversion from Speech to Written Text for National Diet Records

三村正人, 河原達也

2023, Vol. 30, No. 1, pp. 88-124
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.88

Free access

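Conventional speech recognition systems are designed to faithfully reproduce every word that appears in the input speech, so even when recognition accuracy is high, their output is not necessarily easy for humans to read. In contrast, this study describes a new approach to speech recognition that directly outputs highly readable, written-style sentences from speech while making the necessary edits, such as deleting fillers and speech errors, inserting punctuation and dropped particles, and correcting colloquial expressions. We formulate this approach as end-to-end conversion from speech to written text using a single neural network. We also propose a method that pseudo-restores faithful transcripts of the speech to assist training of the end-to-end model, and a new speech segmentation method that uses punctuation positions as cues. Evaluation experiments on 700 hours of House of Representatives deliberation audio show that the proposed method generates written text more accurately and faster than a cascade approach that combines speech recognition with text-based spoken-to-written style conversion. Furthermore, we categorize the corrections that editors make when producing the Diet records and analyze how well the proposed system handles each category and what errors it tends to make.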

Constructing Sentence Embeddings Using Definition Sentences

塚越駿, 笹野遼平, 武田浩一

2023, Vol. 30, No. 1, pp. 125-155
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.125

Free access

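Sentence embeddings, which represent natural language sentences as vectors, are actively studied as a foundational technology for deep-learning-based natural language processing, and sentence embedding methods based on the natural language inference (NLI) task have been particularly successful. However, because these methods require large-scale NLI datasets, high-quality sentence embeddings cannot be expected for languages that lack such NLI data. To address this problem, we focus on dictionaries, a language resource available for far more languages than NLI data, and propose a sentence embedding method that uses dictionary definition sentences. Through evaluation experiments on standard benchmarks, we show that the proposed method performs comparably to existing NLI-based sentence embedding methods, that their relative performance varies with the nature of the evaluation task and the way the evaluation data were extracted, and that combining the two methods achieves even higher performance.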
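
Sentence embeddings like these are commonly evaluated on semantic textual similarity (STS) benchmarks by scoring each sentence pair with the cosine similarity of its two vectors and reporting the Spearman rank correlation between those scores and human ratings. The abstract does not name the metric, so treat that protocol as an assumption. The minimal Python sketch below (hypothetical helper names, toy numbers, no external libraries) shows both computations; it does not reproduce the proposed definition-sentence method itself.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                        # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): embeddings for three sentence pairs and gold similarity ratings.
pairs = [([1.0, 0.0], [0.9, 0.1]), ([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [0.4, 0.6])]
gold = [4.8, 0.5, 4.0]
predicted = [cosine(u, v) for u, v in pairs]
print(spearman(predicted, gold))
```

Because Spearman correlation depends only on rank order, a method is rewarded for ordering pairs like the human raters, not for matching their absolute scores.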

A Simple and Effective Method for Injecting Word-level Information into Character-aware Neural Language Models

Yukun Feng, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

2023, Vol. 30, No. 1, pp. 156-183
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.156

Free access

In this study, we propose a simple and effective method to inject word-level information into character-aware neural language models. Unlike previous approaches, which typically inject word-level information as input to a long short-term memory (LSTM) network, we inject such information into the softmax function. The resulting model can be considered a combination of a character-aware language model and a simple word-level language model. Our injection method can be used in conjunction with previous methods. Experiments on 14 typologically diverse languages empirically show that our injection method performs better than previous methods that inject word-level information at the input, including a gating mechanism, averaging, and concatenation of word vectors. Finally, we provide a comprehensive comparison with previous injection methods and analyze in detail the effectiveness of word-level information in character-aware language models and the properties of our injection method.
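
One way to read "injecting word-level information into the softmax" is to add, for every vocabulary word, a score computed from an ordinary word-embedding table to the score the character-aware model already produces, and normalize the sum with a single softmax. The abstract does not give the exact formulation, so the following minimal Python sketch is only an illustration under that assumption; `next_word_probs`, `char_out_emb`, and `word_out_emb` are hypothetical names, and the vectors are toy numbers.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probs(
    h: Sequence[float],
    char_out_emb: Sequence[Sequence[float]],
    word_out_emb: Sequence[Sequence[float]],
) -> List[float]:
    """Next-word distribution with word-level information added at the output layer.

    h            -- hidden state of the character-aware language model at step t
    char_out_emb -- per-word output vectors built from characters (assumed given)
    word_out_emb -- per-word vectors from a plain word-level table (assumption)
    """
    # Hypothetical injection: sum the character-based and word-based logits per word.
    logits = [dot(h, c) + dot(h, w) for c, w in zip(char_out_emb, word_out_emb)]
    return softmax(logits)


# Toy vocabulary of three words with 2-dimensional vectors.
h = [1.0, 0.5]
char_out_emb = [[0.2, 0.1], [0.0, 1.0], [1.0, 0.0]]
word_out_emb = [[0.0, 0.0], [0.3, 0.0], [0.0, 0.2]]
print(next_word_probs(h, char_out_emb, word_out_emb))
```

Adding the two logits before one softmax multiplies the two models' unnormalized scores, which is one concrete sense in which the result "combines" a character-aware and a word-level model; setting every word-level vector to zero recovers the purely character-aware distribution.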

Joint Learning-based Heterogeneous Graph Attention Network for Timeline Summarization

Jingyi You, Dongyuan Li, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu ...

2023, Vol. 30, No. 1, pp. 184-214
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.184

Free access

Timeline summarization (TLS) is the task of summarizing events in chronological order, giving readers a comprehensive understanding of how a story evolves. Previous studies on TLS ignored the interaction between sentences and dates and adopted predefined, unlearnable representations for them, which significantly degrades performance. They also treated date selection and event detection as two independent tasks, which makes it impossible to integrate their advantages and obtain a globally optimal summary. In this paper, we present a joint learning-based heterogeneous graph attention network for TLS (HeterTls), in which date selection and event detection are combined into a unified framework to improve extraction accuracy and remove redundant sentences simultaneously. Our heterogeneous graph involves multiple types of nodes, whose representations are iteratively learned through the heterogeneous graph attention layer. We evaluated our model on four datasets and found that it significantly outperformed the current state-of-the-art baselines in terms of ROUGE scores and date selection metrics.

Society Reports (not peer-reviewed)

The 17th Symposium of the Young Researcher Association for NLP Studies (YANS)

萩行正嗣

2023, Vol. 30, No. 1, pp. 215-220
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.215

Free access

Book Review: 『IT Text 自然言語処理の基礎』 (Fundamentals of Natural Language Processing)

小田悠介

2023, Vol. 30, No. 1, pp. 221-225
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.221

Free access

BERT Workshop for Students and Early-Career Researchers

中山功太, 渋木英潔, 関根聡

2023, Vol. 30, No. 1, pp. 226-233
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.226

Free access

The Dialogue System Symposium, Organized by the JSAI Special Interest Group on Spoken Language Understanding and Dialogue Processing

徳久良子

2023, Vol. 30, No. 1, pp. 234-242
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.234

Free access

StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning

Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama

2023, Vol. 30, No. 1, pp. 243-249
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.243

Free access

Commentary: Factual Accuracy is not Enough: Planning Consistent Description Order for Radiology Report Generation

西埜徹

2023, Vol. 30, No. 1, pp. 250-255
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.250

Free access

Back Matter (not peer-reviewed)

Editor's Postscript, Manuscript Preparation Guidelines, Editorial Schedule, Statistics, and Society Information

2023, Vol. 30, No. 1, pp. 256-272
Issue date: 2023
Published online: 2023/03/15

DOI: https://doi.org/10.5715/jnlp.30.256

Free access
