自然言語処理 (Journal of Natural Language Processing)

Foreword (not peer-reviewed)

Three-Dimensional Language (立体言語)

永田亮

2024, Vol. 31, No. 1, pp. 1-2
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.1

Free access

Regular Papers (peer-reviewed)

Bidirectional Transformer Reranker for Grammatical Error Correction

Ying Zhang, Hidetaka Kamigaito, Manabu Okumura

2024, Vol. 31, No. 1, pp. 3-46
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.3

Free access

Pre-trained sequence-to-sequence (seq2seq) models have achieved state-of-the-art results in grammatical error correction (GEC). However, these models suffer from prediction bias owing to their unidirectional decoding. This study therefore proposes a bidirectional transformer reranker (BTR) that re-estimates the probability of each candidate sentence generated by a pre-trained seq2seq model. The BTR preserves the seq2seq-style Transformer architecture but uses a BERT-style self-attention mechanism in the decoder to compute the probability of each target token via masked language modeling, thereby capturing bidirectional representations of the target context. To guide the reranking process, the BTR adopts negative sampling in its objective function to minimize the unlikelihood. During inference, the BTR produces the final output by comparing the reranked top-1 result with the original top-1 result using an acceptance threshold λ. Experimental results show that, when reranking candidates from the pre-trained T5-base model, the BTR on top of T5-base achieved F0.5 scores of 65.47 on the CoNLL-14 test set and 71.27 on the Building Educational Applications 2019 (BEA) test set, and a GLEU score of 59.52 on the JFLEG corpus, improvements of 0.36, 0.76, and 0.48 points, respectively, over the original T5-base. Furthermore, when reranking candidates from T5-large, the BTR on top of T5-base improved on the original T5-large by 0.26 points on the BEA test set.
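The inference step described in this abstract, where the reranked top-1 is compared with the original top-1 using an acceptance threshold λ, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume every candidate carries a reranker log-probability, and that the reranked top-1 is accepted only when its score beats the original top-1's score by a margin of at least λ. The `Candidate` class, its fields, `select_output`, and the toy scores are all hypothetical, and the paper's exact acceptance rule may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    seq2seq_rank: int      # rank from the original seq2seq model (0 = its top-1)
    reranker_score: float  # hypothetical reranker log-probability of the candidate


def select_output(candidates: list[Candidate], lam: float) -> str:
    """Return the reranked top-1 only if it beats the original top-1 by >= lam.

    Assumed acceptance rule (illustrative only): compare the reranker's
    log-probability scores; otherwise fall back to the seq2seq top-1.
    """
    original_top1 = min(candidates, key=lambda c: c.seq2seq_rank)
    reranked_top1 = max(candidates, key=lambda c: c.reranker_score)
    margin = reranked_top1.reranker_score - original_top1.reranker_score
    return reranked_top1.text if margin >= lam else original_top1.text


if __name__ == "__main__":
    # Toy candidates with made-up scores, for illustration only.
    cands = [
        Candidate("He goes to the school.", seq2seq_rank=0, reranker_score=-5.0),
        Candidate("He goes to school.", seq2seq_rank=1, reranker_score=-2.0),
    ]
    print(select_output(cands, lam=1.0))  # margin 3.0 >= 1.0: reranked top-1 is kept
    print(select_output(cands, lam=5.0))  # margin 3.0 < 5.0: original top-1 is kept
```

Under this assumed rule, a larger λ makes the system more conservative: the seq2seq output is replaced only when the reranker is clearly more confident in another candidate.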

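Achievements and Challenges of Japanese Question Answering Seen Through an Analysis of Quiz Competition Results (クイズコンペティションの結果分析から見た日本語質問応答の到達点と課題)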

有山知希, 鈴木潤, 鈴木正敏, 田中涼太, 赤間怜奈, 西田京介

2024, Vol. 31, No. 1, pp. 47-78
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.47

Free access

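Question answering is one of the important research topics in natural language processing. With recent advances in deep learning and the growing availability of language resources, question answering technology has progressed dramatically. However, most of this research has targeted English, and research on question answering in Japanese is currently not very active. Against this background, to promote research on Japanese question answering, we organized "AI 王" (AI King), a question answering competition based on Japanese quizzes, and have held it three times to date. With the goal of clarifying the current achievements and remaining challenges of Japanese question answering technology, this paper analyzes the quiz questions used, the submitted question answering systems, and, for comparison, large language models, and reports the results.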

Prefix Alignment for Training Simultaneous Machine Translation

Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

2024, Vol. 31, No. 1, pp. 79-104
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.79

Free access

Simultaneous translation is the task of starting translation before the speaker has finished speaking. This study focuses on prefix-to-prefix translation and proposes a method that iteratively aligns prefixes in a bilingual sentence pair in order to train a machine translation model for prefix-to-prefix translation. In experiments on the IWSLT simultaneous translation benchmark, the proposed method achieved higher BLEU scores than the baseline methods in low-latency ranges. However, it degraded performance in high-latency ranges in the English-to-Japanese experiments; we therefore analyzed this degradation in terms of length ratios and prefix boundary prediction accuracy. The results suggest that the degradation was due to the large difference in word order between English and Japanese.
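To make the prefix-to-prefix setting concrete, the sketch below builds the (source prefix, target prefix) pairs used by the well-known fixed wait-k policy, in which the model reads k source tokens before writing and then alternates between reading and writing. Wait-k is a swapped-in illustration of prefix-to-prefix translation in general, not the iterative prefix alignment method proposed in this paper; the function names and the toy English-Japanese tokenization are our own assumptions.

```python
def wait_k_source_lengths(src_len: int, tgt_len: int, k: int) -> list[int]:
    """Number of source tokens visible when emitting each target token under wait-k.

    With a 0-indexed target position t, the visible source length is min(k + t, src_len).
    """
    return [min(k + t, src_len) for t in range(tgt_len)]


def prefix_pairs(src: list[str], tgt: list[str], k: int) -> list[tuple[list[str], list[str]]]:
    """Pair each target prefix with the source prefix available at that step."""
    lengths = wait_k_source_lengths(len(src), len(tgt), k)
    return [(src[:g], tgt[: t + 1]) for t, g in enumerate(lengths)]


if __name__ == "__main__":
    src = "I bought a book yesterday".split()
    tgt = "私 は 昨日 本 を 買った".split()
    for s, t in prefix_pairs(src, tgt, k=2):
        print(" ".join(s), "->", " ".join(t))
```

In this toy pair, 昨日 ("yesterday") has to be written while the visible source prefix is still "I bought a book", which illustrates how a large word-order difference can force a prefix-to-prefix model to produce content it has not yet read.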

Integrating Evidence Recognition into Document-Level Relation Extraction (文書レベル関係抽出における根拠認識の統合)

Youmi Ma, An Wang, 岡崎直観

2024, Vol. 31, No. 1, pp. 105-133
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.105

Free access

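Document-level relation extraction (DocRE) is the task of inferring the relations between all pairs of entities in a document. A set of sentences that contains sufficient clues for inferring the relation of an entity pair is called evidence. Although evidence can improve relation extraction performance, existing studies have modeled DocRE and evidence recognition as separate tasks. In this paper, we propose a method that integrates evidence recognition into a relation extraction model. Specifically, when encoding an entity pair, we guide the self-attention mechanism to allocate high weights to the evidence, thereby obtaining evidence-focused distributed representations. Furthermore, we propose a method that assigns pseudo evidence supervision signals to data without evidence annotations, allowing a large amount of automatically labeled data to be exploited. Experimental results show that the proposed method achieves state-of-the-art performance, at the time of writing, in both relation extraction and evidence recognition on the document-level relation extraction benchmarks DocRED and Re-DocRED.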
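The idea of steering self-attention toward evidence sentences can be pictured with an auxiliary loss that pulls an entity pair's attention distribution over sentences toward the evidence. The minimal sketch below assumes the target is a uniform distribution over the annotated (or pseudo-labeled) evidence sentences and measures the gap with KL divergence; both choices, and the name `evidence_attention_loss`, are illustrative assumptions rather than the paper's exact formulation.

```python
import math


def evidence_attention_loss(attn: list[float], evidence: set[int], eps: float = 1e-12) -> float:
    """KL(target || attention) over the sentences of a document.

    attn: non-negative attention mass that an entity pair assigns to each sentence.
    evidence: indices of evidence sentences; the target spreads probability
    uniformly over them and assigns zero to all other sentences.
    """
    total = sum(attn)
    p = [a / total for a in attn]  # renormalize attention over sentences
    q = 1.0 / len(evidence)        # uniform target probability on each evidence sentence
    # Sentences with zero target probability contribute nothing to the KL term.
    return sum(q * math.log(q / (p[i] + eps)) for i in evidence)


if __name__ == "__main__":
    evidence = {0, 1}
    print(evidence_attention_loss([0.25, 0.25, 0.25, 0.25], evidence))  # about ln 2 = 0.693
    print(evidence_attention_loss([0.5, 0.5, 0.0, 0.0], evidence))      # about 0
```

The loss shrinks as attention concentrates on the evidence sentences, which is the behavior the guidance is meant to encourage.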

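Kaeriten Annotation and Kakikudashi-bun Generation for Classical Chinese Poetry and Prose Using Language Models (言語モデルを用いた漢詩文の返り点付与と書き下し文生成)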

王昊, 清水博文, 河原大輔

2024, Vol. 31, No. 1, pp. 134-154
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.134

Free access

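Recent research in natural language processing has centered on modern languages and has achieved high performance on many tasks. In contrast, classical texts and related tasks have received little attention. Kanbun (classical Chinese) is believed to have been brought to Japan from China about 2,000 years ago, during the Yayoi period, and it has since had a great influence on Japanese literature. Even today, kanbun accounts for 50 of the 200 points in the Japanese language section of the Common Test for University Admissions. However, compared with the abundant language resources available in China, resources of kakikudashi-bun (Japanese readings of kanbun) in Japan are very scarce. To address this problem, this study targets classical Chinese poetry and prose and constructs a kanbun kundoku dataset consisting of hakubun (unannotated original texts) and kakikudashi-bun. We then attempt to improve accuracy on two tasks considered important for understanding kanbun, kaeriten (reading-order mark) annotation and kakikudashi-bun generation, using language models. We also discuss the most suitable automatic evaluation metrics by comparing them with human evaluation results. The dataset and code are available at https://github.com/nlp-waseda/Kanbun-LM.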
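For readers unfamiliar with kaeriten (返り点), the marks deterministically change the order in which the characters of hakubun are read. The rule-based sketch below is not the paper's language-model approach; it only illustrates what the marks encode, and it covers just two mark types: レ (read the character immediately below first, then return up one) and the numeric marks 一, 二, 三, 四 (after reading the character marked 一, jump back to the one marked 二, then 三, and so on). Nested 上下 or 甲乙 marks and combined marks such as 一レ are deliberately out of scope, and `reading_order` is a hypothetical helper name.

```python
NUMERIC = {"一": 1, "二": 2, "三": 3, "四": 4}


def reading_order(chars: str, marks: list[str]) -> str:
    """Return the characters of `chars` in kundoku reading order.

    marks[i] is "" (no mark), "レ", or one of 一/二/三/四 for chars[i].
    Only レ and simple numeric marks are supported in this sketch.
    """
    out: list[str] = []
    pending: dict[int, int] = {}  # numeric mark (>= 2) -> index of the deferred character

    def emit(i: int) -> None:
        out.append(chars[i])
        # A character marked レ is read right after the character below it.
        if i > 0 and marks[i - 1] == "レ":
            emit(i - 1)

    for i, mark in enumerate(marks):
        if mark == "レ":
            continue  # deferred; emitted by the character that follows it
        if NUMERIC.get(mark, 0) >= 2:
            pending[NUMERIC[mark]] = i  # deferred until the 一 character is read
            continue
        emit(i)
        if mark == "一":
            n = 2
            while n in pending:  # jump back to 二, then 三, ...
                emit(pending.pop(n))
                n += 1
    return "".join(out)


if __name__ == "__main__":
    # 百聞不レ如二一見一 is read as 百聞一見如不 (百聞は一見に如かず).
    print(reading_order("百聞不如一見", ["", "", "レ", "二", "", "一"]))
```

A chain of レ marks is handled by the recursion in `emit`: in 不レ可レ不レ読, reading 読 triggers 不, then 可, then the first 不, giving the order 読不可不.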

DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation

Haiyue Song, Zhuoyuan Mao, Raj Dabre, Chenhui Chu, Sadao Kurohashi

2024, Vol. 31, No. 1, pp. 155-188
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.155

Free access

In this study, we propose DiverSeg, which exploits diverse segmentations from multiple subword segmenters to capture the various perspectives of each word in neural machine translation. In DiverSeg, multiple segmentations are encoded as a subword lattice input, a subword-relation-aware attention mechanism integrates the relations among subwords, and a cross-granularity embedding alignment objective increases the similarity between different segmentations of the same word. We conducted experiments on five datasets to evaluate how effectively DiverSeg improves machine translation quality. The results demonstrate that DiverSeg outperforms baseline methods by approximately two BLEU points. Additionally, we performed ablation studies to investigate the improvement over non-subword methods, the contribution of each component of DiverSeg, the choice of subword relations, the choice of similarity metric in the alignment loss, and combinations of segmenters.
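The cross-granularity alignment objective can be pictured as pulling together the representations that different segmenters produce for the same word. In the minimal sketch below, each segmentation's subword embeddings are mean-pooled and the loss is one minus the cosine similarity of the pooled vectors. The pooling, the similarity metric (the abstract notes that the paper compares several), the toy two-dimensional embeddings, and all function names are illustrative assumptions.

```python
import math

Vector = list[float]


def mean_pool(vectors: list[Vector]) -> Vector:
    """Average the subword embeddings of one segmentation of a word."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_granularity_loss(seg_a: list[Vector], seg_b: list[Vector]) -> float:
    """1 - cosine similarity between two pooled segmentations of the same word."""
    return 1.0 - cosine(mean_pool(seg_a), mean_pool(seg_b))


if __name__ == "__main__":
    # Hypothetical toy embeddings for two segmentations of "unhappiness".
    emb = {"un": [1.0, 0.0], "happiness": [0.0, 1.0], "happi": [0.0, 1.0], "ness": [0.0, 1.0]}
    coarse = [emb[s] for s in ["un", "happiness"]]
    fine = [emb[s] for s in ["un", "happi", "ness"]]
    print(cross_granularity_loss(coarse, fine))  # small positive value, about 0.051
```

Minimizing this quantity during training would push the coarse and fine segmentations of a word toward similar representations.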

A Table Question Alignment based Cell-Selection Method for Table-Text QA

Jian Wu, Yicheng Xu, Börje F. Karlsson, Manabu Okumura

2024, Vol. 31, No. 1, pp. 189-211
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.189

Free access

Hybrid question answering (HQA), which targets reasoning over tables and the passages linked from table cells, has attracted significant research in recent years. A common challenge in HQA and other passage-table QA datasets is that it is generally impractical to iterate over all table rows, columns, and linked passages to retrieve evidence. This challenge has made it difficult for previous studies to demonstrate their reasoning ability in retrieving answers. To bridge this gap, we propose a novel Table-alignment-based Cell-selection and Reasoning model (TACR) for hybrid text and table QA, evaluated on the HybridQA and WikiTableQuestions (WTQ) datasets. For evidence retrieval, we design a cell-selection method enhanced by table-question alignment to retrieve fine-grained evidence. For answer reasoning, we incorporate a QA module that treats the row containing the selected cells as context. Experimental results on HybridQA and WTQ show that TACR achieves state-of-the-art results on cell selection and outperforms fine-grained evidence retrieval baselines on HybridQA, while achieving competitive performance on WTQ. A detailed analysis further demonstrates that aligning questions to tables in the cell-selection stage yields important gains, with table row and column selection accuracy exceeding 90% in our experiments, while also improving output explainability.
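The cell-selection and reasoning steps described in this abstract can be pictured as choosing a row and a column, taking the cell at their intersection, and passing the whole row to the QA module as context. The sketch below assumes that row and column relevance scores have already been produced by some question-table alignment model (here they are made-up numbers), uses a simple argmax, and serializes the row in an assumed "header is value" format; none of these details are taken from the paper, and linked passages are omitted.

```python
def argmax(scores: list[float]) -> int:
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def select_cell(row_scores: list[float], col_scores: list[float]) -> tuple[int, int]:
    """Pick the highest-scoring row and column; their intersection is the selected cell."""
    return argmax(row_scores), argmax(col_scores)


def row_context(header: list[str], row: list[str]) -> str:
    """Serialize the row containing the selected cell as QA context (assumed format)."""
    return " ; ".join(f"{h} is {v}" for h, v in zip(header, row))


if __name__ == "__main__":
    header = ["Player", "Club", "Goals"]
    rows = [["A. Smith", "Rovers", "12"], ["B. Jones", "United", "19"]]
    # Hypothetical alignment scores for the question "How many goals did B. Jones score?"
    r, c = select_cell(row_scores=[0.1, 0.9], col_scores=[0.2, 0.1, 0.7])
    print(rows[r][c])                    # 19
    print(row_context(header, rows[r]))  # Player is B. Jones ; Club is United ; Goals is 19
```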

Generating Responses That Express Disagreement in Attentive Listening to Narratives (語りの傾聴において不同意を示す応答の生成)

伊藤滉一朗, 村田匡輝, 大野誠寛, 松原茂樹

2024, Vol. 31, No. 1, pp. 212-249
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.212

Free access

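Conversational agents such as communication robots are expected to take on the role of listening to narratives. To be accepted as listeners, such agents need a function for conveying to the narrator that they are listening attentively. An explicit means of doing so is to respond to the narrative, and producing utterances that respond to the narrative for the purpose of showing attentive listening, i.e., attentive listening responses, is a promising approach. In attentive listening to narratives, accepting the narrator's utterances is the listener's basic response strategy. However, narratives sometimes contain utterances such as self-deprecation or modesty. In such cases, the listener must reliably produce responses indicating that it does not agree with the narrator's utterance, i.e., disagreement responses. This paper shows the feasibility of appropriately generating disagreement responses in attentive listening to narratives. To this end, we first established a scheme for tagging the timing and expressions of disagreement responses in narrative data in an environment without time constraints. Using the resulting corpus, we verify that disagreement response timings can be tagged comprehensively and that disagreement response expressions can be tagged stably. Next, we implemented a method for detecting disagreement response timings and a method for classifying disagreement response expressions, both based on pre-trained Transformer-based models, and experimentally verified the feasibility of generating disagreement responses using the response corpus.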

Focused Prefix Tuning for Controllable Text Generation

Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura

2024, Vol. 31, No. 1, pp. 250-265
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.250

Free access

In a controllable text generation dataset, unannotated attributes may provide irrelevant learning signals to models trained on that dataset, thereby degrading their performance. We propose focused prefix tuning (FPT) to mitigate this problem and enable control to focus on the desired attribute. Experimental results show that FPT achieves better control accuracy and text fluency than baseline models in single-attribute control tasks. In multi-attribute control tasks, FPT achieves control accuracy comparable to that of the state-of-the-art approach while retaining the flexibility to control new attributes without retraining existing models.


Society Articles (not peer-reviewed)

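LLM-jp: A Cross-Organizational Project for the Research and Development of Large Language Models with Strong Japanese Capabilities (LLM-jp: 日本語に強い大規模言語モデルの研究開発を行う組織横断プロジェクト)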

河原大輔, 空閑洋平, 黒橋禎夫, 鈴木潤, 宮尾祐介

2024, Vol. 31, No. 1, pp. 266-279
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.266

Free access

Interpreting Languages with Bits

Yiran Wang, Taro Watanabe, Masao Utiyama, Yuji Matsumoto

2024, Vol. 31, No. 1, pp. 280-286
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.280

Free access

Subsampling Based on Model Predictions for Knowledge Graph Completion (知識グラフ補完のためのモデル予測に基づくサブサンプリング)

馮昕璨

2024, Vol. 31, No. 1, pp. 287-293
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.287

Free access

Issues in Using ChatGPT with Closely Related Languages: The Case of Malay and Indonesian (類似言語における ChatGPT 使用にまつわる諸問題：マレー語とインドネシア語の事例)

野元裕樹

2024, Vol. 31, No. 1, pp. 294-299
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.294

Free access

🚀 NLP Colloquium (NLPコロキウム)

丹羽彩奈, 横井祥, 高山隼矢, 斉藤いつみ

2024, Vol. 31, No. 1, pp. 300-309
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.300

Free access

Back Matter (not peer-reviewed)

Editor's Postscript, Manuscript Preparation Guide, Editorial Schedule, Statistics, and Society Information

2024, Vol. 31, No. 1, pp. 310-325
Published: 2024
Released on J-STAGE: 2024/03/15

DOI: https://doi.org/10.5715/jnlp.31.310

Free access
