Journal of Natural Language Processing

Preface

On R. B.

二宮崇

2021, Vol. 28, No. 4, pp. 936-937
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.936

Free access

General Papers

Probing Simple Factoid Question Answering Based on Linguistic Knowledge

Namgi Han, Hiroshi Noji, Katsuhiko Hayashi, Hiroya Takamura, Yusuke Mi ...

2021, Vol. 28, No. 4, pp. 938-964
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.938

Free access

Recent studies have indicated that existing systems for simple factoid question answering over a knowledge base are not robust across different datasets. We evaluated the ability of a pretrained language model, BERT, to perform this task on four datasets, Free917, FreebaseQA, SimpleQuestions, and WebQSP, and found that, like other existing systems, the existing BERT-based system also cannot solve them robustly. To investigate the reason for this problem, we employed a statistical method, partial least squares path modeling (PLSPM), with 24 BERT models and two probing tasks, SentEval and GLUE. Our results reveal that the existing BERT-based system tends to depend on the surface and syntactic features of each dataset, and that this dependence undermines the generality and robustness of the system's performance. We also discuss the reason for this phenomenon by considering the features of each dataset and the method that was used to evaluate the simple factoid question answering task.

Improved Decomposition Strategy for Joint Entity and Relation Extraction

Van-Hien Tran, Van-Thuy Phi, Akihiko Kato, Hiroyuki Shindo, Taro Watan ...

2021, Vol. 28, No. 4, pp. 965-994
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.965

Free access

The joint entity and relation extraction task detects entity pairs along with their relations to extract relational triplets. A recent study (Yu et al. 2020) proposed a novel decomposition strategy that splits the task into two interrelated subtasks: detection of the head-entity (HE) and identification of the corresponding tail-entity and relation (TER) for each extracted head-entity. However, this strategy suffers from two major problems. First, if the HE detection task fails to find a valid head-entity, the model will then miss all related triplets containing this head-entity in the head role. Second, as Yu et al. (2020) stated, their model cannot solve the entity pair overlap (EPO) problem. For a given head-entity, the TER extraction task predicts only a single relation between the head-entity and a tail-entity, even though this entity pair can hold multiple relations. To address these problems, we propose an improved decomposition strategy that considers each extracted entity in two roles (head and tail) and allows a model to predict multiple relations (if any) of an entity pair. In addition, a corresponding model framework is presented to deploy our new decomposition strategy. Experimental results showed that our approach significantly outperformed the previous approach of Yu et al. (2020) and achieved state-of-the-art performance on two benchmark datasets.

Construction of an Input-Error Dataset and Correction System Based on the Revision History of Japanese Wikipedia

田中佑, 村脇有吾, 河原大輔, 黒橋禎夫

2021, Vol. 28, No. 4, pp. 995-1033
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.995

Free access

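Input errors such as typos that arise while writing text induce analysis errors, so systems that correct input errors are important. Building such a system requires a large number of pairs of input errors and their corrections as training data, but no publicly available Japanese input-error dataset of sufficient size exists. Input-error datasets have previously been built from Wikipedia revision histories for languages such as French. Those methods require identifying the edited words, so they cannot be applied directly to Japanese, which requires word segmentation. In this study, we extract input-error candidates from Wikipedia revision histories using character-level rather than word-level edits as clues, and collect input errors by filtering those candidates. With this method we built a large-scale dataset of about 700,000 sentence pairs and evaluated the construction method. Next, we built input-error correction systems using the resulting dataset. Using a pretrained seq2seq model, we built one system that learns only input-error correction and another that simultaneously learns to estimate kanji readings. Compared with the former, the latter improved accuracy in correcting kanji conversion errors. We also added pseudo input-error data to the training data and examined the change in accuracy. Finally, we compared input-error detection accuracy against other proofreading systems and confirmed that our system is more accurate.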
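
The character-level extraction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `char_edits` and `is_typo_candidate` and the `max_chars` threshold are hypothetical, and the single-small-edit filter is only an assumed stand-in for the filtering the paper applies. The sketch diffs two revisions of a sentence at the character level with Python's standard `difflib`, so no word segmentation is needed.

```python
from difflib import SequenceMatcher


def char_edits(before: str, after: str):
    """Return the character-level edits between two revisions of a sentence.

    Each edit is (tag, old_text, new_text), where tag is 'replace',
    'delete', or 'insert' as reported by difflib.
    """
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    return [
        (tag, before[i1:i2], after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def is_typo_candidate(before: str, after: str, max_chars: int = 2) -> bool:
    """Hypothetical filter: keep pairs that differ by exactly one small edit."""
    edits = char_edits(before, after)
    if len(edits) != 1:
        return False
    _, old, new = edits[0]
    return max(len(old), len(new)) <= max_chars


if __name__ == "__main__":
    # A one-character slip in a verb ending, corrected in a later revision.
    print(char_edits("私は学校に行きたす", "私は学校に行きます"))
    print(is_typo_candidate("私は学校に行きたす", "私は学校に行きます"))
```

Working on characters keeps the candidate extraction independent of any tokenizer, which is the property the abstract highlights for Japanese; a real filter would need further criteria to separate typos from content edits.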

Improving Japanese-English Translation of University Lectures through Japanese Spoken-to-Written Style Conversion

中尾亮太, Chenhui Chu, 黒橋禎夫

2021, Vol. 28, No. 4, pp. 1034-1052
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1034

Free access

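In machine translation of spoken language, phenomena specific to speech are known to degrade translation accuracy. In this study, we improve translation accuracy by automatically converting spoken-style Japanese into written-style Japanese as preprocessing for Japanese-English translation in a university lecture translation system. We first built a corpus of triples consisting of university lecture transcripts, their written-style conversions, and the corresponding English sentences. We then used it to train a spoken-to-written style conversion model and a Japanese-English translation model. The results showed that spoken-to-written conversion improves Japanese-English translation accuracy. In addition, based on a classification of phenomena specific to spoken language, we quantified which phenomena affect translation accuracy and to what degree.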

One-class Text Classification with Multi-modal Deep Support Vector Data Description

Chenlong Hu, Yukun Feng, Hidetaka Kamigaito, Hiroya Takamura, Manabu O ...

2021, Vol. 28, No. 4, pp. 1053-1088
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1053

Free access

This work presents multi-modal deep SVDD (mSVDD) for one-class text classification. By extending the uni-modal SVDD to a multi-modal one, we build mSVDD with multiple hyperspheres, which enable a much better description of the target one-class data. Additionally, the end-to-end architecture of mSVDD can jointly handle neural feature learning and one-class text learning. We also introduce a mechanism for incorporating negative supervision in the absence of real negative data, which can benefit one-class text models including mSVDD. We conduct experiments on the Reuters, 20 Newsgroups, and TREC datasets, and the experimental results demonstrate that mSVDD outperforms uni-modal SVDD and improves further when negative supervision is incorporated.

Unsupervised Coordinate Structure Analysis for Recognizing Compound Named Entities

澤田悠冶, 寺西裕紀, 松本裕治, 渡辺太郎

2021, Vol. 28, No. 4, pp. 1089-1115
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1089

Free access

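Named entity recognition is the task of automatically extracting domain-specific terms from texts such as scientific papers. Previous work on named entity recognition targets only named entities that form contiguous spans, but texts also contain compound expressions in which parts of coordinated named entities are omitted, and it is difficult to extract the individual named entities from them. In this study, we propose a method that uses pretrained language models, which are widely used in recent natural language processing tasks, to identify the spans of coordinated phrases without supervised coordination data and to normalize compound named entities. In evaluation experiments using the GENIA Treebank and GENIA term annotation, the method achieved analysis performance close to that of previous work that used supervision, and we confirmed that the proposed method improves named entity recognition accuracy.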

Stylistically User-specific Response Generation

Abdurrisyad Fikri, Hiroya Takamura, Manabu Okumura

2021, Vol. 28, No. 4, pp. 1116-1140
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1116

Free access

The ability to capture the conversation context is necessary for building a good conversation model. However, a good model must also provide interesting and diverse responses to mimic actual human conversations. Given that different people can respond differently to the same utterance, we believe that user-specific attributes can be useful for a conversation task. In this study, we attempt to drive the style of generated responses to resemble the style of real people using user-specific information. Our experiments show that our method applies to both seen and unseen users. Human evaluation also shows that our model outperforms the baselines in terms of relevance and style similarity.

Emotion Recognition in Conversations Using Graph Attention Networks Based on Utterance Order

石渡太智, 安田有希, 宮﨑太郎, 後藤淳

2021, Vol. 28, No. 4, pp. 1141-1161
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1141

Free access

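Emotion recognition in conversations (ERC), which identifies the emotion of each utterance in a dialogue, is attracting attention for purposes such as surveying user trends on social media and detecting fake news. In ERC, it is known that, in addition to the content of each utterance, the relations between utterances strongly influence the speaker's emotion. The previous state-of-the-art method used Relational Graph Attention Networks (RGAT) to capture, among the relations between utterances, self-dependency and inter-speaker dependency in particular, and achieved the best recognition accuracy at that time. However, the RGAT model cannot use utterance order information. This paper therefore proposes Relational Position Encodings, a new method that adds utterance order to the RGAT model. With the proposed method, both the relations between utterances, including self-dependency and inter-speaker dependency, and utterance order information can be used. In evaluation experiments, the method outperformed the previous method on two of three ERC benchmark datasets, achieving state-of-the-art recognition accuracy.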

Effective Use of Target-Side Context in Neural Machine Translation

美野秀弥, 伊藤均, 後藤功雄, 山田一郎, 徳永健伸

2021, Vol. 28, No. 4, pp. 1162-1183
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1162

Free access

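This paper proposes a method that uses both the reference translation and the machine translation output of the preceding target-side sentence as context information, in order to improve the accuracy of context-aware neural machine translation. Surrounding sentences on either the source side or the target side can serve as context, but methods that use target-side surrounding sentences have been reported to lower translation accuracy. In neural machine translation that uses target-side context, reference translations are used during training whereas machine translation outputs are used at translation time, so the difference (gap) between the characteristics of reference translations and machine translation outputs is considered one cause. To mitigate this gap between the target-side context at training time and at translation time, we propose gradually switching the target-side context used in training from reference translations to machine translation outputs as training progresses. Evaluation experiments on English-Japanese and Japanese-English translation tasks using the Jiji Press news corpus, and on English-Japanese, Japanese-English, English-German, and German-English translation tasks using the IWSLT2017 TED talk corpus, confirmed that translation accuracy improves over conventional translation models that use target-side context.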
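
The gradual switch from reference translations to machine translation outputs can be pictured with a small sampling sketch. The abstract does not state the schedule, so the linear ramp below is purely an assumption, and the names `mt_context_probability` and `choose_context` are hypothetical. The sketch only shows the idea that early training steps mostly see the reference translation as target-side context while later steps mostly see the model's own output.

```python
import random


def mt_context_probability(step: int, total_steps: int) -> float:
    """Probability of using the MT output as context at a given training step.

    Assumed schedule: rises linearly from 0.0 at step 0 to 1.0 at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, step / total_steps))


def choose_context(reference: str, mt_output: str, step: int,
                   total_steps: int, rng: random.Random) -> str:
    """Pick the target-side context sentence for one training example."""
    p = mt_context_probability(step, total_steps)
    # rng.random() lies in [0.0, 1.0), so p == 0.0 always yields the
    # reference and p == 1.0 always yields the MT output.
    return mt_output if rng.random() < p else reference


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 2500, 5000, 7500, 10000):
        picks = [choose_context("REF", "MT", step, 10000, rng) for _ in range(1000)]
        print(step, picks.count("MT") / len(picks))
```

Sampling per example, rather than switching all at once, keeps some reference contexts in late training; whether the paper switches per example or per epoch is not stated in the abstract.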

Anna: A Dapper Open-Domain Dialogue Agent Based on a Joint Attention Network

Itsugun Cho, Hiroaki Saito

2021, Vol. 28, No. 4, pp. 1184-1209
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1184

Free access

We constructed a high-quality open-domain dialogue generation model called Anna that is composed of a hierarchical self-attention network with multiple convolution filters and a topic-augmented network. During daily conversations, humans typically respond by understanding a dialogue history and assembling their knowledge regarding the topic. However, existing dialogue generation models are weak at capturing the dependencies among words or utterances, resulting in an insufficient understanding of context and the generation of irrelevant responses. Previous works have largely ignored topic information modeling in multi-turn dialogue, making responses overly generic. Although pre-training using large-scale transformer models has recently resulted in enhanced performance, large parameter sizes complicate such models. Anna effectively captures contextual dependencies and assigns greater weight to important words and utterances to compute context representations. We incorporate topic information into our model as prior knowledge to synthesize topic representations. Two types of representations jointly determine the probability distributions of responses, which effectively simulates how people behave in real conversations. Empirical studies on both Chinese and English corpora demonstrate that Anna outperforms baseline models in terms of response quality, parameter size and decoding speed.

Automatic Generation of Weather Forecast Comments from Numerical Weather Prediction

村上聡一朗, 田中天, 萩行正嗣, 上垣外英剛, 船越孝太郎, 高村大也, 奥村学

2021, Vol. 28, No. 4, pp. 1210-1246
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1210

Free access

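In this study, we address the task of automatically generating weather forecast comments from the simulation results of numerical weather prediction. This generation task has distinctive challenges: (i) numerical changes in various physical quantities must be considered, (ii) expressions depend on the delivery time and target area of the comment, and (iii) the usefulness of information is valued in weather forecast comments. We propose a Data-to-Text model that takes numerical weather prediction simulation results, meteorological observations, and comment meta-information as input and verbalizes them while capturing the above characteristics. In addition, to improve the usefulness of information in the comments, we introduce a content selection model that predicts "weather labels" representing weather information such as clear skies and rain, and take its predictions into account during text generation so that useful information can be described explicitly. In experiments with automatic and human evaluation, the proposed model was shown to be the best in terms of information usefulness compared with the baselines.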

Metric-Type Identification for Multilevel Header Numerical Tables in Scientific Papers

Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Manabu Okumura, Hiroya Tak ...

2021, Vol. 28, No. 4, pp. 1247-1269
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1247

Free access

Numerical tables are widely used to present experimental results in scientific papers. For table understanding, a metric-type is essential to discriminate numbers in the tables. Herein, we introduce a new information extraction task, i.e., metric-type identification from multilevel header numerical tables, and provide a dataset extracted from scientific papers comprising header tables, captions, and metric-types. We propose joint-learning neural classification and generation schemes featuring pointer-generator-based and pretrained-model-based models. Our results show that the joint models can manage both in-header and out-of-header metric-type identification problems. Furthermore, transfer learning using fine-tuned pretrained-model-based models successfully improves the performance. SciBERT, a domain-specific BERT-based model, achieves the best performance. Results achieved by a fine-tuned T5-based model are comparable to those obtained using our BERT-based model under a multitask setting.

Applied System Papers

A Selection Support System for Enterprise Resource Planning Package Components using Ensembles of Multiple Models with Round-trip Translation

Masao Ideuchi, Yohei Sakamoto, Yoshiaki Oida, Isaac Okada, Shohei Higa ...

2021, Vol. 28, No. 4, pp. 1270-1298
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1270

Free access

An enterprise resource planning (ERP) package consists of software to support day-to-day business activities and contains multiple components. System engineers combine the most appropriate software components for system integration using ERP packages. Because component selection is a very difficult task, even for experienced system engineers, there is a demand for machine-learning-based systems that support appropriate component selection by reading the text of requirement specifications and predicting suitable components. However, sufficient prediction accuracy has not been achieved thus far as a result of the sparsity and diversity of training data, which consist of specification texts paired with their corresponding components. We implemented round-trip translation at both training and testing times to alleviate the sparsity and diversity problems, adopted pre-trained models to exploit the similarity of text data, and utilized an ensemble of diverse models to take advantage of models for both the original and round-trip translated data. Through experiments with actual project data from ERP system integration, we confirmed that round-trip translation alleviates the problems mentioned above and improves prediction accuracy. As a result, our method achieved sufficient accuracy for practical use.

Society Column

RaNNC: Middleware for Automatically Parallelized Deep Learning

田仲正弘, 田浦健次朗, 塙敏博, 鳥澤健太郎

2021, Vol. 28, No. 4, pp. 1299-1306
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1299

Free access

The 16th Symposium of the Young Researcher Association for NLP Studies (YANS): Measures for Holding the Event Online

高瀬翔

2021, Vol. 28, No. 4, pp. 1307-1311
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1307

Free access

Scientific Credibility of Machine Translation Research: An Analysis of Quality Evaluation Methods

Benjamin Marie, 藤田篤, Raphael Rubino

2021, Vol. 28, No. 4, pp. 1312-1318
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1312

Free access

Evaluating Evaluation Measures for Ordinal Classification and Ordinal Quantification

酒井哲也

2021, Vol. 28, No. 4, pp. 1319-1324
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1319

Free access

Verb Semantic Frame Estimation Using Contextualized Word Embeddings

山田康輔

2021, Vol. 28, No. 4, pp. 1325-1330
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1325

Free access

Do Grammatical Error Correction Models Realize Grammatical Generalization?

三田雅人

2021, Vol. 28, No. 4, pp. 1331-1335
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1331

Free access

Unified Interpretation of Softmax Cross Entropy and Negative Sampling: With Case Study for Knowledge Graph Embedding

上垣外英剛, 林克彦

2021, Vol. 28, No. 4, pp. 1336-1341
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1336

Free access

Supporting Member Column

Introduction to the AAMT Nagao Award and the Student Encouragement Award

二宮崇

2021, Vol. 28, No. 4, pp. 1342-1348
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1342

Free access

Back Matter

Editor's Postscript, Editorial Schedule, Statistics, and Society Information

2021, Vol. 28, No. 4, pp. 1349-1355
Published: 2021
Released online: 2021/12/15

DOI: https://doi.org/10.5715/jnlp.28.1349

Free access
