Journal of Natural Language Processing

Preface

Translating Sentences of Ten Words or Fewer

新納浩幸

2018, Vol. 25, No. 4, pp. 329-330
Published: 2018/09/15
Released online: 2018/12/15

DOI: https://doi.org/10.5715/jnlp.25.329

Journal, free access

Papers

Annotation of Bunsetsu Dependencies and Coordinate Structures for the "Balanced Corpus of Contemporary Written Japanese"

浅原正幸, 松本裕治

2018, Vol. 25, No. 4, pp. 331-356
Published: 2018/09/15
Released online: 2018/12/15

DOI: https://doi.org/10.5715/jnlp.25.331

Journal, free access

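This paper describes the annotation of bunsetsu dependency and coordinate structure information for the core data of the "Balanced Corpus of Contemporary Written Japanese". For syntactic annotation, we propose annotating bunsetsu dependencies separately from coordinate and appositive structures. For dependency relations that cross clause boundaries, we further refine the annotation by determining the scope based on a classification of clauses. We outline the annotation guidelines, touching on issues that arose in the actual annotation workflow, and present basic statistics of the annotated data.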

Proposal and Comparison of Template-Based and Neural Methods for Sports Summary Generation

田川裕輝, 嶋田和孝

2018, Vol. 25, No. 4, pp. 357-391
Published: 2018/09/15
Released online: 2018/12/15

DOI: https://doi.org/10.5715/jnlp.25.357

Journal, free access

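In this study, we focus on baseball, a popular sport in Japan, and address the generation of inning summaries from play-by-play data. Many live baseball updates are distributed on the Web. Game reports are updated only after a game ends, and they characteristically contain phrases that convey the state of the game to the user, such as "scores the long-awaited first run" (called Game-changing Phrases, or GPs, in this paper), so readers can easily grasp the state of the game. Given these characteristics, generating a summary that contains a GP for an arbitrary at-bat would be highly useful not only after the game has ended but also when a user wants to follow the game in real time. We therefore address the generation of summaries containing GPs from play-by-play data. We propose two summary generation methods: a template-based sentence generation method and a method that uses an Encoder-Decoder model.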

Hierarchical Coordinate Structure Analysis for Japanese Statutory Sentences Using Neural Language Models

Takahiro Yamakoshi, Tomohiro Ohno, Yasuhiro Ogawa, Makoto Nakamura, Ka ...

2018, Vol. 25, No. 4, pp. 393-419
Published: 2018/09/15
Released online: 2018/12/15

DOI: https://doi.org/10.5715/jnlp.25.393

Journal, free access

We propose a method for analyzing the hierarchical coordinate structure of Japanese statutory sentences using neural language models (NLMs). Our method deterministically identifies hierarchical coordinate structures according to their rigorously defined descriptive rules. In addition, our method identifies all conjuncts in each coordinate structure using NLM-based scoring. Furthermore, it does not rely on any training data labeled with coordinate structures. An experiment demonstrates that our method drastically outperforms an existing method for Japanese statutory sentences.

Grammatical Error Detection Using Word Embeddings That Consider Correctness Information and Grammatical Error Patterns

金子正弘, 堺澤勇也, 小町守

2018, Vol. 25, No. 4, pp. 421-439
Published: 2018/09/15
Released online: 2018/12/15

DOI: https://doi.org/10.5715/jnlp.25.421

Journal, free access

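This paper proposes a method for learning word embeddings for grammatical error detection that take into account correctness information and grammatical error patterns. The word embedding learning used in previous grammatical error detection models only the context and does not consider the grammatical errors specific to language learners. We therefore propose a method that learns word embeddings specialized for grammatical error detection by considering correctness information and grammatical error patterns. Correctness information is a label indicating whether the target word in an n-gram word sequence is correct or incorrect; it is determined from word-level error labels. Error patterns are combinations of words that learners tend to confuse, and they can be extracted from large-scale learner corpora for use in learning word embeddings. Using a Bidirectional Long Short-Term Memory initialized with the word embeddings learned by this method as a classifier, we achieved state-of-the-art accuracy in grammatical error detection on the First Certificate in English corpus.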

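Identifying the Boundaries of Coordinated Phrases Using Feature Representations of the Similarity and Replaceability of Word Sequences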

寺西裕紀, 進藤裕之, 松本裕治

2018, Vol. 25, No. 4, pp. 441-462
Published: 2018/09/15
Released online: 2018/12/15

DOI: https://doi.org/10.5715/jnlp.25.441

Journal, free access

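The main task of coordinate structure analysis is to identify the boundaries of coordinated phrases. Coordinate structures are a useful feature in the syntactic and semantic analysis of sentences, but no definitive analysis method has been established, and they remain a major source of errors even for current state-of-the-art syntactic parsers. Existing methods for disambiguating the boundaries of coordinated phrases either rely solely on the similarity of coordinate structures or depend heavily on the output of a syntactic parser. In this study, we propose a method that uses recurrent neural networks, which have recently been widely adopted in natural language analysis, to compute feature vectors for the similarity and replaceability of coordinated phrases from only the surface forms and part-of-speech tags of words, without using parsing results, and to predict the boundaries of coordinate structures. In experiments on the Penn Treebank and the GENIA corpus, the proposed method achieved higher accuracy than previous work.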

Domain Adaptation using Word Embeddings for Word Sense Disambiguation

Kanako Komiya, Minoru Sasaki, Hiroyuki Shinnou, Manabu Okumura

2018, Vol. 25, No. 4, pp. 463-480
Published: 2018/09/15
Released online: 2018/12/15

DOI: https://doi.org/10.5715/jnlp.25.463

Journal, free access

In this paper, we propose domain adaptation using word embeddings for word sense disambiguation (WSD). The validity for WSD of word embeddings derived from a huge corpus such as Wikipedia has already been shown, but their validity in a domain adaptation framework has not been previously discussed. Moreover, even if word embeddings are valid in this new context, the impact of the document type of the corpora on WSD is still unknown. Therefore, we investigate the performance of domain adaptation in WSD using word embeddings from the source, target, and general corpora and examine (1) whether the word embeddings are valid for domain adaptation of WSD and (2) if they are, the effects of the document type of the corpora from which the word embeddings are derived. We used three corpora of distinct document types and performed domain adaptation experiments using the document types as the domains. The experiments, conducted using Japanese corpora, revealed that the accuracy of WSD was highest when we used the word embeddings obtained from the target corpora together with a general corpus.
