Journal of Natural Language Processing

On What the Association's Journal "Journal of Natural Language Processing" Should Be

田窪行則

1998, Vol. 5, No. 4, pp. 1-2
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_1

Journal, free access

A Hybrid Approach for Resolving Ambiguities in Coordinate Structures

Haodong Wu, Teiji Furugori

1998, Vol. 5, No. 4, pp. 3-16
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_3

Journal, free access

This paper describes a method for determining the syntactic structure of coordinate constructions. It is based on information drawn from semantic similarities, selectional restrictions, and some other linguistic cues. We discuss the role this information plays in resolving ambiguities that appear in coordinate constructions, describe the means of acquiring the necessary information automatically from two on-line corpora and a lexical database, and devise two algorithms for disambiguating coordinate constructions. An experiment shows the effectiveness of our method and its applicability to resolving ambiguities in some other syntactic structures.

The Application of Classification Trees to Bunsetsu Segmentation of Japanese Sentences

Yujie Zhang, Kazuhiko Ozeki

1998, Vol. 5, No. 4, pp. 17-33
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_17

Journal, free access

In conventional bunsetsu segmentation methods for Japanese sentences, segmentation rules have been written manually. This makes it difficult to keep the rules consistent and to decide on an efficient order of rule application. This paper proposes a method of automatic bunsetsu segmentation using a classification tree, by which knowledge about bunsetsu boundaries is automatically acquired from a corpus, and an efficient order of rule application is realized automatically. It can adapt quickly to a new system of parts of speech, and also to a new task domain, without the need for changing the algorithm. Generation of classification trees for bunsetsu segmentation and evaluation experiments were carried out on an ATR corpus and an EDR corpus. A segmentation accuracy of 98.9% was achieved for the ATR corpus, and 96.2% for the EDR corpus. The method was compared with a simple rule-based method and the Bayes decision rule on the ATR corpus. The proposed method outperformed the rule-based method when the training data size was larger than about 20 sentences, and outperformed the Bayes decision rule over the whole range of training data sizes. The superiority of the proposed method was more evident over the former when the training data size was larger, and over the latter when the training data size was smaller.

Compound Noun Analysis Using Document Scanning

久光徹, 新田義彦

1998, Vol. 5, No. 4, pp. 35-60
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_35

Journal, free access

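Compound nouns can carry enough information to condense the content of a document, so they tend to become key terms and are often the key to understanding the document. For this reason, compound noun analysis (that is, dependency analysis among the constituents of a compound noun) is expected to contribute not only to machine translation but also to more advanced information extraction and information retrieval. However, because a compound noun is merely a chain of nouns, it offers no syntactic cues, and previously proposed methods have relied on hand-built rules or on co-occurrence likelihoods of concepts described in a thesaurus. These methods did not anticipate large-scale open text, such as newspaper articles, in which unregistered words appear frequently, and in such cases they have problems with robustness. With the current situation in mind, in which large volumes of electronic documents can be processed at high speed, this paper shows that compound nouns can be analyzed with high accuracy simply by extracting string-level co-occurrence information directly from documents, rather than using fixed, pre-built data such as a thesaurus. First, the given compound noun is provisionally analyzed morphologically, and co-occurrence information for the resulting constituent words is extracted using multiple templates. During this extraction, the method uses word occurrence patterns to detect shorter compound nouns inside the compound noun as well as unregistered words, such as abbreviations that were mistakenly over-segmented, and extracts co-occurrence information for these too, which achieves robustness to unregistered words. In addition, heuristics for cases where co-occurrence information is insufficient are examined; combining co-occurrence information obtained directly from documents with a small number of rules achieved highly accurate compound noun analysis. In an experiment on 100 compound nouns each of length 5, 6, 7, and 8 extracted from newspaper articles, using one year of newspaper text, 90, 86, 84, and 84 were analyzed correctly, respectively.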
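
The idea of bracketing a compound noun from string-level co-occurrence counts can be illustrated with a minimal sketch. This is not the authors' algorithm: it is the simple adjacency heuristic for three-constituent compounds, swapped in for illustration. The function name `bracket3`, the toy corpus, and the tie-breaking rule (prefer left-branching) are all assumptions made here.

```python
def bracket3(a, b, c, corpus):
    """Bracket the compound a+b+c by comparing raw string counts in the corpus.

    Assumption (not from the paper): ties fall back to left-branching.
    """
    left = corpus.count(a + b)   # how often "ab" occurs as a string
    right = corpus.count(b + c)  # how often "bc" occurs as a string
    return ((a, b), c) if left >= right else (a, (b, c))


# Hypothetical toy corpus; a real run would scan a large document collection.
corpus = "自然言語の処理。自然言語を解析する。言語処理学会。自然言語は難しい。"

# "自然言語" occurs 3 times and "言語処理" once, so the result is left-branching.
result = bracket3("自然", "言語", "処理", corpus)
print(result)
```

Counting plain substrings needs no dictionary entry for the constituents, which is the property the abstract relies on for robustness to unregistered words.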

Symmetric Pattern Matching Analysis for English Coordinate Structures

Akitoshi Okumura, Kazunori Muraki

1998, Vol. 5, No. 4, pp. 61-76
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_61

Journal, free access

The authors propose a model for analyzing English sentences that include coordinate conjunctions such as “and”, “or”, “but” and equivalent words. The syntactic analysis of English coordinate sentences is one of the most difficult problems in machine translation (MT) systems. The problem is selecting, from all possible candidates, the correct syntactic structure formed by an individual coordinate conjunction, i.e., determining which constituents are coordinated by the conjunction. Typically, so many possible structures are produced that MT systems cannot select the correct one, even if the grammars allow us to write the rules in simple notations. This paper presents an English coordinate structure analysis model, which provides top-down scope information on the correct syntactic structure by taking advantage of the symmetric patterns of parallelism. The model is based on a balance-matching operation for two lists of feature sets. It has four effects, namely: a reduction in analysis costs, a decrease in word disambiguation, the interpretation of ellipses, and robust analysis. The model was implemented and incorporated into an English-Japanese MT system, and it achieved about 70% accuracy on 3,215 Wall Street Journal sentences.

Japanese-English Cross-Language Retrieval Using Comparable Corpora and a Bilingual Dictionary

奥村明俊, 石川開, 佐藤研治

1998, Vol. 5, No. 4, pp. 77-93
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_77

Journal, free access

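The cross-language retrieval method GDMAX makes it possible to retrieve English documents from Japanese input. GDMAX generates translated query candidates from the input query using a bilingual dictionary, and from each query it generates a co-occurrence frequency vector whose components are the co-occurrence frequencies of the query terms in the corpus of the corresponding language. The translated query candidates are ranked by the distance between the input co-occurrence frequency vector and the translated co-occurrence frequency vector, and the top-ranked set of English queries is used as the retrieval query. This method yields not just a single translation but an appropriate set of multiple translation equivalents as the English query. In an evaluation of precision and recall on 2 gigabytes of English documents, including the Wall Street Journal and AP newswire, the method achieved about 62% of the accuracy obtained with ideal translations, which is 12% higher than using all translation candidates in the bilingual dictionary and 6% higher than translation selection by machine translation.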
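
The ranking step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distance is used, so cosine distance is assumed here, and the function names, the romanized query terms, the toy dictionary, and all co-occurrence counts are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def cooc_vector(terms, cooc):
    """Co-occurrence count for every pair of query terms, in a fixed pair order."""
    return [cooc.get(frozenset(pair), 0) for pair in combinations(terms, 2)]


def cosine_distance(u, v):
    """Assumed distance measure; the abstract only says 'distance'."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def rank_candidates(query, dictionary, src_cooc, tgt_cooc):
    """Rank every combination of dictionary translations, closest vector first."""
    src_vec = cooc_vector(query, src_cooc)
    candidates = product(*(dictionary[t] for t in query))
    return sorted(
        (cosine_distance(src_vec, cooc_vector(c, tgt_cooc)), c) for c in candidates
    )


# Hypothetical toy data: romanized Japanese terms and invented counts.
query = ["kabu", "shijo", "torihiki"]
dictionary = {
    "kabu": ["stock", "stump"],
    "shijo": ["market"],
    "torihiki": ["trading", "deal"],
}
src_cooc = {
    frozenset(("kabu", "shijo")): 50,
    frozenset(("kabu", "torihiki")): 30,
    frozenset(("shijo", "torihiki")): 40,
}
tgt_cooc = {
    frozenset(("stock", "market")): 60,
    frozenset(("stock", "trading")): 35,
    frozenset(("market", "trading")): 45,
    frozenset(("stock", "deal")): 5,
    frozenset(("market", "deal")): 10,
}

ranked = rank_candidates(query, dictionary, src_cooc, tgt_cooc)
for distance, candidate in ranked:
    print(f"{distance:.3f}", candidate)
```

Taking the top few entries of `ranked` rather than only the first corresponds to using a set of English queries, as the abstract describes.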

An Interactive Word-Segmentation (Wakachigaki) Support System for Braille Translation Volunteers

鈴木恵美子, 小野智司, 狩野均

1998, Vol. 5, No. 4, pp. 95-110
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_95

Journal, free access

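This paper takes up the problem of translating Japanese documents into braille and proposes an interactive segmentation support system that classifies and organizes the rules for wakachigaki (writing with spaces between word units) into a knowledge base and selectively presents to the user only the places the system finds hard to judge. Instead of a large dictionary containing grammatical information, the system uses a small table consisting only of headwords, which reduces the effort of dictionary construction. Several systems for translating Japanese into braille have been proposed and marketed, but they process text in a single batch and leave no room for user intervention. Because braille translation volunteers must review the entire text to find the places the system translated incorrectly, such systems are hard to use in practice. This paper describes a system that separates the segmentation rules from the algorithm by putting them in a knowledge base, and that performs segmentation for Japanese braille translation interactively through cooperation between the system and the user. The system was applied to texts related to information processing, and its effectiveness was confirmed.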

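A Method for Identifying Correctly Recognized Parts of Speech Recognition Results Using Semantic Similarity, and a Speech Translation Method That Translates Only the Correct Parts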

脇田由実, 河井淳, 飯田仁

1998, Vol. 5, No. 4, pp. 111-125
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_111

Journal, free access

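To realize spoken dialogue and speech translation systems, the problem of parsing errors on misrecognized sentences from spontaneous speech must be solved. This requires introducing a new mechanism that actively uses constraints other than grammar to identify the correctly recognized parts of a misrecognized sentence, and parsing the sentence while repairing either the identified parts or the parts that were not identified. This paper proposes a method that identifies the correctly recognized parts of a recognition result using the semantic similarity between previously learned spoken-language expression patterns and the expression patterns in the input sentence. The method was then introduced into a speech translation system to build a system that partially translates only the correct parts of the speech recognition result. Evaluating the method with this system confirmed the following effects. The parts identified by the method are highly reliable: 96% of the identified parts were in fact correct. Presenting only the identified parts reduced by more than half the rate at which a misrecognized sentence is understood with its erroneous meaning. Furthermore, partially translating only the identified correct parts yielded translations whose meaning could be understood correctly or partially correctly for about 70% of the misrecognized sentences that previously could not be translated.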

An Analysis of the Current State of Japanese-Korean Machine Translation Systems and Recommendations for Their Development

金泰完, 崔杞鮮

1998, Vol. 5, No. 4, pp. 127-149
Published: 1998/10/10
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_127

Journal, free access

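This technical report evaluates the translation quality of currently available Japanese-Korean machine translation systems in order to grasp the current state and technical level of such systems and to make several recommendations on future research directions. Of the Japanese-Korean machine translation systems currently announced or on sale in Korea, four obtainable products are subjected to an analysis of translation quality from the user's side and to a contrastive-linguistic error analysis intended to determine the range of problems that can be solved linguistically. The results are also compared with (Choi and Kim 1996) to gauge how much Japanese-Korean translation systems have improved. On this basis, long-term and short-term tasks for improving the performance of Japanese-Korean machine translation systems are considered. Note that this technical report does not aim to rank the systems examined, because its evaluation is based on analysis from a limited viewpoint.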

Errata

1998, Vol. 5, No. 4, p. 151
Published: 1998
Released: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.4_151

Journal, free access
