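Journal of Natural Language Processing (自然言語処理)

Japanese Linguistics and Japanese Language Information Processing (日本語学と日本語情報処理)

Hiroshi Nakano

1997, Volume 4, Issue 2, pp. 1-2
Published: 1997/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.4.2_1

Journal, free access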

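Dependency Analysis of Japanese Sentences Using the Statistical Properties of Inter-Bunsetsu Dependency Distance (文節間係り受け距離の統計的性質を用いた日本語文の係り受け解析)

Yujie Zhang, Kazuhiko Ozeki

1997, Volume 4, Issue 2, pp. 3-19
Published: 1997/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.4.2_3

Journal, free access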

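It is known that the frequency of dependency between two bunsetsu (phrasal units) in Japanese depends on the distance between them. Specifically, a bunsetsu most often modifies the bunsetsu immediately following it, and, except for dependencies on the sentence-final bunsetsu, the frequency decreases as the distance grows. This statistical property underlies a heuristic often used in Japanese dependency analysis: "a bunsetsu modifies the nearest bunsetsu among those it can modify." However, this heuristic exploits only part of that statistical property, so making fuller use of the frequency distribution of dependency distances may improve parsing performance. In this study, we defined a dependency penalty function between two bunsetsu based on the frequency distribution of dependency distances extracted from the ATR 503-sentence corpus, and carried out dependency analysis experiments using the "total penalty minimization method." Compared with the results of a deterministic analysis method based on the above heuristic, a considerable improvement in parsing performance was observed. Furthermore, it became clear that classifying the modifier bunsetsu and using dependency frequency information extracted separately for each class improves parsing performance further.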
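
As a rough illustration of the idea in this abstract, the following Python sketch turns a table of dependency-distance counts into penalties and picks the non-crossing dependency structure with the smallest total penalty. The counts in `DISTANCE_COUNTS`, the `can_modify` predicate, and all function names are hypothetical and are not taken from the paper. The sketch enumerates candidate structures by brute force rather than reproducing the authors' total penalty minimization method, and it ignores the special behaviour of the sentence-final bunsetsu.

```python
import itertools
import math

# Hypothetical distance -> count table (NOT the ATR corpus figures).
DISTANCE_COUNTS = {1: 60, 2: 20, 3: 10, 4: 5}


def penalty(distance, counts=DISTANCE_COUNTS):
    """Negative log relative frequency of a dependency distance."""
    total = sum(counts.values())
    # Unseen distances get a small pseudo-count so the penalty stays finite.
    count = counts.get(distance, 0.5)
    return -math.log(count / total)


def crosses(heads):
    """Return True if any two dependency arcs (i -> heads[i]) cross."""
    for i, head_i in enumerate(heads):
        for j, head_j in enumerate(heads):
            if i < j < head_i < head_j:
                return True
    return False


def parse(n, can_modify):
    """Find the lowest-penalty structure for a sentence of n bunsetsu.

    heads[i] is the index of the bunsetsu that bunsetsu i modifies.
    Every bunsetsu except the last modifies exactly one later bunsetsu.
    Returns (heads, total_penalty), or None if no structure is allowed.
    """
    candidates = [
        [j for j in range(i + 1, n) if can_modify(i, j)]
        for i in range(n - 1)
    ]
    best = None
    for heads in itertools.product(*candidates):
        if crosses(heads):
            continue
        total = sum(penalty(head - i) for i, head in enumerate(heads))
        if best is None or total < best[1]:
            best = (heads, total)
    return best


# With no grammatical restriction, every bunsetsu attaches to its neighbour.
heads, total = parse(4, lambda i, j: True)
print(heads)  # (1, 2, 3)
```

Brute-force enumeration is only practical for very short sentences; it is used here because it makes the objective (sum of pairwise penalties under a non-crossing constraint) easy to see.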

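Corpus-Based Verb Sense Disambiguation (コーパスに基づく動詞の多義解消)

Fumiyo Fukumoto, Jun'ichi Tsujii

1997, Volume 4, Issue 2, pp. 21-39
Published: 1997/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.4.2_21

Journal, free access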

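This paper proposes a method for resolving the ambiguity of polysemous words in a sentence using verb sense information extracted from a corpus. We first describe a method for extracting from a corpus the information needed for verb sense disambiguation. The method extracts this information by performing semantic clustering while judging polysemy. To do so, we regard a polysemous verb, which is a single element on the surface, as a composite of several elements, one for each of its senses; we decompose it into elements corresponding to the individual senses (called hypothetical verb vectors) and then build clusters. To verify the effectiveness of the method, we ran an experiment comparing it with the word-vector-based disambiguation method proposed by Niwa et al. On 1,226 sentences containing 14 polysemous verbs, the method of Niwa et al. achieved an average accuracy of 62.7%, whereas our method achieved 71.1%.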

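Indirect Anaphora Resolution for Japanese Nouns Using Semantic Constraints (意味的制約を用いた日本語名詞における間接照応解析)

Masaki Murata, Makoto Nagao

1997, Volume 4, Issue 2, pp. 41-56
Published: 1997/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.4.2_41

Journal, free access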

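One type of anaphora is indirect anaphora, in which an expression indirectly refers to an entity that does not itself appear in the text but is related to something already mentioned. Indirect anaphora has received little study in natural language processing so far, but it is an important problem for grasping textual cohesion and for semantic understanding. Resolving indirect anaphora requires a noun case frame dictionary as knowledge about the relation between two nouns. Since no such dictionary exists yet, we used examples of the form "noun A no noun B" (名詞Aの名詞B) and a verb case frame dictionary instead. With this method we obtained a recall of 63% and a precision of 68% on test samples. This means that indirect anaphora can be resolved with reasonable accuracy even now, when no noun case frame dictionary exists. We also ran an experiment assuming that a complete noun case frame dictionary was available; on test samples it achieved a recall of 71% and a precision of 82%. In addition, we showed how "noun A no noun B" examples can be used to build a noun case frame dictionary.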

On Semantic Interpretation of Japanese Compound Nouns

Masato Shiraishi, Masao Yokota

1997, Volume 4, Issue 2, pp. 57-70
Published: 1997/04/10
Released online: 2011/06/07

DOI: https://doi.org/10.5715/jnlp.4.2_57

Journal, free access

Toward the realization of a natural language understanding system for clinical records, the authors have analyzed a large number of discharge summaries (a kind of clinical record). Many Japanese compound nouns appear in these records as a result of ellipsis, so it is essential for the understanding system to cope with them. This paper describes a system that paraphrases compound nouns by restoring their elliptical constructions using their semantic categories (Yokota, Nishimura, Shiraishi and Ryu 1994) according to the Mental-image directed semantic theory (Yokota 1988; Yokota, Shiraishi, Ryu, and Oda 1991b). The system consists of four major processors, "Word segmentation processor," "Restoration processor," "Hierarchical relation detector" and "Sentence generator," and possesses two types of dictionary, "Word dictionary" and "Hierarchy dictionary." The former assigns a semantic category, etc. to each noun, and the latter contains the hierarchical relations among the concepts of objects (one of the semantic categories of nouns). The experimental result of the system has proven to be fairly successful.

Clustering Words with the MDL Principle

Hang Li, Naoki Abe

1997, Volume 4, Issue 2, pp. 71-88
Published: 1997/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.4.2_71

Journal, free access

We address the problem of automatically constructing a thesaurus (hierarchically clustering words) based on corpus data. We view the problem of clustering words as that of estimating a joint distribution over the Cartesian product of a partition of a set of nouns and a partition of a set of verbs, and propose an estimation algorithm using simulated annealing with an energy function based on the Minimum Description Length (MDL) Principle. We empirically compared the performance of our method based on the MDL Principle against a method based on the Maximum Likelihood Estimator, and found that the former outperforms the latter. We also evaluated the method by conducting pp-attachment disambiguation experiments using an automatically constructed thesaurus. Our experimental results indicate that we can improve accuracy in disambiguation by using such a thesaurus.
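
To make the MDL criterion mentioned in this abstract more concrete, the following Python sketch computes a two-part description length for a given pair of noun and verb partitions and compares a few candidate clusterings. It is a minimal sketch under stated assumptions: probability mass is spread uniformly over the word pairs inside each cluster cell, the model cost is half the number of free parameters times the log of the sample size, and the toy data and the name `description_length` are hypothetical. The paper's exact model may differ, and the simulated annealing search is not shown.

```python
import math
from collections import Counter


def description_length(pairs, noun_clusters, verb_clusters):
    """Two-part code length in bits for (noun, verb) pairs under a hard clustering.

    Assumption: P(n, v) = P(Cn, Cv) / (|Cn| * |Cv|), with P(Cn, Cv) estimated
    by relative frequency. This is an illustrative choice, not the paper's
    confirmed model.
    """
    noun_of = {w: i for i, cluster in enumerate(noun_clusters) for w in cluster}
    verb_of = {w: i for i, cluster in enumerate(verb_clusters) for w in cluster}
    cell_counts = Counter((noun_of[n], verb_of[v]) for n, v in pairs)
    total = len(pairs)

    # Data description length: negative log-likelihood of the observed pairs.
    data_bits = 0.0
    for (ci, cj), count in cell_counts.items():
        cell_size = len(noun_clusters[ci]) * len(verb_clusters[cj])
        p_pair = (count / total) / cell_size
        data_bits -= count * math.log2(p_pair)

    # Model description length: (k / 2) * log2(N) for k free parameters.
    free_params = len(noun_clusters) * len(verb_clusters) - 1
    model_bits = 0.5 * free_params * math.log2(total)
    return data_bits + model_bits


# Hypothetical toy co-occurrence data: 4 observations of each pair.
pairs = (
    [("wine", "drink")] * 4 + [("beer", "drink")] * 4
    + [("bread", "eat")] * 4 + [("rice", "eat")] * 4
)
verbs = [["drink"], ["eat"]]

good = description_length(pairs, [["wine", "beer"], ["bread", "rice"]], verbs)
bad = description_length(pairs, [["wine", "bread"], ["beer", "rice"]], verbs)
singletons = description_length(
    pairs, [["wine"], ["beer"], ["bread"], ["rice"]], verbs
)
print(good, bad, singletons)  # 38.0 54.0 46.0
```

On this toy data the grouping of drinks and foods gives the shortest description: it fits the data as well as the singleton partition but needs fewer parameters, while the mixed grouping fits the data worse.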

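Extracting Key Paragraphs Based on the Degree of Context Dependency (文脈依存の度合を考慮した重要パラグラフの抽出)

Fumiyo Fukumoto, Jun'ichi Fukumoto, Yoshimi Suzuki

1997, Volume 4, Issue 2, pp. 89-109
Published: 1997/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.4.2_89

Journal, free access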

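This paper proposes a method for extracting key paragraphs that focuses on the degree of context dependency. Like the keyword density method proposed by Luhn et al., the method rests on the premise that "words closely related to the topic appear consistently across paragraphs." We extracted words closely related to the topic, and weighted them, using the degree of context dependency, that is, how deeply any given word in an article is involved in a given context. To verify the accuracy of the method, we compared its output with paragraphs extracted by hand. At an extraction rate of 30%, 75 of the 84 paragraphs extracted in total from 50 articles were correct, an accuracy of 89.2%.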

Case Contribution in Example-Based Verb Sense Disambiguation

Atsushi Fujii, Kentaro Inui, Takenobu Tokunaga, Hozumi Tanaka

1997, Volume 4, Issue 2, pp. 111-123
Published: 1997/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.4.2_111

Journal, free access

Word sense disambiguation has recently been utilized in corpus-based approaches, reflecting the growth in the number of machine readable texts. One category of approaches disambiguates an input verb sense based on the similarity between its governing case fillers and those in given examples. In this paper, we introduce the degree of case contribution to verb sense disambiguation into this existing method. Under this notion, the greater the diversity of the semantic range of a case's filler examples, the more that case contributes to verb sense disambiguation. We also report the result of a comparative experiment, in which the performance of disambiguation is improved by considering this notion of semantic contribution.
