Journal of Natural Language Processing (自然言語処理)

Expectations for the Society as a Research Workshop

渕一博

1998, Vol. 5, No. 2, pp. 1-2
Published: 1998/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.2_1

Free access

Generation of Korean Predicates Based on a Modality Table in Japanese-Korean Machine Translation

金政仁, 李鐘赫, 李根培

1998, Vol. 5, No. 2, pp. 3-24
Published: 1998/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.2_3

Free access

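Many researchers working on Japanese-Korean machine translation adopt the direct translation approach in order to make the most of the similarities between the two languages, such as their matching word order at the bunsetsu (phrase) level. Between Japanese and Korean predicates, however, several problems remain hard to resolve: mismatches in corresponding parts of speech, local mismatches in word order, mismatches in conjugation rules, and mismatches in tense expressions. To resolve these mismatches in predicate expressions, this paper proposes a "Korean generation method based on a modality table" and evaluates it systematically. The method tabulates abstract, semantic-symbolic modality features that apply only to predicates and uses them as a pivot between the predicate expressions of the two languages, which enables effective translation of the modal expressions in predicates. In predicate-translation experiments on 2,338 example sentences extracted from the Asahi Shimbun and from Japanese grammar books, about 97.5% of the predicates were translated naturally, confirming that the method is effective for predicate translation.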

A Method for Syntactic Behavior Analysis

Wide R. Hogenhout, Yuji Matsumoto

1998, Vol. 5, No. 2, pp. 25-46
Published: 1998/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.2_25

Free access

We show how a treebank can be used to cluster words on the basis of their syntactic behavior. By extracting statistics on the structures in which words appear, it is possible to discover similarities and differences in usage between words with the same part of speech. We compare this clustering with conventional clustering based on co-occurrence. While conventional clustering can discover semantic similarities or a tendency to appear together, the method we present ignores these factors and focuses on syntactic usage, in other words the sorts of structures a word appears in. We present a case study on prepositions, showing how they can be automatically subdivided by their syntactic behavior, and we discuss the appropriateness of such a subdivision. We also carried out experiments to compare the quality of the clusters quantitatively. To this end, we used clusters based on syntactic behavior to improve the estimation of the distribution of dependency relations between words. Since such a distribution must necessarily be estimated from sparse data, an entropy test can show how informative the classes are about syntactic usage. Finally, we discuss a number of ways in which a classification of words can contribute to natural language processing applications.
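
As a rough illustration of clustering by syntactic behavior, this sketch represents each word by the distribution of structural contexts it occurs in, then greedily merges the two clusters whose pooled distributions are closest. The context labels and counts are hypothetical, and so are the choices of Jensen-Shannon divergence as the distance and of greedy agglomerative merging; the abstract does not specify which statistics or clustering algorithm the paper uses.

```python
import math
from collections import Counter

# Hypothetical counts of the structural contexts each preposition appears in
# (whether its phrase modifies a noun phrase, a verb phrase, or a clause).
PROFILES = {
    "of": {"NP-mod": 10},
    "from": {"NP-mod": 8, "VP-mod": 1},
    "during": {"VP-mod": 9, "S-mod": 3},
    "after": {"VP-mod": 7, "S-mod": 4},
}


def normalize(counts):
    """Turn context counts into a probability distribution."""
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # a[k] > 0 implies m[k] > 0, so the logarithm is always defined.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def cluster(profiles, n_clusters):
    """Greedy agglomerative clustering: repeatedly merge the two clusters
    whose pooled context distributions have the smallest JS divergence."""
    clusters = {frozenset([w]): Counter(ctx) for w, ctx in profiles.items()}
    while len(clusters) > n_clusters:
        keys = list(clusters)
        pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        a, b = min(
            pairs,
            key=lambda ab: js_divergence(
                normalize(clusters[ab[0]]), normalize(clusters[ab[1]])
            ),
        )
        # Pool the context counts of the merged words.
        clusters[a | b] = clusters.pop(a) + clusters.pop(b)
    return set(clusters)


if __name__ == "__main__":
    print(sorted(sorted(c) for c in cluster(PROFILES, 2)))
    # [['after', 'during'], ['from', 'of']]
```

With these toy counts, the mostly noun-modifying `of` and `from` end up in one cluster, and the mostly verb- or clause-modifying `during` and `after` end up in the other.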

General Word Sense Disambiguation Method Based on a Full Sentential Context

Jiri Stetina, Makoto Nagao

1998, Vol. 5, No. 2, pp. 47-74
Published: 1998/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.2_47

Free access

This paper presents a new general supervised word sense disambiguation method based on a relatively small training corpus that is syntactically parsed and semantically tagged. The method exploits the full sentential context and all the explicit semantic relations in a sentence to identify the senses of all of that sentence's content words. It addresses the sparse data problem of a small training corpus by replacing words with their semantic classes. Despite the very small training corpus, we report an overall accuracy of 80.3% (85.7%, 63.9%, 83.6% and 86.5% for nouns, verbs, adjectives and adverbs, respectively), which exceeds the accuracy of statistical semantic tagging based on sense frequency, the only general disambiguation technique that is truly applicable. Because the method uses the syntactic structure of the sentence, it is particularly suitable for integration with a probabilistic syntactic analyser.
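
The core idea for handling sparse data (backing off from a word to its semantic class) can be shown with a deliberately simplified sketch. It disambiguates one dependent word from a single head word and relation. The semantic classes, relation labels, sense names, and the fixed back-off order (word, then class, then most frequent sense) are all hypothetical. The paper's method uses the full sentential context and every semantic relation in the sentence, and this sketch makes no attempt to reproduce that.

```python
from collections import Counter

# Hypothetical semantic classes for head words; class names carry a "C:"
# prefix so they cannot collide with ordinary words in the counts.
WORD2CLASS = {
    "deposit": "C:transaction",
    "withdraw": "C:transaction",
    "fish": "C:leisure",
    "swim": "C:leisure",
}
# Hypothetical tagged training data: (head word, relation, sense of dependent).
TRAINING = [
    ("deposit", "loc", "bank/finance"),
    ("fish", "loc", "bank/river"),
]
SENSES = ["bank/finance", "bank/river"]
SENSE_FREQ = Counter({"bank/finance": 5, "bank/river": 2})


def count_pairs(examples, word2class):
    """Count (head, relation, sense) triples, once with the head word itself
    and once with the head replaced by its semantic class."""
    counts = Counter()
    for head, relation, sense in examples:
        counts[head, relation, sense] += 1
        if head in word2class:
            counts[word2class[head], relation, sense] += 1
    return counts


def choose_sense(head, relation, senses, counts, word2class, sense_freq):
    """Pick a sense for a dependent of `head` under `relation`.

    Word-level evidence is tried first; if it is all zero, the head is
    replaced by its semantic class; if that is zero too, the most
    frequent sense is returned."""
    for key in (head, word2class.get(head)):
        if key is None:  # head has no known semantic class
            continue
        scores = {s: counts[key, relation, s] for s in senses}
        if any(scores.values()):
            return max(senses, key=scores.__getitem__)
    return max(senses, key=lambda s: sense_freq[s])


if __name__ == "__main__":
    counts = count_pairs(TRAINING, WORD2CLASS)
    # "withdraw" never occurs in training, but its class does.
    print(choose_sense("withdraw", "loc", SENSES, counts, WORD2CLASS, SENSE_FREQ))
    # bank/finance
```

Here the unseen head `withdraw` still receives class-level evidence from `deposit`. A sense-frequency baseline, by contrast, would return `bank/finance` for every context.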

Improving the Accuracy of Morphological Analysis through Morpheme Clustering

森信介, 長尾眞

1998, Vol. 5, No. 2, pp. 75-103
Published: 1998/04/10
Released online: 2011/03/01

DOI: https://doi.org/10.5715/jnlp.5.2_75

Free access

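This paper proposes improving the accuracy of a stochastic morphological analyzer through morpheme clustering and an improved unknown-word model. For morpheme clustering, we propose a method that refines a morpheme n-gram model into a class n-gram model, using cross-entropy as the criterion. For the unknown-word model, we propose a method that adds morphemes supplied by sources other than the training corpus, such as dictionaries, within the framework of the probabilistic model. We implemented a bi-gram model and ran experiments on the EDR corpus, and we observed an improvement in morphological analysis accuracy. The model incorporating both improvements achieved higher accuracy than the part-of-speech tri-gram model reported in previous work, which shows that our model is superior in terms of morphological analysis accuracy. In addition, we compared its accuracy with that of a morphological analyzer whose part-of-speech system and table of connections between parts of speech were built by grammar experts. The stochastic morphological analyzer made significantly fewer errors than the expert-built one. Compared with such analyzers based on human linguistic intuition, stochastic methods for morphological analysis are not only more accurate at present but also have the advantage that further improvements can be pursued systematically.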
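
A class bigram model is commonly written as P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the class of w_i. Whether this is exactly the paper's formulation is an assumption. The following minimal sketch trains such a model by maximum likelihood, without smoothing, and computes its cross-entropy in bits per token on held-out text, so that two candidate clusterings can be compared. The toy corpus, the boundary markers `<s>` and `</s>`, and the function names are hypothetical. The paper's clustering search and unknown-word model are not reproduced.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train(corpus, word2class):
    """Maximum-likelihood counts for a class bigram model:
    P(w_i | w_{i-1}) ~= P(c_i | c_{i-1}) * P(w_i | c_i)."""
    class_bigram, class_history = Counter(), Counter()
    class_count, word_count = Counter(), Counter()
    for sent in corpus:
        toks = [BOS] + sent + [EOS]
        classes = [word2class[w] for w in toks]
        for prev_c, c in zip(classes, classes[1:]):
            class_bigram[prev_c, c] += 1
            class_history[prev_c] += 1
        for w, c in zip(toks[1:], classes[1:]):  # predicted tokens only
            word_count[w] += 1
            class_count[c] += 1
    return class_bigram, class_history, class_count, word_count


def cross_entropy(corpus, word2class, model):
    """Average negative log2 probability per predicted token.
    Returns math.inf if any event has zero probability (no smoothing)."""
    class_bigram, class_history, class_count, word_count = model
    total, n = 0.0, 0
    for sent in corpus:
        toks = [BOS] + sent + [EOS]
        for prev_w, w in zip(toks, toks[1:]):
            prev_c, c = word2class[prev_w], word2class[w]
            if class_history[prev_c] == 0 or class_count[c] == 0:
                return math.inf
            p = (class_bigram[prev_c, c] / class_history[prev_c]) * (
                word_count[w] / class_count[c]
            )
            if p == 0.0:
                return math.inf
            total -= math.log2(p)
            n += 1
    return total / n


# Toy data: every word as its own class vs. merging a/b and x/y.
TRAIN = [["a", "x"], ["b", "y"]]
HELD_OUT = [["a", "y"]]
IDENTITY = {w: w for w in (BOS, EOS, "a", "b", "x", "y")}
MERGED = {BOS: BOS, EOS: EOS, "a": "A", "b": "A", "x": "X", "y": "X"}

if __name__ == "__main__":
    print(cross_entropy(HELD_OUT, IDENTITY, train(TRAIN, IDENTITY)))  # inf
    print(cross_entropy(HELD_OUT, MERGED, train(TRAIN, MERGED)))  # about 0.667
```

On this toy data, the word-level model (each word in its own class) assigns zero probability to the unseen bigram `a y`. Merging `a`/`b` and `x`/`y` instead yields a finite cross-entropy. A clustering procedure could therefore keep a merge only when it lowers the cross-entropy on held-out text.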
