Performance Comparison of Statistical Retrieval Methods for Japanese Text: An Empirical Study Using a Test Collection

岸田 和明

doi:10.2964/jsikproc.8.0_61

Abstract

This paper reports findings from an empirical comparison of the retrieval performance of statistical methods, specifically vector space and probabilistic models. The study used a large Japanese text test collection provided by NACSIS, consisting of about 330,000 records of scientific proceedings. Each statistical method was tested with three indexing techniques for Japanese text: (1) longest matching against dictionary entries, (2) tokenizing at changes in character type, and (3) a simple bi-gram method. Almost no statistically significant differences among the methods were observed, but the probabilistic method based on a logistic regression model appears to perform relatively better than the others.
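
To make the three indexing techniques concrete, the following is a minimal Python sketch of how each one might split a Japanese string into index terms. It is an illustration only, not the implementation used in the study. The function names, the toy dictionary, and the coarse Unicode ranges used to classify characters are all assumptions introduced here.

```python
from itertools import groupby


def char_kind(ch: str) -> str:
    """Classify a character by script (coarse Unicode ranges; an assumption)."""
    if "\u3040" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isascii() and ch.isalnum():
        return "alnum"
    return "other"


def longest_match_tokens(text: str, dictionary: set[str]) -> list[str]:
    """(1) Greedy longest matching against dictionary entries."""
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in dictionary:
                tokens.append(candidate)
                i += n
                break
        else:
            # No dictionary entry starts at this position; skip one character.
            i += 1
    return tokens


def char_kind_tokens(text: str) -> list[str]:
    """(2) Split wherever the character type changes; drop 'other' runs."""
    return [
        "".join(run)
        for kind, run in groupby(text, key=char_kind)
        if kind != "other"
    ]


def bigram_tokens(text: str) -> list[str]:
    """(3) Overlapping character bi-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


if __name__ == "__main__":
    sample = "情報検索システムの評価"
    toy_dictionary = {"情報", "検索", "情報検索", "システム", "評価"}  # hypothetical entries
    print(longest_match_tokens(sample, toy_dictionary))
    print(char_kind_tokens(sample))
    print(bigram_tokens(sample))
```

On this sample, the dictionary method skips the particle の because it is not in the toy dictionary, the character-type method keeps it as a separate token, and the bi-gram method produces terms that cross word boundaries. Differences of this kind are what the indexing comparison is designed to expose.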
