Proceedings of Annual Conference, Japan Society of Information and Knowledge
Online ISSN : 2432-9908
ISSN-L : 2432-9908
Proceedings of the 8th Workshop on the Japan Society of Information and Knowledge

Empirical examination on performance of some statistical methods for Japanese text retrieval by using large test collection
*Kazuaki KISHIDA
CONFERENCE PROCEEDINGS FREE ACCESS

Pages 61-64

Abstract
This paper reports findings from an empirical comparison of the retrieval performance of several statistical methods, namely vector space and probabilistic models. The study used a large Japanese text test collection provided by NACSIS, which consists of about 330,000 records of scientific proceedings. Each statistical method was tested with three indexing techniques for Japanese text: (1) longest matching against entries in a dictionary, (2) tokenizing at changes in character type, and (3) a simple bi-gram method. Almost no statistically significant differences among the methods were observed, but the probabilistic method based on a logistic regression model appears to perform relatively better than the others.
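The three indexing techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the coarse character classes, the `max_len` limit, and the toy dictionary are all assumptions made for illustration only.

```python
def char_kind(ch):
    """Classify a character into a coarse class (assumed scheme, not from the paper)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalnum():
        return "alnum"
    return "other"


def longest_match(text, dictionary, max_len=8):
    """(1) Greedy longest matching against dictionary entries.

    Characters where no entry starts are skipped (an assumption).
    """
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            i += 1  # no dictionary entry starts at this position
    return tokens


def tokenize_by_char_kind(text):
    """(2) Split wherever the character class changes; drop 'other' runs."""
    tokens = []
    current, current_kind = "", None
    for ch in text:
        kind = char_kind(ch)
        if current and kind != current_kind:
            if current_kind != "other":
                tokens.append(current)
            current = ""
        current += ch
        current_kind = kind
    if current and current_kind != "other":
        tokens.append(current)
    return tokens


def bigrams(text):
    """(3) Overlapping character bi-grams over the whole string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


if __name__ == "__main__":
    toy_dictionary = {"情報", "情報検索", "検索", "システム"}  # hypothetical entries
    print(longest_match("情報検索システム", toy_dictionary))
    print(tokenize_by_char_kind("情報検索システムの評価"))
    print(bigrams("情報検索"))
```

The sketch shows why the techniques can yield different index terms for the same string: dictionary matching depends on vocabulary coverage, character-type splitting keeps long kanji compounds whole, and bi-grams need no linguistic resources at all.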
© 2000 Japan Society of Information and Knowledge