IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Software and Information Processing>
Experimental Study of Higher-gram Index Length for N-gram Full Text Search System
Hiroshi Yamamoto, Hiroshi Tsuji

2006 Volume 126 Issue 9 Pages 1173-1180

Abstract
The N-gram indexing method, in which each index entry consists of N consecutive characters, is the most popular approach for Japanese full-text search systems. In particular, full-text search over Japanese text usually builds on a 2-gram character index to keep the index file small. Although additional higher-gram indices are expected to improve search performance, no experimental evaluation of such additional higher-gram indices has been available. This paper evaluates how much additional higher-gram indices improve text search performance under a Search Term Intensive Approach, which selects the terms to be indexed as higher-grams according to how often they appear as search terms in application programs. In the concrete evaluation, the searched collection contains one hundred thousand or two hundred thousand paper articles, and a simulation of additional indices of 5 grams or more is applied in addition to the evaluation of additional 3-gram and 4-gram indices.
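The abstract describes the indexing scheme only at a high level. The following is a minimal Python sketch, assuming a character-level inverted index that stores start positions per document. A 2-gram base index answers any query of two or more characters by chaining its overlapping 2-grams. A small additional index stores whole query terms chosen by search frequency. The selection rule in `select_higher_gram_terms` (the top-k most frequent logged queries of length 3 or more), the sample query log, and all function names are hypothetical illustrations. They are not the authors' Search Term Intensive Approach as implemented in the paper.

```python
from collections import Counter, defaultdict

# Postings map: gram or term -> {doc_id: list of start positions}
Postings = dict[str, dict[int, list[int]]]


def build_ngram_index(docs: dict[int, str], n: int = 2) -> Postings:
    """Index every run of n consecutive characters in each document."""
    index: defaultdict = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]][doc_id].append(pos)
    # Convert to plain dicts so later lookups never create empty entries.
    return {gram: dict(postings) for gram, postings in index.items()}


def search_bigram(index: Postings, query: str) -> dict[int, list[int]]:
    """Find every occurrence of query by chaining its overlapping 2-grams."""
    if len(query) < 2:
        raise ValueError("a 2-gram index needs a query of at least 2 characters")
    first = index.get(query[0:2], {})
    # Postings for the 2-grams starting at query offsets 1 .. len(query) - 2.
    rest = [index.get(query[i:i + 2], {}) for i in range(1, len(query) - 1)]
    hits: dict[int, list[int]] = {}
    for doc_id, starts in first.items():
        pos_sets = [set(p.get(doc_id, ())) for p in rest]
        # A match at s needs the (j + 1)-th 2-gram of the query at s + j + 1.
        matches = [s for s in starts
                   if all(s + j + 1 in ps for j, ps in enumerate(pos_sets))]
        if matches:
            hits[doc_id] = matches
    return hits


def select_higher_gram_terms(query_log: list[str], min_len: int = 3,
                             top_k: int = 1) -> list[str]:
    """Hypothetical rule: the top_k most frequent logged queries of length >= min_len."""
    counts = Counter(q for q in query_log if len(q) >= min_len)
    return [term for term, _ in counts.most_common(top_k)]


def build_term_index(docs: dict[int, str], terms: list[str]) -> Postings:
    """Store postings for each selected higher-gram term as a single entry."""
    index: Postings = {}
    for term in terms:
        postings = {}
        for doc_id, text in docs.items():
            starts = [p for p in range(len(text) - len(term) + 1)
                      if text.startswith(term, p)]
            if starts:
                postings[doc_id] = starts
        index[term] = postings
    return index


def search(bigram_index: Postings, term_index: Postings, query: str) -> dict[int, list[int]]:
    """Answer from the higher-gram index when possible, else fall back to 2-grams."""
    if query in term_index:
        return term_index[query]
    return search_bigram(bigram_index, query)


docs = {1: "全文検索システム", 2: "検索語の出現頻度", 3: "全文検索の評価"}
bigrams = build_ngram_index(docs, n=2)
terms = select_higher_gram_terms(["全文検索", "全文検索", "出現頻度", "検索"])
term_index = build_term_index(docs, terms)
print(terms)                                    # ['全文検索']
print(search(bigrams, term_index, "全文検索"))  # {1: [0], 3: [0]}
print(search(bigrams, term_index, "検索語"))    # {2: [0]}
```

This mirrors the trade-off the abstract describes. A pre-indexed frequent term is answered with a single lookup at the cost of extra postings. Any other query pays for one posting-list check per overlapping 2-gram.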
© 2006 by the Institute of Electrical Engineers of Japan