Document Indexing Focusing on Term Co-occurrence and Term Frequency

奥井 颯平; 猪口 明博

doi:10.11517/jsaisigtwo.2015.DOCMAS-009_02

Abstract

In this paper, we propose two models for weighting each term in a document for document retrieval. The models build on the traditional Term Frequency (TF) and on the Term Weight (TW) proposed in 2013. TF is based on the number of occurrences of a term in a document and is used as the de facto standard. TW, on the other hand, is based on the variety of term co-occurrences in a document and outperforms TF. Our proposed models give greater weight to terms that co-occur with frequently occurring terms. We present experimental results comparing the proposed models with the conventional ones on a very large text corpus.
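
The abstract gives no formulas, so the following Python sketch is only an illustration of the three ideas it names, under stated assumptions. TF is taken as a raw occurrence count; TW is approximated as the number of distinct terms that appear within a sliding window of a term; and the "frequency-aware" weight sums the TF of those co-occurring terms. The function names, the `window` parameter, and the exact form of the last weight are hypothetical and should not be read as the authors' models.

```python
from collections import Counter, defaultdict


def term_frequency(tokens):
    """TF: number of occurrences of each term in the document."""
    return Counter(tokens)


def cooccurrence_neighbors(tokens, window=3):
    """Map each term to the set of distinct terms co-occurring with it.

    Assumption: two terms co-occur when they fall inside the same span of
    `window` consecutive tokens (the current token plus the next window - 1).
    """
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)
    return neighbors


def term_weight(tokens, window=3):
    """TW-like weight (assumed form): count of distinct co-occurring terms."""
    neighbors = cooccurrence_neighbors(tokens, window)
    return {t: len(neighbors[t]) for t in set(tokens)}


def frequency_aware_weight(tokens, window=3):
    """Hypothetical combined weight: sum of the TF of co-occurring terms.

    A term scores higher when the terms around it occur frequently.
    """
    tf = term_frequency(tokens)
    neighbors = cooccurrence_neighbors(tokens, window)
    return {t: sum(tf[u] for u in neighbors[t]) for t in set(tokens)}


if __name__ == "__main__":
    doc = "term weight models weight each term in a document".split()
    print(term_frequency(doc))
    print(term_weight(doc))
    print(frequency_aware_weight(doc))
```

In this sketch a rare term that sits next to frequent terms can score as high as a frequent term, which is the behavior the last sentence of the abstract describes in words; how the paper actually combines the two signals is not stated here.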
