IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Highly Compressed Lists of Integers with Dense Padding Modes
Kun JIANGXingshen SONGYuexiang YANG
Author information
JOURNAL FREE ACCESS

2015 Volume E98.D Issue 11 Pages 1986-1989

Details
Abstract
Index compression is partially responsible for the current performance achievements of Internet search engines. Among many latest compression techniques, Simple9 can pack as many integers as possible into a single 32-bit machine word using 9 different padding modes. However, the number of wasted bits in Simple9 remains large. In previous works, researchers have focused on reducing the unused trailing bits of the padding modes and have proposed various additional modes that make full use of the cases of the status bits. Instead, we focus on the wasted bits in the integer list, padding extra zeros for a complete dense mode when the number of integers is not enough to fit a complete mode. More precisely, we first propose a novel index compression method called SimpleD with dense padding modes to achieve a more compact storage compared with that of Simple9. We then design an innovative metric for extracting the inserted extra zero integers during the decoding phase. Experiments on the TREC WT2G and GOV2 datasets show that our encoder outperforms Simple9 while still retaining a very fast decompression speed.
Content from these authors
© 2015 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top