Journal of Information Processing
Online ISSN : 1882-6652
Compressed Vector Set: A Fast and Space-Efficient Data Mining Framework
Masafumi Oyamada, Jianquan Liu, Shinji Ito, Kazuyo Narita, Takuya Araki, Hiroyuki Kitagawa

2018 Volume 26 Pages 416-426

Abstract

In this paper, we present CVS (Compressed Vector Set), a fast and space-efficient data mining framework that handles both sparse and dense datasets. CVS holds a set of vectors in a compressed format and performs primitive vector operations, such as the l_p-norm and the dot product, without decompression. By combining these primitive operations, CVS accelerates prominent data mining and machine learning algorithms, including the k-nearest neighbor algorithm, stochastic gradient descent for logistic regression, and kernel methods. In contrast to the commonly used sparse matrix/vector representation, which is not effective for dense datasets, CVS handles sparse and dense datasets in a unified manner. Our experimental results demonstrate that CVS processes both dense and sparse datasets faster than the conventional sparse vector representation while using less memory.
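
The abstract does not specify the compressed format that CVS uses, so the following is only an illustrative sketch of the general idea of computing on compressed vectors without decompressing them. It assumes simple run-length encoding as a stand-in scheme, which is not claimed to be the CVS format; the names rle_encode, rle_lp_norm, and rle_dot are hypothetical and do not come from the paper.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a vector into (value, run_length) pairs.

    Run-length encoding is an assumed stand-in here, not the CVS format.
    """
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_lp_norm(runs, p=2):
    """l_p norm computed per run: a run adds run_length * |value|**p."""
    return sum(n * abs(v) ** p for v, n in runs) ** (1.0 / p)


def rle_dot(a, b):
    """Dot product of two run-length-encoded vectors of equal length.

    Walks both run lists in lockstep and multiplies overlapping runs,
    so the work is proportional to the number of runs, not the dimension.
    """
    i = j = 0
    rem_a = a[0][1] if a else 0  # elements left in the current run of a
    rem_b = b[0][1] if b else 0  # elements left in the current run of b
    total = 0
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)  # length of the overlap of the two runs
        total += a[i][0] * b[j][0] * step
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return total


x = rle_encode([0, 0, 0, 2, 2, 5])
y = rle_encode([1, 1, 1, 1, 3, 3])
print(rle_dot(x, y), rle_lp_norm(x, p=1))
```

Under this assumed encoding, both operations touch each run once, which is why a vector with long runs of repeated values (including, but not limited to, zeros) can be processed in fewer steps than its uncompressed length.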

© 2018 by the Information Processing Society of Japan