Performance Optimization for Sparse A^tAx in Parallel on Multicore CPU

Yuan TAO; Yangdong DENG; Shuai MU; Zhenzhong ZHANG; Mingfa ZHU; Limin XIAO; Li RUAN

doi:10.1587/transinf.E97.D.315

Regular Section

Keywords: sparse A^tAx, compressed sparse block, compressed sparse rows, multicore platform

2014 Volume E97.D Issue 2 Pages 315-318

Abstract

The sparse matrix operation y ← y+A^tAx, where A is a sparse matrix and x and y are dense vectors, is a widely used computing pattern in High Performance Computing (HPC) applications. The pattern is challenging to implement efficiently because it involves both a matrix and its transpose. An efficient sparse matrix format, Compressed Sparse Blocks (CSB), has been proposed to provide nearly the same performance for both Ax and A^tx. We develop a multithreaded implementation for the CSB format and apply it to compute y ← y+A^tAx. Experiments show that our technique outperforms the Compressed Sparse Row (CSR) based solution in POSKI by up to 2.5-fold on over 70% of the benchmark matrices.
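To make the operation concrete, here is a minimal serial sketch of y ← y+A^tAx over a CSR matrix. It is not the multithreaded CSB implementation described in the abstract, and it is not POSKI's code; it only illustrates the commonly used row-wise formulation, in which each row a_i of A contributes (a_i · x) a_i^t to y, so A^t is never formed explicitly. The function name `atax_csr` and the array names `indptr`, `indices`, and `data` are assumptions chosen for this example.

```python
def atax_csr(indptr, indices, data, x, y):
    """Update y in place with y + A^T (A x), where A is stored in CSR form.

    indptr[i]:indptr[i + 1] delimits the nonzeros of row i;
    indices holds their column numbers and data holds their values.
    """
    n_rows = len(indptr) - 1
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        # t = (row i of A) . x
        t = 0.0
        for k in range(start, end):
            t += data[k] * x[indices[k]]
        # y += t * (row i of A)^T, reusing the same nonzeros
        for k in range(start, end):
            y[indices[k]] += t * data[k]
    return y


# Small example: A = [[1, 0, 2],
#                     [0, 3, 0]]
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]
x = [1.0, 1.0, 1.0]
y = [0.0, 0.0, 0.0]
result = atax_csr(indptr, indices, data, x, y)
print(result)  # [3.0, 9.0, 6.0]
```

In this formulation each row is read once and used for both the Ax and A^t steps. Parallelizing it over rows is not straightforward, because different rows may update the same entries of y.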
