Abstract
This paper compares the performance of sparse matrix-vector multiplication parallelized with the conventional block-cyclic distribution and with an improved variant of it on a shared-memory parallel computer. The underlying idea of the variant is to exchange, in units of blocks, the nonzero entries of the matrix assigned to each thread. Numerical results demonstrate that the proposed distribution, which exchanges nonzero entries in block units, maintains or improves parallel performance compared with the conventional block-cyclic distribution.
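The conventional baseline can be illustrated with a minimal sketch. It assumes CSR storage and uses fixed-size blocks of rows as the distribution unit, which the abstract does not specify. The proposed block-exchange variant is not reproduced here. All function names are hypothetical. Python threads are limited by the GIL, so the sketch shows how work is partitioned among threads, not how much speedup is achieved.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical helpers illustrating the conventional block-cyclic row
# distribution for CSR sparse matrix-vector multiplication (y = A x).


def block_cyclic_rows(n_rows: int, block_size: int, n_threads: int) -> List[List[int]]:
    # Block b (rows b*block_size .. b*block_size + block_size - 1)
    # is assigned to thread b % n_threads (round-robin over blocks).
    owners: List[List[int]] = [[] for _ in range(n_threads)]
    for start in range(0, n_rows, block_size):
        owner = (start // block_size) % n_threads
        owners[owner].extend(range(start, min(start + block_size, n_rows)))
    return owners


def spmv_rows(indptr, indices, data, x, rows, y) -> None:
    # y[i] = sum over CSR entries k of row i of data[k] * x[indices[k]].
    for i in rows:
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc


def spmv_block_cyclic(indptr, indices, data, x, block_size, n_threads):
    n_rows = len(indptr) - 1
    owners = block_cyclic_rows(n_rows, block_size, n_threads)
    y = [0.0] * n_rows
    # Each worker writes only the rows it owns, so y needs no locking.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(spmv_rows, indptr, indices, data, x, rows, y)
                   for rows in owners]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return y


def nnz_per_thread(indptr, owners):
    # Number of nonzero entries each thread must process.
    return [sum(indptr[i + 1] - indptr[i] for i in rows) for rows in owners]


def lower_triangular_ones_csr(n):
    # Test matrix: row i has i + 1 nonzeros, so the work per row is uneven.
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j in range(i + 1):
            indices.append(j)
            data.append(1.0)
        indptr.append(len(indices))
    return indptr, indices, data


indptr, indices, data = lower_triangular_ones_csr(6)
y = spmv_block_cyclic(indptr, indices, data, [1.0] * 6, block_size=2, n_threads=2)
print(y)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(nnz_per_thread(indptr, block_cyclic_rows(6, 2, 2)))  # [14, 7]
```

In this toy case, thread 0 owns rows 0, 1, 4, 5 (14 nonzeros) and thread 1 owns rows 2, 3 (7 nonzeros). The row distribution is balanced, but the nonzero counts are not. Redistributing nonzero entries between threads in block units is one plausible way to address this kind of imbalance.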