Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
 
Hybrid CPU-GPU Implementation of Hierarchical Matrix Generation Using a Backtracking-based Load Balancing Framework
Jing Xu, Tasuku Hiraishi, Zhengyang Bai, Akihiro Ida, Masahiro Yasugi, Keiichiro Fukazawa
JOURNAL FREE ACCESS

2026 Volume 34 Pages 429-443

Abstract

Hierarchical matrices (H-matrices) are critical for large-scale numerical computations, such as boundary element methods and covariance statistics. Generating them is computationally intensive and highly irregular, which makes effective parallelization on heterogeneous systems challenging. Although GPU acceleration can improve performance, GPU efficiency degrades significantly on the numerous small submatrices typical of H-matrix generation, where insufficient parallelism combined with highly variable per-submatrix workloads leaves the GPU underutilized. We present a hybrid CPU-GPU implementation of H-matrix generation built upon Tascell, a backtracking-based load balancing framework originally designed for CPU execution. Our implementation applies Tascell's dynamic load balancing mechanism throughout the entire process, both when constructing the hierarchical tree structure of H-matrices and when performing matrix block computations for leaf nodes. For each matrix block computation, we dynamically determine whether to use the CPU or the GPU based on block size, block type, and GPU availability. To reduce kernel launch and data transfer overheads, we apply a threshold-based offloading strategy. We use OpenACC conditional directives to maintain a single unified code implementation for both CPU and GPU execution, avoiding code duplication. Experimental results show that GPU-only execution suffers significant performance degradation, while our hybrid implementation generally outperforms CPU-only execution. The improvement is greater when fewer CPU workers are available: up to 1.46-fold speedup with 4 workers, compared with 1.15-fold with 64 workers. Performance is largely insensitive to threshold settings, demonstrating robustness. Overall, this hybrid CPU-GPU approach improves the efficiency and scalability of H-matrix generation and provides a practical solution for accelerating highly irregular computations on heterogeneous systems.
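
The per-block device choice described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that the decision depends on block size, block type, and GPU availability. The names `Block`, `BlockType`, `choose_device`, the two block types, the element-count size measure, and the threshold values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class BlockType(Enum):
    # Assumed block types; the abstract does not enumerate them.
    DENSE = "dense"
    LOW_RANK = "low_rank"


class Device(Enum):
    CPU = "cpu"
    GPU = "gpu"


@dataclass(frozen=True)
class Block:
    rows: int
    cols: int
    kind: BlockType


# Hypothetical per-type thresholds, in number of matrix elements.
# Real values would have to be tuned on the target machine.
THRESHOLDS = {BlockType.DENSE: 4096, BlockType.LOW_RANK: 16384}


def choose_device(block: Block, gpu_busy: bool, thresholds=THRESHOLDS) -> Device:
    """Pick a device for one leaf-block computation (illustrative only)."""
    # If the GPU is occupied, the worker computes the block itself on the CPU.
    if gpu_busy:
        return Device.CPU
    # Small blocks stay on the CPU so that kernel launch and data transfer
    # overheads are not paid for too little work.
    if block.rows * block.cols < thresholds[block.kind]:
        return Device.CPU
    return Device.GPU
```

Under these assumed thresholds, a 16 x 16 dense block stays on the CPU, while a 128 x 128 dense block is offloaded only when the GPU is free. In a framework such as the one described, a check of this kind would presumably run at each leaf node of the tree, with the result controlling whether the offload directive is active.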

© 2026 by the Information Processing Society of Japan