IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Adaptive Tile Pruning for Efficient Inference of the Imbalanced DNNs on GPU
Yanchen LI, Fumihiko INO
Journal: free access; advance online publication

Article ID: 2024EDP7220

Abstract

Deep neural network (DNN) pruning is a popular method for accelerating computations in DNNs by removing unimportant parameters. Among pruning methods, tile-wise pruning (TWP) achieves significant acceleration with minimal pruning loss. However, TWP suffers from load imbalance when important weight elements in the matrices of the DNN are unevenly distributed. To address this issue, we propose adaptive tile pruning (ATP), an integrative solver for building sparse DNNs with controllably balanced workloads. ATP comprises three components: hierarchical tile pruning (HTP), split-tiled sparse matrix multiplication (STSpMM), and adaptive pattern selection (APS). HTP constructs sparse matrices with evenly distributable workloads while preserving DNN model accuracy. STSpMM efficiently handles HTP-generated sparse matrices on GPUs by splitting and redistributing large workloads. APS dynamically selects pruning patterns for HTP and grid sizes for STSpMM based on the problem sizes in the targeted DNN. We evaluated our approach on pruned ResNet-18 and ResNet-34 models using ImageNet, and BERT-Small on the question-answering natural language inference (QNLI) task. Results demonstrate that models accelerated by ATP achieve greater acceleration than previous methods while maintaining accuracy for inference.
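To make the tile-wise pruning idea concrete, the sketch below zeroes out whole tiles of a weight matrix by a magnitude criterion. This is a minimal illustration of generic TWP, not the paper's HTP/ATP algorithm; the function name, tile shape, and L1 importance score are assumptions for illustration.

```python
import numpy as np

def tile_wise_prune(W, tile_shape=(4, 4), sparsity=0.5):
    """Illustrative tile-wise pruning: zero the tiles of W with the
    smallest L1 norms. A sketch of the general TWP concept only; the
    criterion and tile shape here are assumptions, not HTP itself."""
    th, tw = tile_shape
    rows, cols = W.shape
    assert rows % th == 0 and cols % tw == 0
    # One importance score (L1 norm) per tile.
    tiles = W.reshape(rows // th, th, cols // tw, tw)
    scores = np.abs(tiles).sum(axis=(1, 3))  # shape: (rows//th, cols//tw)
    # Keep the top-(1 - sparsity) fraction of tiles, zero the rest.
    k = int(round((1.0 - sparsity) * scores.size))
    keep = np.zeros(scores.shape, dtype=bool)
    keep.flat[np.argsort(scores, axis=None)[::-1][:k]] = True
    # Expand the tile-level mask back to element level and apply it.
    mask = np.repeat(np.repeat(keep, th, axis=0), tw, axis=1)
    return W * mask, mask

# Example: prune half of the 4x4 tiles of an 8x8 weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W_pruned, mask = tile_wise_prune(W, (4, 4), 0.5)
```

Because entire tiles are removed, the surviving nonzeros stay in dense blocks, which is what lets tile-wise schemes map efficiently onto GPU matrix kernels; the load imbalance the paper targets arises when the kept tiles cluster unevenly across rows.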

© 2025 The Institute of Electronics, Information and Communication Engineers