Article ID: 22.20250409
Deep neural networks (DNNs) have achieved remarkable success in critical domains such as computer vision. However, their substantial model size and computational demands hinder deployment on resource-constrained edge devices. Bit-serial accelerators (BSAs) exploit the significant bit-level sparsity (BLS) in weights and activations to accelerate inference. Unstructured BLS, however, leads to hardware inefficiency, and existing static pruning methods cannot adapt to activations that vary at runtime. To address these challenges, we propose BitFleX, a BSA that enables runtime semi-structured pruning of both weights and activations. Specifically, we introduce Bit-Term Decomposition (BTD) encoding to enhance inherent BLS and reduce pruning complexity. In addition, a pruning-error predictor dynamically selects the operands whose sparsification incurs the least error. Experiments show that BitFleX achieves 87.5% BLS on ViT-B with less than 1% Top-1 accuracy loss on ImageNet, yielding a 5.86× speedup over the baseline and a peak energy efficiency of 23.61 TOPS/W.
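To make the idea concrete, the sketch below illustrates one plausible reading of the pipeline: an operand is decomposed into signed power-of-two bit-terms, pruned to a fixed per-operand term budget (semi-structured, so the hardware stays regular), and operands are then ranked by the error that pruning would introduce. This is a minimal sketch under stated assumptions, not the paper's implementation: the non-adjacent-form recoding, the top-k term budget, and the helper names (naf_terms, prune_terms, select_operands) are illustrative choices, since the abstract does not specify BTD's internals or the predictor's design.

```python
def naf_terms(x: int) -> list[int]:
    """Recode an integer into signed power-of-two terms via the
    non-adjacent form (NAF), a canonical signed-digit encoding that
    minimizes the number of nonzero terms, i.e., exposes more
    bit-level sparsity than plain two's-complement bits."""
    terms = []
    bit = 0
    while x != 0:
        if x & 1:
            # Pick digit +1 or -1 so (x - d) is divisible by 4,
            # which clears runs of consecutive one-bits.
            d = 2 - (x & 3)            # x % 4 == 1 -> +1, x % 4 == 3 -> -1
            terms.append(d * (1 << bit))
            x -= d
        x >>= 1
        bit += 1
    return terms


def prune_terms(terms: list[int], k: int) -> tuple[list[int], int]:
    """Semi-structured bit-level pruning: keep at most k terms per
    operand. Returns the kept terms and the signed pruning error
    (the sum of the dropped terms)."""
    ranked = sorted(terms, key=abs, reverse=True)
    return ranked[:k], sum(ranked[k:])


def select_operands(operand_terms: list[list[int]], budget: int, k: int) -> list[int]:
    """Toy stand-in for a pruning-error predictor: rank operands by the
    error that pruning to k terms would introduce, and pick the
    `budget` operands whose sparsification incurs the least error."""
    order = sorted(range(len(operand_terms)),
                   key=lambda i: abs(prune_terms(operand_terms[i], k)[1]))
    return order[:budget]


if __name__ == "__main__":
    w = 117                            # 0b1110101: five one-bits in binary
    terms = naf_terms(w)               # [1, 4, -16, 128]: only four signed terms
    kept, err = prune_terms(terms, k=2)
    print(terms, kept, sum(kept), err)  # 117 is approximated as 112, error 5
```

In this toy example the signed-digit recoding already shrinks 117 from five one-bits to four terms, and pruning to a two-term budget approximates 117 as 112 with an error of 5; an error estimate of this kind is the sort of signal a runtime pruning-error predictor could use to decide which operands to sparsify.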