2025 Volume E108.D Issue 5 Pages 436-439
In this letter, we propose Asymmetric Padded Winograd (APW), a method designed to enhance the computational efficiency of Winograd-based convolution algorithms on SIMT architectures. APW resolves thread divergence, which typically delays execution because computation is distributed unevenly across threads. By applying asymmetric padding to both filters and inputs, APW unifies the sizes of sub-filters and sub-inputs. This uniformity keeps the execution path consistent across threads throughout the Winograd-based convolution process, effectively minimizing thread divergence. Our experimental results demonstrate that APW reduces the thread divergence observed in previous work to nearly zero and cuts total execution time by up to 17.78%.
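
The size-unification idea can be illustrated with a minimal sketch. It is not the letter's implementation: the abstract does not specify how sub-filters are formed, so the decomposition below (a 5x5 filter cut into blocks of at most 3x3), the tile size, and all function names are assumptions made for illustration. Short blocks are zero-padded on the bottom and right only, and the input is padded on the same sides, so every sub-problem has an identical shape. A plain direct correlation stands in for the Winograd kernel that each uniform sub-problem would be passed to.

```python
def split_and_pad(filt, tile=3):
    """Cut a square filter into tile x tile sub-filters (hypothetical scheme).

    Blocks that run past the filter edge are zero-padded on the bottom/right
    only, so every sub-filter ends up exactly tile x tile.
    """
    k = len(filt)
    subs = []
    for r0 in range(0, k, tile):
        for c0 in range(0, k, tile):
            block = [[filt[r][c] if r < k and c < k else 0
                      for c in range(c0, c0 + tile)]
                     for r in range(r0, r0 + tile)]
            subs.append(((r0, c0), block))
    return subs


def correlate_valid(x, w):
    """Direct 'valid' cross-correlation of two 2-D lists (reference kernel)."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * w[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def conv_via_padded_subfilters(x, filt, tile=3):
    """Sum the results of uniformly shaped sub-problems."""
    k = len(filt)
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    pad = -(-k // tile) * tile - k  # extra rows/columns on the bottom/right
    # Pad the input on the same sides so every sub-input window exists.
    xp = [row + [0] * pad for row in x]
    xp += [[0] * (len(x[0]) + pad) for _ in range(pad)]
    out = [[0] * ow for _ in range(oh)]
    for (r0, c0), sub in split_and_pad(filt, tile):
        # Every window is (oh + tile - 1) x (ow + tile - 1): same shape for all.
        window = [row[c0:c0 + ow + tile - 1]
                  for row in xp[r0:r0 + oh + tile - 1]]
        part = correlate_valid(window, sub)
        for i in range(oh):
            for j in range(ow):
                out[i][j] += part[i][j]
    return out


# Small integer example: 7x7 input, 5x5 filter.
x = [[i * 7 + j for j in range(7)] for i in range(7)]
filt = [[(r + 1) * (c + 2) for c in range(5)] for r in range(5)]
result = conv_via_padded_subfilters(x, filt)
```

Without the padding, the four blocks in this sketch would have shapes 3x3, 3x2, 2x3, and 2x2, and code handling them would need size-dependent branches. With the padding, all four share one shape and one code path, and the added zeros leave the summed result unchanged.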