In this paper, we present a parallel algorithm for enumerating the joint weights of a binary linear $(n,k)$ code, with the aim of accelerating the assessment of its decoding error probability for network coding. Our algorithm is implemented on a multi-core CPU system using OpenMP and on an NVIDIA graphics processing unit (GPU) system using compute unified device architecture (CUDA). To reduce the number of codeword pairs to be investigated, our parallel algorithm reduces the dimension $k$ by exploiting the all-one vector, which is included in many practical codes. We also employ a population count instruction to compute the joint weight of codewords with fewer instructions. Furthermore, an efficient atomic vote and reduce scheme is deployed in our GPU-based implementation. We apply our CPU- and GPU-based implementations to a subcode of a $(127,22)$ BCH code to evaluate the impact of acceleration.
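As a minimal illustration of the population count idea, the following C sketch computes the joint weight of one pair of codewords packed into 64-bit words. It is not the implementation evaluated in the paper: the names `joint_weight_t`, `joint_weight`, `complement_first`, and `WORDS` are hypothetical, the two-word layout assumes $n \le 128$, and `__builtin_popcountll` is a GCC/Clang builtin assumed to be available. The helper `complement_first` shows one way the all-one vector can be used: the joint weight of the complemented pair follows from a permutation of the four counts, so that pair need not be examined separately.

```c
#include <stdint.h>

/* Assumption: two 64-bit words hold one codeword of length n <= 128. */
#define WORDS 2

/* Number of positions i where (u_i, v_i) is (0,0), (0,1), (1,0), (1,1). */
typedef struct {
    int n00, n01, n10, n11;
} joint_weight_t;

/* Joint weight of packed codewords u and v of length n.
 * Bits at positions >= n must be zero in both inputs. */
static joint_weight_t joint_weight(const uint64_t u[WORDS],
                                   const uint64_t v[WORDS], int n)
{
    int wu = 0, wv = 0, w11 = 0;
    for (int i = 0; i < WORDS; i++) {
        wu  += __builtin_popcountll(u[i]);        /* weight of u      */
        wv  += __builtin_popcountll(v[i]);        /* weight of v      */
        w11 += __builtin_popcountll(u[i] & v[i]); /* both bits are 1  */
    }
    joint_weight_t jw;
    jw.n11 = w11;
    jw.n10 = wu - w11;
    jw.n01 = wv - w11;
    jw.n00 = n - wu - wv + w11;
    return jw;
}

/* Joint weight of (u + 1, v), where 1 is the all-one vector.
 * Complementing u turns each (0,x) position into (1,x) and vice versa,
 * so the result is a permutation of the counts for (u, v). */
static joint_weight_t complement_first(joint_weight_t jw)
{
    joint_weight_t r;
    r.n00 = jw.n10;
    r.n01 = jw.n11;
    r.n10 = jw.n00;
    r.n11 = jw.n01;
    return r;
}
```

In this sketch each pair costs three population counts per word plus a few integer additions, rather than a loop over all $n$ bit positions.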
2015 International Journal of Networking and Computing