IPSJ Transactions on System and LSI Design Methodology


Over the past few years, attention has drawn wide interest in the field of deep learning, especially in the domain of natural language processing (NLP). Its impressive effectiveness, along with its ubiquitous implementations, has aroused our interest in efficiently scheduling the data flow of the corresponding computations onto architectures with many computing units to realize parallel computing. In this paper, based on manual analysis of the optimum scheduling solutions for small instances, which are obtained with a satisfiability checking (SAT) solver, we propose a general scheduling solution to parallelize the processing of attention layers, which are widely adopted in recent deep learning models. With this solution, an m-fold speedup is achievable on the proposed hardware system with m processing elements (PEs) connected in a unidirectional ring. For two specific application schemes of attention, we find that almost 25% and 50% of the original computations, respectively, become redundant. To avoid this unnecessary computation and gain the corresponding reduction in processing latency, we devise optimization strategies that lead to two further scheduling solutions. By avoiding the redundancy, the optimized scheduling solutions bring an additional reduction in execution cycles of nearly 25% and 50%, respectively, for the two application schemes. To establish the correctness of these solutions, we prove their validity mathematically and also verify them with a SAT solver by adopting the solutions themselves as additional constraints in the formulated SAT problems.


Posit is a numerical representation aimed especially at deep neural networks (DNNs). However, a posit multiply-accumulate (MAC) unit requires a dedicated register called a quire to ensure calculation accuracy. In this paper, we propose a posit-based MAC unit whose quire size can be optimized according to the target application. We also develop a DNN library to explore the quire size of the proposed MAC unit. Experimental results with ResNet-9 show that we achieve the same level of accuracy as Deep Positron with a 43% reduction in area.
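The role of a wide accumulator such as the quire can be illustrated with a small sketch. This is not the paper's MAC unit and it does not implement posit arithmetic: as an assumption for illustration, IEEE half precision (via Python's `struct` format `"e"`) stands in for the narrow number format, and an exact `Fraction` stands in for a quire that is wide enough to hold every partial sum. The function names `round_fp16`, `mac_rounded`, and `mac_wide` are hypothetical.

```python
import struct
from fractions import Fraction


def round_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value.

    Assumption: half precision is only a stand-in for a narrow format;
    it is not a posit.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def mac_rounded(a, b):
    """Dot product where the accumulator is rounded after every MAC step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_fp16(acc + round_fp16(x * y))
    return acc


def mac_wide(a, b):
    """Dot product with an exact accumulator, rounded once at the end.

    The exact Fraction plays the role of a sufficiently wide quire.
    """
    acc = Fraction(0)
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)
    return round_fp16(float(acc))


# One large product followed by many small ones (each equal to 2**-12).
a = [1.0] + [2.0 ** -6] * 1024
b = [1.0] + [2.0 ** -6] * 1024

print(mac_rounded(a, b))  # small products are lost to rounding at each step
print(mac_wide(a, b))     # small products survive until the final rounding
```

In this sketch each small product is below half a unit in the last place of the running sum, so the step-wise rounded accumulator never moves, while the wide accumulator keeps all contributions. A narrower accumulator trades some of that accuracy for area, which is the kind of trade-off a quire-size exploration would examine.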
