IEICE Transactions on Electronics
Online ISSN : 1745-1353
Print ISSN : 0916-8524

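A final published version of this article is available. Please refer to the final published version.
When citing, please also cite the final published version.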

A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM
Yibo FAN, Leilei HUANG, Kewei CHEN, Xiaoyang ZENG
Journal, restricted access, advance online publication

Article ID: 2019ECP5008

Abstract

Neural networks have been among the most useful techniques for speech recognition, language translation and image analysis in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, those software implementations offer poor parallelism, while the existing hardware implementations lack configurability. To fill this gap, a highly configurable 7.62 GOP/s hardware implementation for LSTM is proposed in this paper. To achieve this goal, the work flow is carefully arranged to make the design compact and high-throughput; the structure is carefully organized to make the design configurable; the data buffering and compression strategy is carefully chosen to lower the bandwidth without increasing the complexity of the structure; and the data type, the logistic sigmoid (σ) function and the hyperbolic tangent (tanh) function are carefully optimized to balance hardware cost and accuracy. This work achieves a performance of 7.62 GOP/s @ 238 MHz on an XCZU6EG FPGA while using only 3K look-up tables (LUTs). Compared with an implementation on an Intel Xeon E5-2620 CPU @ 2.10 GHz, this work achieves about a 90× speedup for small networks and a 25× speedup for large ones. Its resource consumption is also much lower than that of state-of-the-art works.
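
For readers unfamiliar with the computation being accelerated, the following is a minimal Python sketch of one standard LSTM time step, with the σ and tanh functions passed in so that cheaper approximations can be swapped for the exact ones. It is not the design described in the abstract: the abstract does not say how σ and tanh are optimized, so the piecewise-linear `hard_sigmoid` and the derived `hard_tanh` below are a commonly used stand-in chosen here as an assumption, and all function names, the gate labels (`i`, `f`, `o`, `g`) and the weight layout are hypothetical. The helper `ops_per_cycle` only restates the reported figures: 7.62 GOP/s at 238 MHz corresponds to roughly 32 operations per clock cycle.

```python
import math


def sigmoid(x):
    """Exact logistic sigmoid, used as the reference."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x):
    """Piecewise-linear approximation clamp(0.25*x + 0.5, 0, 1).

    Assumption: a common hardware-friendly substitute, not necessarily
    the approximation used in the paper.
    """
    return min(1.0, max(0.0, 0.25 * x + 0.5))


def hard_tanh(x):
    """Approximate tanh via the identity tanh(x) = 2*sigmoid(2x) - 1."""
    return 2.0 * hard_sigmoid(2.0 * x) - 1.0


def lstm_step(x, h_prev, c_prev, W, b, sig=sigmoid, tanh=math.tanh):
    """One LSTM time step.

    x, h_prev, c_prev: lists of floats (input, previous hidden, previous cell).
    W[name]: one weight row per hidden unit, applied to the concatenation
             of x and h_prev; b[name]: one bias per hidden unit.
    """
    z = x + h_prev  # list concatenation [x, h_prev]

    def affine(name):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[name], b[name])]

    i = [sig(v) for v in affine("i")]   # input gate
    f = [sig(v) for v in affine("f")]   # forget gate
    o = [sig(v) for v in affine("o")]   # output gate
    g = [tanh(v) for v in affine("g")]  # candidate cell value

    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def ops_per_cycle(gops, mhz):
    """Operations per clock cycle implied by a throughput and a clock rate."""
    return gops * 1e9 / (mhz * 1e6)
```

Calling `lstm_step(..., sig=hard_sigmoid, tanh=hard_tanh)` runs the same step with the approximations, which makes it easy to compare the outputs against the exact functions and see the accuracy cost that a hardware design has to weigh against area.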

© 2019 The Institute of Electronics, Information and Communication Engineers