IEICE Transactions on Electronics
Online ISSN : 1745-1353
Print ISSN : 0916-8524

この記事には本公開記事があります。本公開記事を参照してください。
引用する場合も本公開記事を引用してください。

A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU
Kentaro KAWAKAMIKouji KURIHARAMasafumi YAMAZAKITakumi HONDANaoto FUKUMOTO
著者情報
ジャーナル フリー 早期公開

論文ID: 2021LHP0001

この記事には本公開記事があります。
詳細
抄録

To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture. The A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically creates the execution code for the computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for x86_64 architecture. To port oneDNN to A64FX, it must be rewritten into Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the number of steps to be rewritten exceeds several tens of thousands of lines. This study presents the Xbyak_translator_aarch64. Xbyak_translator_aarch64 is a binary translator that at runtime converts dynamically produced executable codes for the x86_64 architecture into executable codes for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code for porting oneDNN to A64FX and allows us to port oneDNN to A64FX quickly.

著者関連情報
© 2021 The Institute of Electronics, Information and Communication Engineers
feedback
Top