IEICE Transactions on Electronics
Online ISSN : 1745-1353
Print ISSN : 0916-8524
Special Section on Low-Power and High-Speed Chips
A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU
Kentaro KAWAKAMIKouji KURIHARAMasafumi YAMAZAKITakumi HONDANaoto FUKUMOTO
著者情報
ジャーナル フリー

2022 年 E105.C 巻 6 号 p. 222-231

詳細
抄録

To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture. The A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically creates the execution code for the computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for x86_64 architecture. To port oneDNN to A64FX, it must be rewritten into Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the number of steps to be rewritten exceeds several tens of thousands of lines. This study presents the Xbyak_translator_aarch64. Xbyak_translator_aarch64 is a binary translator that at runtime converts dynamically produced executable codes for the x86_64 architecture into executable codes for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code for porting oneDNN to A64FX and allows us to port oneDNN to A64FX quickly.

著者関連情報
© 2022 The Institute of Electronics, Information and Communication Engineers
前の記事 次の記事
feedback
Top