IEICE Transactions on Electronics
Online ISSN : 1745-1353
Print ISSN : 0916-8524
Special Section on Low-Power and High-Speed Chips
Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization
Koichi SHIRAHATA, Amir HADERBACHE, Naoto FUKUMOTO, Kohta NAKASHIMA

2021 Volume E104.C Issue 6 Pages 257-260

Abstract

The scalability of distributed DNN training can be limited by the slowdown of specific processes caused by unexpected hardware failures. We propose a dynamic process exclusion technique that maximizes training throughput. Our evaluation using 32 processes with ResNet-50 shows that the proposed technique reduces the slowdown by 12.5% to 50% without accuracy loss by excluding the slow processes.
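
The exclusion idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified cost model in which each synchronous training step takes as long as the slowest participating process and every process handles the same number of samples. The function names `throughput` and `select_active` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence


def throughput(step_times: Sequence[float], samples_per_process: float = 1.0) -> float:
    """Samples per second of one synchronous step (assumed model).

    All participating processes wait for the slowest one, so the step
    time is the maximum of the per-process times.
    """
    if not step_times:
        return 0.0
    return len(step_times) * samples_per_process / max(step_times)


def select_active(step_times: Sequence[float]) -> List[int]:
    """Return the ranks to keep so that the modeled throughput is maximal.

    For a fixed number k of kept processes, keeping the k fastest is
    optimal, so it is enough to try every k over the sorted ranks.
    """
    # Ranks ordered from fastest to slowest.
    ranked = sorted(range(len(step_times)), key=lambda r: step_times[r])
    best_k, best_tp = 0, 0.0
    for k in range(1, len(ranked) + 1):
        # With the k fastest kept, the step time is that of the k-th fastest.
        tp = k / step_times[ranked[k - 1]]
        # ">=" prefers keeping more processes when throughput ties.
        if tp >= best_tp:
            best_k, best_tp = k, tp
    return sorted(ranked[:best_k])


if __name__ == "__main__":
    # Hypothetical measurement: 32 processes, rank 5 runs at half speed.
    times = [1.0] * 32
    times[5] = 2.0
    active = select_active(times)
    print(len(active), throughput([times[r] for r in active]))
```

In this hypothetical case, keeping all 32 processes yields a modeled throughput of 16 samples per second, while excluding the single slow rank raises it to 31. A real system would also need to re-measure step times periodically and rebuild the communication group, which this sketch leaves out.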

© 2021 The Institute of Electronics, Information and Communication Engineers