IEICE Transactions on Electronics
Online ISSN : 1745-1353
Print ISSN : 0916-8524
Special Section on Low-Power and High-Speed Chips
Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization
Koichi SHIRAHATA, Amir HADERBACHE, Naoto FUKUMOTO, Kohta NAKASHIMA

2021 Volume E104.C Issue 6 Pages 257-260

Abstract

The scalability of distributed DNN training can be limited when specific processes slow down because of unexpected hardware failures. We propose a dynamic process exclusion technique that maximizes training throughput. Our evaluation using 32 processes with ResNet-50 shows that, by excluding the slow processes, the proposed technique reduces the slowdown by 12.5% to 50% without accuracy loss.
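
The idea of excluding slow processes to maximize throughput can be illustrated with a toy model. The sketch below is not the authors' method: it assumes fully synchronous training in which every step waits for the slowest participating process, assumes each process handles the same number of samples per step, and uses hypothetical function names (`throughput`, `select_active`) and made-up step times.

```python
def throughput(step_times, samples_per_process=1.0):
    """Modeled samples per second for synchronous training.

    Assumption: each step takes as long as the slowest participating process.
    """
    if not step_times:
        return 0.0
    return len(step_times) * samples_per_process / max(step_times)


def select_active(step_times):
    """Return the indices of the processes to keep so that modeled throughput is maximal.

    Keeping the k fastest processes gives throughput k / (k-th smallest step time),
    so it is enough to try every k and keep the best one.
    """
    order = sorted(range(len(step_times)), key=lambda i: step_times[i])
    best_k, best_tp = 0, 0.0
    for k in range(1, len(order) + 1):
        tp = k / step_times[order[k - 1]]
        if tp >= best_tp:  # on ties, prefer keeping more processes
            best_k, best_tp = k, tp
    return sorted(order[:best_k])


# Hypothetical measurement: 32 processes, one of which is twice as slow.
times = [1.0] * 32
times[5] = 2.0

active = select_active(times)
print(len(active))                             # 31
print(throughput(times))                       # 16.0
print(throughput([times[i] for i in active]))  # 31.0
```

In this toy model a single process running at half speed halves the throughput of all 32 processes, whereas excluding it costs only 1/32 of the per-step work. A process that is only marginally slower would be kept, because dropping it would lose more samples per step than it saves in waiting time.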

© 2021 The Institute of Electronics, Information and Communication Engineers