IEICE Transactions on Electronics
Online ISSN : 1745-1353
Print ISSN : 0916-8524

This article has now been updated. Please use the final version.

Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization
Koichi SHIRAHATA, Amir HADERBACHE, Naoto FUKUMOTO, Kohta NAKASHIMA
JOURNAL RESTRICTED ACCESS Advance online publication

Article ID: 2020LHS0001

Abstract

The scalability of distributed DNN training can be limited when specific processes slow down because of unexpected hardware failures. We propose a dynamic process exclusion technique that maximizes training throughput. Our evaluation using 32 processes with ResNet-50 shows that, by excluding the slow processes, the proposed technique reduces the slowdown by 12.5% to 50% without accuracy loss.

© 2020 The Institute of Electronics, Information and Communication Engineers