2025 Volume 16 Issue 3 Pages 422-443
DNN accelerators that can efficiently execute multiple models are in growing demand. In this study, we propose an architecture that switches its computation method according to the model being executed, changing the parallelism scheme without any data movement between memories. Compared with other architectures, the proposed architecture improves PE utilization by up to 14% on existing models. Moreover, because the parallelism can be switched, high PE utilization is achieved across various types of DNN layers, so the PEs are expected to serve as generic architectural primitives even for future DNN model structures.
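To see why switchable parallelism can raise PE utilization, the following minimal sketch models a fixed 2D PE array on which two loop dimensions of a layer are tiled. All numbers here (the 16x16 array, the example layer shapes, and the `utilization` cost model) are illustrative assumptions, not details of the proposed architecture.

```python
import math

# Hypothetical PE array size (assumption for illustration only).
PE_ROWS, PE_COLS = 16, 16

def utilization(dim_a: int, dim_b: int) -> float:
    """PE utilization when two loop dimensions of size dim_a and dim_b
    are tiled over the rows and columns of the PE array.

    Partially filled tiles leave PEs idle, lowering utilization."""
    cycles = math.ceil(dim_a / PE_ROWS) * math.ceil(dim_b / PE_COLS)
    return (dim_a * dim_b) / (PE_ROWS * PE_COLS * cycles)

# A channel-rich layer (e.g. 256 input x 256 output channels)
# fills the array under a channel x channel mapping:
print(utilization(256, 256))   # channel-parallel mapping

# A channel-poor layer (e.g. 8 channels, 32x32 spatial output)
# wastes most rows under the same channel x channel mapping,
# but fills the array if the mapping is switched to spatial x spatial:
print(utilization(8, 8))       # channel-parallel mapping, mostly idle
print(utilization(32, 32))     # spatial-parallel mapping, fully busy
```

Under this toy cost model, the channel-poor layer jumps from 25% to 100% utilization when the parallelism dimension is switched, which is the kind of per-layer effect the proposed architecture exploits.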