2023 Volume 12 Issue 8 Pages 432-437
Image inference using deep learning (DL) has recently become popular in a variety of applications. In the near future, real-time inference of delay-sensitive images will become widespread under the edge computing paradigm, in which DL processing is executed at the edge of the network. In this study, a DL inference method for near-future edge computing is proposed to infer a large number of images that are aggregated from various end devices toward an edge server, temporarily stored in a queue, and subject to different inference-time requirements. With the proposed method, the inference time, including the waiting time in the queue, is guaranteed for delay-sensitive images and minimized for delay-tolerant images by adaptively partitioning the DL processing of delay-tolerant images between the edge and a cloud server according to the arrival conditions of images at the edge server. The effectiveness of the proposed method is evaluated via simulation.
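To illustrate the partitioning idea described above, the following Python sketch shows a hypothetical edge-side decision rule that picks a cut layer for a delay-tolerant image so that the backlog of delay-sensitive images (assumed to run entirely at the edge) can still meet its deadline. This is not the authors' algorithm; the layer count, per-layer latencies, transfer times, deadline, and all function names are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical sketch of the edge/cloud partitioning decision; all layer counts,
# per-layer latencies, transfer times, and the deadline are illustrative
# assumptions, not values from the paper.

N_LAYERS = 10
EDGE_LAYER_MS = [3.0] * N_LAYERS            # assumed per-layer execution time at the edge
UPLOAD_MS = [12.0, 10.5, 9.0, 7.5, 6.0,     # assumed transfer time of the intermediate
             4.0, 3.0, 2.5, 2.0, 1.5, 1.0]  # feature map when cutting after layer k
DEADLINE_MS = 50.0                          # assumed deadline for delay-sensitive images

def edge_time_if_cut(k: int) -> float:
    """Edge-side time of a delay-tolerant image when layers 1..k run at the edge."""
    return sum(EDGE_LAYER_MS[:k]) + UPLOAD_MS[k]

def choose_partition(queue: deque) -> int:
    """Choose the deepest cut point k whose added edge load still lets every
    queued delay-sensitive image (run fully at the edge) meet the deadline."""
    sensitive_backlog = sum(sum(EDGE_LAYER_MS) for is_sensitive in queue if is_sensitive)
    for k in range(N_LAYERS, -1, -1):       # prefer deeper cuts (less work sent to the cloud)
        if sensitive_backlog + edge_time_if_cut(k) <= DEADLINE_MS:
            return k
    return 0                                # heavy backlog: offload the whole model

# Example: one queued delay-sensitive image forces a shallower cut for the
# delay-tolerant image behind it.
queue = deque([True, False])                # True = delay-sensitive
print("chosen cut layer:", choose_partition(queue))   # -> 5 with these assumed numbers
```

Under these assumed numbers, a growing delay-sensitive backlog pushes the cut point toward the input layer, shifting more of the delay-tolerant processing to the cloud, which matches the adaptive behavior described in the abstract.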