Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 1Win4-67
Event-Driven GPUDirect Inference for Reducing Overhead in Inference Serving
*Kenji TANAKA, Kento KITAMURA, Kazunori SENO
Abstract

In this study, we developed a novel event-driven streaming GPU computing system that integrates DOCA GPUNetIO and CUDA Graph to support an AI-driven cyber-physical system operating on NTT’s next-generation data center infrastructure (IOWN). The goal is to enable concurrent execution of multiple models while minimizing latency overhead and GPU power consumption. Compared to existing methods, the proposed approach reduces inference overhead by 20% and increases throughput by 173.2%. Furthermore, by employing event-driven inference, our system can process inference requests for up to five models simultaneously without resource contention.
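The event-driven serving idea above can be illustrated, in a simplified and GPU-free form, with a short Python sketch. This is an assumption-laden illustration, not the authors' implementation: the real system replays pre-captured CUDA Graphs triggered by network events via DOCA GPUNetIO, whereas here each model's inference is a placeholder coroutine, and the event-driven property is approximated by spawning one independent task per arriving request so that multiple models proceed concurrently without a shared polling dispatcher.

```python
import asyncio

# Hypothetical stand-in for replaying a pre-captured inference graph:
# in the real system this would be a CUDA Graph launch; here it is a
# fixed per-model "compute" delay.
async def run_model(name: str, request: int) -> str:
    await asyncio.sleep(0.01)  # placeholder for graph replay
    return f"{name}:resp{request}"

async def serve(models, requests_per_model):
    # One independent task per (model, request): each arriving event
    # triggers inference directly instead of going through a central
    # polling loop, so several models run concurrently without
    # contending for one dispatcher thread.
    tasks = [
        asyncio.create_task(run_model(m, r))
        for m in models
        for r in range(requests_per_model)
    ]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    # Five models, two requests each, served concurrently.
    results = asyncio.run(serve(["m1", "m2", "m3", "m4", "m5"], 2))
    print(len(results))  # 10 responses
```

The per-event task structure mirrors the paper's claim that up to five models can be served simultaneously without resource contention; the GPU-side mechanics (zero-copy receive into GPU memory, graph replay) are deliberately abstracted away.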

© 2025 The Japanese Society for Artificial Intelligence