Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 1Win4-67
Event-Driven GPUDirect Inference for Reducing Overhead in Inference Serving
*Kenji TANAKA, Kento KITAMURA, Kazunori SENO
Abstract

In this study, we developed a novel event-driven streaming GPU computing system that integrates DOCA GPUNetIO and CUDA Graph to support an AI-driven cyber-physical system operating on NTT’s next-generation data center infrastructure (IOWN). The goal is to enable concurrent execution of multiple models while minimizing latency overhead and GPU power consumption. Compared to existing methods, the proposed approach reduces inference overhead by 20% and increases throughput by 173.2%. Furthermore, by employing event-driven inference, our system can process inference requests for up to five models simultaneously without resource contention.
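The event-driven serving idea above can be illustrated, in a simplified and GPU-free form, with a short Python sketch. This is an assumption-laden illustration, not the authors' implementation: the real system replays pre-captured CUDA Graphs triggered by network events via DOCA GPUNetIO, whereas here each model's inference is a placeholder coroutine, and the event-driven property is approximated by spawning one independent task per arriving request so that multiple models proceed concurrently without a shared polling dispatcher.

```python
import asyncio

# Hypothetical stand-in for replaying a pre-captured inference graph:
# in the real system this would be a CUDA Graph launch; here it is a
# fixed per-model "compute" delay.
async def run_model(name: str, request: int) -> str:
    await asyncio.sleep(0.01)  # placeholder for graph replay
    return f"{name}:resp{request}"

async def serve(models, requests_per_model):
    # One independent task per (model, request): each arriving event
    # triggers inference directly instead of going through a central
    # polling loop, so several models run concurrently without
    # contending for one dispatcher thread.
    tasks = [
        asyncio.create_task(run_model(m, r))
        for m in models
        for r in range(requests_per_model)
    ]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    # Five models, two requests each, served concurrently.
    results = asyncio.run(serve(["m1", "m2", "m3", "m4", "m5"], 2))
    print(len(results))  # 10 responses
```

The per-event task structure mirrors the paper's claim that up to five models can be served simultaneously without resource contention; the GPU-side mechanics (zero-copy receive into GPU memory, graph replay) are deliberately abstracted away.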

© 2025 The Japanese Society for Artificial Intelligence