Article ID: 2022XBL0100
Designing an ICT system that provides certain network application services (DICTS) consists of selecting appropriate equipment for various requirements, arranging it optimally, and connecting it correctly, and it requires specialized knowledge and enormous manpower. Autonomous DICTS technology using deep reinforcement learning (DRL) has a fundamental problem of very long learning time: because the reward is sparse relative to the vast number of combinations of selections, arrangements, and connections, the agent can accidentally overestimate the value of a specific configuration. This paper applies our improved version of Double DQN, a typical DRL algorithm for suppressing overestimation, to an autonomous DICTS technology named Weaver and demonstrates the possibility of reducing the number of episodes until convergence by 25%.
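To illustrate how Double DQN suppresses overestimation, the following minimal sketch contrasts the standard DQN target with the standard Double DQN target. It shows only the textbook algorithm, not the improved variant applied to Weaver, whose details are not given in this abstract. The function names, arguments, and sample Q-values are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward: float, gamma: float,
               q_target_next: Sequence[float], done: bool) -> float:
    """Standard DQN target: the target network both selects and evaluates
    the next action, so an accidentally high estimate is picked up directly."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float], done: bool) -> float:
    """Standard Double DQN target: the online network selects the next action
    and the target network evaluates it, which damps overestimation."""
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for the next state (three candidate actions).
q_online_next = [1.0, 2.0, 0.5]   # online network estimates
q_target_next = [1.5, 1.0, 3.0]   # target network estimates; 3.0 may be noise

plain = dqn_target(0.0, 0.5, q_target_next, done=False)
double = double_dqn_target(0.0, 0.5, q_online_next, q_target_next, done=False)
print(plain, double)
```

In this made-up example the DQN target follows the largest target-network estimate, whereas the Double DQN target uses the target network's value for the action preferred by the online network, so a single inflated estimate has less influence on the update.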