Host: The Japanese Society for Artificial Intelligence
Name : The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019
Number : 33
Location : [in Japanese]
Date : June 04, 2019 - June 07, 2019
A communication robot that aims to satisfy the user in front of it must select an appropriate behavior quickly. However, user requests often change while the robot is still determining the most appropriate behavior, which makes it difficult for the robot to derive one. Such problems can be formulated as a multi-armed bandit problem. To solve this problem, we proposed a multi-armed bandit algorithm that uses a self-organizing map to adapt to both stationary and non-stationary environments. In this study, we conducted numerous experiments on a stochastic multi-armed bandit problem in both stationary and non-stationary environments. The results show that, compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm performed equally well or better in stationary environments with numerous arms, and performed consistently well in non-stationary environments regardless of the number of arms.
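For readers unfamiliar with the setting, the following is a minimal sketch of the UCB1 baseline named above, run on a Bernoulli bandit. It is not the proposed self-organizing-map algorithm, whose details the abstract does not give. The function names (`ucb1_select`, `run_ucb1`, `probs_at`), the arm probabilities, and the switch point at round 1000 are all illustrative assumptions.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm index chosen by UCB1 at round t (1-based)."""
    # Play every arm once before using the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Choose the arm maximizing: empirical mean + sqrt(2 ln t / n).
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_ucb1(probs_at, n_arms, horizon, seed=0):
    """Simulate UCB1 on a Bernoulli bandit (hypothetical test harness).

    probs_at(t) returns the success probability of each arm at round t.
    A constant function models a stationary environment; a
    time-dependent one models a non-stationary environment.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total reward per arm
    regret = 0.0            # cumulative expected regret
    for t in range(1, horizon + 1):
        probs = probs_at(t)
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += max(probs) - probs[arm]
    return counts, regret


# Illustrative environments (assumed values, not from the study).
def stationary(t):
    return [0.2, 0.5, 0.8]


def switching(t):
    # The best arm changes after round 1000.
    return [0.2, 0.5, 0.8] if t <= 1000 else [0.8, 0.5, 0.2]


for name, env in [("stationary", stationary), ("switching", switching)]:
    counts, regret = run_ucb1(env, n_arms=3, horizon=2000)
    print(name, counts, round(regret, 1))
```

Because UCB1 averages over all past rewards, its estimates can lag after the switch in the second environment, which is the kind of non-stationary behavior the study examines.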