Multi-armed bandit algorithm applicable to stationary and non-stationary environment using self-organizing maps

Nobuhito MANOME; Shuji SHINOHARA; Kouta SUZUKI; Kosuke TOMONAGA; Shunji MITSUYOSHI

doi:10.11517/pjsai.JSAI2019.0_3Rin207

33rd (2019)

Session ID : 3Rin2-07

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 2019

Number : 33

Location : [in Japanese]

Date : June 04, 2019 - June 07, 2019

Multi-armed bandit algorithm applicable to stationary and non-stationary environment using self-organizing maps

*Nobuhito MANOME, Shuji SHINOHARA, Kouta SUZUKI, Kosuke TOMONAGA, Shunji MITSUYOSHI

Keywords: Multi-armed bandit problem, Self-organizing maps

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

A communication robot that aims to satisfy the user facing it must select an appropriate behavior quickly. However, user requests often change while the robot is still determining the most appropriate behavior, which makes it difficult for the robot to derive one. Such problems are formulated as a multi-armed bandit problem. To solve this problem, we proposed a multi-armed bandit algorithm that adapts to both stationary and non-stationary environments using self-organizing maps. In this study, numerous experiments were conducted on a stochastic multi-armed bandit problem in both stationary and non-stationary environments. Compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm showed equivalent or better performance in stationary environments with many arms, and consistently strong performance in non-stationary environments regardless of the number of arms.
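To make the experimental setting concrete, the following is a minimal sketch of a stochastic Bernoulli bandit with one abrupt change in reward probabilities, played by two of the baseline algorithms named in the abstract (UCB1 and Thompson Sampling). It does not implement the proposed self-organizing-map algorithm, whose details are not given here. The function names, the single change point, and the reward probabilities are all assumptions chosen for illustration, not the paper's experimental configuration.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm UCB1 would pull at step t (1-based)."""
    # Pull every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def thompson_select(counts, sums, rng):
    """Sample each arm from a Beta(1 + successes, 1 + failures) posterior."""
    samples = [
        rng.betavariate(1.0 + sums[a], 1.0 + counts[a] - sums[a])
        for a in range(len(counts))
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run(policy, probs_before, probs_after, horizon, change_at, seed=0):
    """Simulate a Bernoulli bandit whose reward probabilities switch once.

    Hypothetical helper for illustration: returns cumulative expected regret.
    Setting change_at >= horizon gives a stationary environment.
    """
    rng = random.Random(seed)
    k = len(probs_before)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total reward per arm
    regret = 0.0
    for t in range(1, horizon + 1):
        probs = probs_before if t <= change_at else probs_after
        if policy == "ucb1":
            arm = ucb1_select(counts, sums, t)
        else:
            arm = thompson_select(counts, sums, rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Regret is measured against the best arm at the current step.
        regret += max(probs) - probs[arm]
    return regret


if __name__ == "__main__":
    before, after = [0.8, 0.2, 0.5], [0.2, 0.8, 0.5]  # assumed values
    for policy in ("ucb1", "thompson"):
        stationary = run(policy, before, before, 2000, 2000)
        switching = run(policy, before, after, 2000, 1000)
        print(policy, round(stationary, 1), round(switching, 1))
```

Neither baseline in this sketch discounts or forgets old observations, so statistics gathered before the change keep influencing arm selection afterwards. That is the kind of difficulty a non-stationary environment poses.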
