Transactions of the Society of Instrument and Control Engineers
Online ISSN : 1883-8189
Print ISSN : 0453-4654
ISSN-L : 0453-4654
An Actor-Critic Algorithm Using a Binary Tree Action Selector
Reinforcement Learning to Cope with Enormous Actions
Hajime KIMURA, Shigenobu KOBAYASHI

2001 Volume 37 Issue 12 Pages 1147-1155

Abstract
In real-world applications, learning algorithms often have to handle dozens of actions over which some distance metric is defined. The epsilon-greedy and Boltzmann-distribution exploration strategies commonly applied to Q-learning and SARSA are popular, simple, and effective in problems with a few actions, but their efficiency degrades as the number of actions increases. We propose a policy-function representation consisting of a stochastic binary decision tree, and we apply it to an actor-critic algorithm for problems with an enormous number of similar actions. Simulation results show that increasing the number of actions does not affect the learning curves of the proposed method.
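The idea of a stochastic binary-tree action selector can be illustrated with a minimal sketch: each internal node holds one parameter, the walk from root to leaf branches right with a sigmoid probability at every node, and the chosen leaf is the action. Sampling, probability evaluation, and the actor's likelihood-gradient update then all cost O(log |A|), independent of the number of actions. The class name, heap-style node indexing, and update rule below are illustrative assumptions for exposition, not the authors' implementation.

```python
import math
import random


class BinaryTreeActionSelector:
    """Illustrative sketch: stochastic binary decision tree over 2**depth actions.

    Internal node i holds a parameter theta[i]; the walk branches right at
    node i with probability sigmoid(theta[i]). An action's probability is the
    product of branch probabilities along its root-to-leaf path.
    """

    def __init__(self, depth, rng=None):
        self.depth = depth
        self.theta = [0.0] * (2 ** depth - 1)  # one parameter per internal node
        self.rng = rng or random.Random(0)

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def sample(self):
        """Walk root -> leaf; return (action index, visited (node, went_right) list)."""
        node, path = 0, []
        for _ in range(self.depth):
            right = self.rng.random() < self._sigmoid(self.theta[node])
            path.append((node, right))
            node = 2 * node + (2 if right else 1)  # heap-style child indexing
        return node - (2 ** self.depth - 1), path  # map leaf index to action id

    def prob(self, action):
        """Exact probability of `action`: product of branch probabilities on its path."""
        p, node = 1.0, 0
        for d in range(self.depth - 1, -1, -1):
            right = (action >> d) & 1  # path bits of the action id, MSB first
            p_right = self._sigmoid(self.theta[node])
            p *= p_right if right else 1.0 - p_right
            node = 2 * node + (2 if right else 1)
        return p

    def update(self, path, td_error, lr=0.1):
        """Actor update (assumed rule): move each visited node's parameter along
        the log-likelihood gradient, weighted by the critic's TD error."""
        for node, right in path:
            p_right = self._sigmoid(self.theta[node])
            grad = (1.0 - p_right) if right else -p_right  # d log pi / d theta[node]
            self.theta[node] += lr * td_error * grad
```

With all parameters at zero the tree is a uniform policy over the 2**depth actions, and a positive TD error raises the probability of the sampled action by nudging only the log-depth parameters on its path; this locality is what keeps the per-step cost flat as the action set grows.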
© The Society of Instrument and Control Engineers (SICE)