Abstract
In this paper, we propose a new reinforcement learning algorithm with a variable bias between two animalistic instincts: activeness and cautiousness. Although reinforcement learning methods are more flexible than supervised learning methods, it is difficult to determine the appropriate magnitude of the reinforcement signals. For example, too much reward for second-best solutions can prevent the agent from exploring for the best solution, while an insufficient reward can prevent it from learning any solution at all. To overcome such problems, the proposed model uses two learning modules with a variable bias. One module represents activeness: its sole goal is to maximize the amount of reward received from the environment. The other module represents cautiousness: its goal is to minimize the amount of penalty. By changing the bias between these two modules, the proposed model can learn efficiently in a wider variety of environments. Through computer simulations, we confirmed that the proposed model learns effectively in environments where conventional models show poor performance. In other environments, the proposed model showed performance comparable to that of conventional models.
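The two-module scheme described above can be illustrated with a minimal sketch: a bandit learner whose action values are a biased mixture of a reward-seeking module and a penalty-avoiding module. The toy environment, the linear mixing rule, and all parameter names below are illustrative assumptions, not the paper's exact formulation.

```python
import random

random.seed(0)

N_ARMS = 3
ALPHA = 0.1      # learning rate for both modules (assumed value)
EPSILON = 0.1    # exploration rate (assumed value)

# Module 1 ("activeness"): tracks the expected reward of each action.
# Module 2 ("cautiousness"): tracks the expected penalty of each action.
q_reward = [0.0] * N_ARMS
q_penalty = [0.0] * N_ARMS

def env_step(action):
    """Toy environment: arm 2 pays the most reward but also the most penalty."""
    reward = [0.2, 0.5, 1.0][action] + random.gauss(0, 0.1)
    penalty = [0.0, 0.1, 0.8][action] + random.gauss(0, 0.1)
    return reward, penalty

def select_action(bias):
    """Combine the two modules; bias in [0, 1] weights cautiousness."""
    if random.random() < EPSILON:
        return random.randrange(N_ARMS)  # occasional random exploration
    scores = [(1 - bias) * q_reward[a] - bias * q_penalty[a]
              for a in range(N_ARMS)]
    return max(range(N_ARMS), key=lambda a: scores[a])

def train(bias, steps=5000):
    for _ in range(steps):
        a = select_action(bias)
        r, p = env_step(a)
        # Each module updates only toward its own signal.
        q_reward[a] += ALPHA * (r - q_reward[a])
        q_penalty[a] += ALPHA * (p - q_penalty[a])

train(bias=0.5)
print("reward estimates:", [round(q, 2) for q in q_reward])
print("penalty estimates:", [round(q, 2) for q in q_penalty])
```

With an intermediate bias, the combined score steers the learner toward actions with a favorable reward-to-penalty trade-off; sliding the bias toward 0 or 1 recovers a purely reward-maximizing or purely penalty-minimizing agent.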