Abstract
The trade-off between exploration and exploitation is inherent in any learning method based on trial and error, such as reinforcement learning. We have proposed a reinforcement learning algorithm, 2D-RL, that uses reward and punishment as repulsive evaluation. In this algorithm, an appropriate balance between exploration and exploitation can be attained by using interest and utility. In this paper, we apply 2D-RL to a navigation learning task for a mobile robot; in the real world, the robot found a better path with 2D-RL than with a traditional actor-critic model.