Abstract
Reinforcement learning is an essential class of machine learning methods for autonomous agents to acquire adaptive and reactive behaviors. Q-Learning is a widely used reinforcement learning method that deals only with discrete-valued inputs (states) and outputs (actions). In this paper, we propose a new Q-Learning method in which fuzzy inference is introduced to represent the Q-function (action-value function) that evaluates state/action pairs, in order to deal with continuous-valued inputs and outputs. In this method, the steepest descent method is used to update the Q-function so that the parameters of the fuzzy rules are tuned in both the antecedent and consequent parts. Furthermore, we extend the method to the case of discrete actions by preparing a fuzzy inference system for each action, in order to speed up learning. Finally, we show the effectiveness of the proposed method by comparing it with other methods on control problems such as the cart-pole balancing problem and the ship navigation problem.
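
To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of a fuzzy-inference Q-function with one rule set per discrete action, updated by steepest descent on the temporal-difference error. The Gaussian membership functions, singleton consequents, and the consequent-only update step are simplifying assumptions made here for brevity; the paper's method also tunes the antecedent parameters.

```python
import numpy as np

class FuzzyQ:
    """Sketch of a fuzzy-inference Q-function over a continuous state,
    with an independent rule set per discrete action (an assumption
    mirroring the per-action fuzzy inference systems described above)."""

    def __init__(self, n_rules, state_dim, n_actions, alpha=0.05, gamma=0.99):
        rng = np.random.default_rng(0)
        # Antecedent parameters: centers c and widths s of Gaussian
        # membership functions, one set per action.
        self.c = rng.uniform(-1.0, 1.0, (n_actions, n_rules, state_dim))
        self.s = np.full((n_actions, n_rules, state_dim), 0.5)
        # Consequent parameters: one real-valued output per rule.
        self.q = np.zeros((n_actions, n_rules))
        self.alpha, self.gamma = alpha, gamma

    def _fire(self, a, x):
        # Rule firing strengths: product of Gaussian memberships.
        return np.exp(-((x - self.c[a]) ** 2) / (2 * self.s[a] ** 2)).prod(axis=1)

    def value(self, a, x):
        # Weighted-average defuzzification of the rule consequents.
        w = self._fire(a, x)
        return w @ self.q[a] / (w.sum() + 1e-12)

    def update(self, x, a, r, x_next, done):
        # TD target uses the greedy value of the next state.
        q_next = 0.0 if done else max(self.value(b, x_next)
                                      for b in range(len(self.q)))
        delta = r + self.gamma * q_next - self.value(a, x)
        # Steepest-descent step on the squared TD error with respect to
        # the consequents: dQ/dq_i = w_i / sum_j w_j. The antecedent
        # parameters (centers, widths) are left fixed in this sketch.
        w = self._fire(a, x)
        self.q[a] += self.alpha * delta * w / (w.sum() + 1e-12)
```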