In this paper, we consider profit sharing that is one of the reinforcement learning methods. An agent learns a candidate solution of a problem from the reward that is received from the environment if and only if it reaches the destination state. A function that distributes the received reward to each action of the candidate solution is called the reinforcement function. On this learning system, the agent can reinforce the set of selected actions when it gets the reward. And the agent should not reinforce the detour actions. First, we will propose a new constraint equation about reinforcement functions to distribute the reinforcement values on the non-detour actions. If we use the reinforcement function to satisfy the constraint equation, the agent can select the non-detour actions directing to the destination state. Next, it is shown that the reinforcement function can be constant after learning process to suppress the selection of detour actions. Lastly, in computer simulations for maze problems, we show that the learning performance of agents does not depend on the size of environment.
2004 JSAI (The Japanese Society for Artificial Intelligence)