Abstract
In this paper, we propose a target-selection-type Q-Learning method that uses multiple Q-values for the maximization and minimization of rewards and punishments. Our aim is to realize a system that acquires complex emotional behaviors, selected according to positive and negative evaluations of the situation. We also report the results of a computer simulation experiment and a Kansei evaluation conducted to confirm the effectiveness of the proposed method. Finally, we introduce a new simulation that shows the effect of an internal state.