Recently, Wang and Ray introduced a signed measure for formal languages, called a language measure, to evaluate the performance of strings generated by discrete event systems, and a synthesis method for an optimal supervisor based on this measure has been studied. To apply the method, exact information about the language measure of the controlled discrete event system is required. From a practical point of view, however, this information is not always available a priori.
In such a situation, a learning-based approach is useful for obtaining an optimal supervisor with respect to the language measure. This paper considers a synthesis method for such an optimal supervisor. First, we clarify the relationship between the Bellman equation in reinforcement learning and the performance of the language generated by the controlled discrete event system. Next, using this relationship, we propose a learning method for the optimal supervisor that takes the costs of disabling events into consideration. Finally, we illustrate the effectiveness of the proposed method by computer simulation.
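As a rough illustration of the connection referred to above (not part of the original abstract): Wang and Ray's language measure satisfies a linear recursion over the states of the controlled system that is formally analogous to the Bellman equation for the state-value function of a fixed policy in reinforcement learning. Here \(\Pi\) denotes the event-cost (weighted transition) matrix and \(\boldsymbol{\chi}\) the characteristic vector of the language measure, while \(P^{\pi}\), \(R^{\pi}\), and \(\gamma\) are the usual transition matrix, reward vector, and discount factor; these symbols follow the standard literature and are not notation fixed by this paper.
\[ \boldsymbol{\mu} = \boldsymbol{\chi} + \Pi\,\boldsymbol{\mu} \quad\Longleftrightarrow\quad \boldsymbol{\mu} = (I - \Pi)^{-1}\boldsymbol{\chi} \qquad \text{(language measure)} \]
\[ V^{\pi} = R^{\pi} + \gamma P^{\pi} V^{\pi} \quad\Longleftrightarrow\quad V^{\pi} = (I - \gamma P^{\pi})^{-1} R^{\pi} \qquad \text{(Bellman equation)} \]
This parallel is what allows value-based learning techniques to estimate the language measure, and hence an optimal supervisor, when the measure parameters are unknown a priori.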