This paper proposes a new framework for designing feedback control systems, called the data space approach, in which a set of experimental input and output data of a dynamical system is used directly and solely to design a stabilizing controller, without employing any mathematical model such as a transfer function, state equation, or kernel representation. In this approach, the notions of open-loop and closed-loop data spaces are introduced; these respectively contain all the open-loop and closed-loop behaviors of a dynamical system and serve as data space representations of the system dynamics. By using the geometric relationship between the open-loop and closed-loop data spaces, a data-based stabilizability condition is derived as a nonlinear matrix inequality (NMI), and the parameters of a stabilizing controller are obtained as a vector orthogonal to its solution. To compute the solution, the NMI is converted into a linear matrix inequality (LMI) with a rank constraint, and a computational algorithm based on LMI relaxation and linear approximation methods is applied.
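The final computational step involves verifying that a candidate solution satisfies both a matrix definiteness condition and a rank constraint. As a minimal toy-scale illustration of these two checks, and not of the paper's actual data-based condition or relaxation algorithm, the sketch below tests positive definiteness by Sylvester's criterion and numerical rank by Gaussian elimination on small hypothetical matrices:

```python
# Illustrative checks on a candidate solution of a rank-constrained LMI
# (hypothetical toy matrices; the paper's actual data-based condition
# is far more involved).

def det(M):
    """Determinant by Laplace expansion (fine for small matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += ((-1) ** j) * M[0][j] * det(minor)
    return total

def is_positive_definite(M):
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(det([row[:k] for row in M[:k]]) > 0
               for k in range(1, len(M) + 1))

def rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= f * A[r][j]
        r += 1
        if r == rows:
            break
    return r

P = [[2.0, 0.5], [0.5, 1.0]]          # toy symmetric candidate
print(is_positive_definite(P))        # → True
print(rank([[1.0, 2.0], [2.0, 4.0]])) # → 1 (rank-deficient matrix)
```

In a real implementation these feasibility checks would sit inside the iterative LMI relaxation loop; an off-the-shelf semidefinite programming solver would be used at realistic problem sizes.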
In recent years, effective methods for acquiring complex control policies have been in demand for real systems and real robots. Reinforcement learning, as an important element technology, has been studied extensively for this purpose. In reinforcement learning, a scalar evaluation of control, called a reward, is set to obtain a desirable behavior. In complex control problems, however, the reward is often given as a vector. In such cases, reinforcement learning has typically been applied after converting the rewards to a scalar, for example by a linearly weighted sum. In this paper, we explain why such scalarization is not appropriate. We instead adopt a framework of multi-criteria reinforcement learning, which handles the vector of rewards and the associated value functions directly. In this framework, commonly used action selection strategies such as the ε-greedy strategy cannot be applied. We therefore show the necessity and importance of the decision-making strategy in multi-criteria reinforcement learning. We propose a decision-making strategy that selects effective action candidates by the α-domination strategy and applies a goal-directed bias based on the achievement level of each evaluation. We apply the proposed method to the walking control problem of a humanoid robot. Physical simulation results show that our method improves walking control efficiently.
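The candidate-selection step can be sketched for vector-valued Q-values. The definition of α-domination used below (one action α-dominates another if it is at least as good up to a slack α in every criterion and strictly better in at least one) is one common formulation; the paper's exact definition, and the Q-vectors shown, are illustrative assumptions:

```python
# Sketch of α-domination-based selection of action candidates for
# vector-valued Q-values (illustrative formulation, not the paper's
# exact algorithm).

def alpha_dominates(qa, qb, alpha):
    """True if Q-vector qa α-dominates Q-vector qb."""
    at_least = all(a + alpha >= b for a, b in zip(qa, qb))
    strictly = any(a > b for a, b in zip(qa, qb))
    return at_least and strictly

def candidate_actions(q_values, alpha):
    """Keep actions that no other action α-dominates."""
    actions = list(q_values)
    return [a for a in actions
            if not any(alpha_dominates(q_values[b], q_values[a], alpha)
                       for b in actions if b != a)]

# Hypothetical Q-vectors for three actions over two criteria
# (e.g. forward speed and body stability of a walking controller).
Q = {"step_long": (0.9, 0.2), "step_short": (0.4, 0.8), "stand": (0.1, 0.3)}
print(candidate_actions(Q, alpha=0.05))  # → ['step_long', 'step_short']
```

Here "stand" is α-dominated by "step_short" and is filtered out, while the two mutually non-dominated actions remain as candidates; the goal-directed bias would then choose among them based on the achievement level of each evaluation.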
In this paper, we propose a new approach to the malfunction diagnosis problem for feedback systems based on controller information. We define a fault as any loss of stability of the closed-loop system, and a malfunction as a variation from the nominal situation that will develop into a fault. We supervise and evaluate the variation periodically, and thereby avoid faults that result from gradual changes of the plant dynamics. First, in the supervision step, the plant variation is obtained using a plant model estimated by the closed-loop identification scheme proposed by Dasgupta et al. Second, the evaluation is achieved by comparing the stability margin of the initial feedback system with the plant variation after the occurrence of a malfunction, from the viewpoint of closed-loop stability. The gap metric, which is closely related to stability, is used to compare these two quantities, and the model variation is captured in terms of the gap metric. Since the exact plant model is unknown, however, an upper bound on this variation is estimated by set membership identification [3, 12, 19, 20] within Dasgupta's framework. This comparison makes it possible to quantify the reliability of the feedback system.
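The supervision logic reduces to a periodic comparison of two scalars: the estimated upper bound on the plant variation (in the gap metric) against the stability margin of the initial closed loop. The sketch below is schematic only; the margin, bounds, and threshold are hypothetical numbers, not quantities produced by the paper's identification scheme:

```python
# Schematic sketch of the evaluation step: raise a malfunction alarm
# when the gap-metric upper bound approaches the stability margin.
# All numbers below are hypothetical.

def reliability(gap_upper_bound, stability_margin):
    """Remaining normalized stability margin; values near 0 mean a
    fault can no longer be excluded."""
    return 1.0 - gap_upper_bound / stability_margin

def supervise(gap_bounds, stability_margin, threshold=0.4):
    """Return the first supervision period whose reliability drops
    below the threshold, or None if no malfunction is detected."""
    for period, bound in enumerate(gap_bounds):
        if reliability(bound, stability_margin) < threshold:
            return period
    return None

# Gradually drifting plant: the gap bound grows each period.
bounds = [0.05, 0.10, 0.18, 0.26, 0.33]
print(supervise(bounds, stability_margin=0.4))  # → 3 (alarm at period 3)
```

Because the comparison uses an upper bound from set membership identification rather than the exact plant variation, the alarm is conservative: it fires before stability can actually be lost.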
Memory-based estimation is a method for locally estimating a nonlinear function using a large amount of data stored in a database. It generally consumes a considerable amount of time in arithmetic processing. To make memory-based estimation more practical, this computation time must be reduced. In this paper, we improve memory-based estimation by executing the estimation algorithm distributively.
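One plausible shape for such a distributed scheme: partition the stored samples into shards, have each shard search for its nearest neighbours to the query in parallel, and merge the partial results into a single local estimate. The sharding and inverse-distance weighting below are illustrative assumptions, not the paper's algorithm:

```python
# Sketch of distributed memory-based estimation: per-shard k-nearest-
# neighbour search in parallel, then a merged inverse-distance-weighted
# local estimate (illustrative scheme, not the paper's algorithm).
from concurrent.futures import ThreadPoolExecutor
import heapq

def local_knn(shard, x_query, k):
    """k nearest (distance, y) pairs within one shard of the database."""
    return heapq.nsmallest(k, ((abs(x - x_query), y) for x, y in shard))

def estimate(shards, x_query, k=3):
    """Query all shards in parallel, merge, and weight by 1/distance."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: local_knn(s, x_query, k), shards))
    nearest = heapq.nsmallest(k, (p for part in partials for p in part))
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Database: noisy samples of y = 2x, split into two shards.
shard_a = [(0.0, 0.1), (1.0, 2.0), (2.0, 4.1)]
shard_b = [(3.0, 5.9), (4.0, 8.0), (5.0, 10.2)]
print(round(estimate([shard_a, shard_b], x_query=2.5), 2))  # → 4.57
```

Because each shard only searches its own fraction of the database, the dominant cost of the neighbour search scales down with the number of workers, while the merge step handles only k candidates per shard.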