2021, Vol. 34, No. 9, pp. 235-242
In this paper, we propose a method for designing input-output history feedback controllers for unknown linear discrete-time systems. Many conventional reinforcement-learning-based control methods, such as policy iteration, rely on state feedback. We extend policy iteration by incorporating a method that statically estimates the state variables from a finite-time history of input-output data. Convergence of the policy to the model-based optimal solution is theoretically guaranteed. Moreover, the proposed method is one-shot: the optimal controller is obtained from the initial experiment data alone. The effectiveness of the proposed method is demonstrated through a numerical simulation of an oscillator network.
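To make the one-shot policy-iteration idea concrete, the following is a minimal illustrative sketch of the classical state-feedback Q-learning-style policy iteration that such methods build on, applied to a small hypothetical system. The matrices `A`, `B`, the cost weights, and all sizes are assumptions for the demo (the true model is used only to generate the single batch of experiment data, never inside the learning loop). This is not the authors' input-output history method; the sketch reuses one fixed data batch across iterations in the spirit of the "one-shot" property, and the learned gain is compared against the model-based Riccati solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state system, chosen only to generate the one batch of
# experiment data; the learning loop below never touches A or B directly.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Qc, Rc = np.eye(2), np.eye(1)   # quadratic cost weights
n, m = 2, 1

# Single "initial experiment": random states and exploratory inputs.
N = 200
X = rng.normal(size=(N, n))
U = rng.normal(size=(N, m))
Xn = X @ A.T + U @ B.T          # observed next states

# Features for the quadratic Q-function  Q(x, u) = z^T H z,  z = [x; u].
iu = np.triu_indices(n + m)
wts = np.where(iu[0] == iu[1], 1.0, 2.0)

def phi(z):
    """Upper-triangular monomials of z, so that z^T H z = phi(z) @ h."""
    return wts * np.outer(z, z)[iu]

K = np.zeros((m, n))            # A is stable here, so K = 0 is admissible
for _ in range(10):
    # Policy evaluation: least-squares Bellman fit on the fixed batch.
    Phi = np.empty((N, len(wts)))
    c = np.empty(N)
    for k in range(N):
        z = np.concatenate([X[k], U[k]])
        zn = np.concatenate([Xn[k], K @ Xn[k]])
        Phi[k] = phi(z) - phi(zn)
        c[k] = X[k] @ Qc @ X[k] + U[k] @ Rc @ U[k]
    h = np.linalg.lstsq(Phi, c, rcond=None)[0]
    H = np.zeros((n + m, n + m))
    H[iu] = h
    H = H + H.T - np.diag(np.diag(H))
    # Policy improvement: minimize the learned Q over u.
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])

# Model-based reference: fixed-point iteration of the discrete Riccati map.
P = Qc.copy()
for _ in range(1000):
    G = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
    P = Qc + A.T @ P @ (A - B @ G)
K_star = -np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
```

With noiseless data and a stabilizing initial policy, each policy-evaluation step is exact, so the learned gain `K` converges to the model-based optimum `K_star` without any further experiments, which is the behavior the one-shot property refers to.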