Nonlinear Theory and Its Applications, IEICE
Online ISSN : 2185-4106
ISSN-L : 2185-4106
Regular Section
Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems
Junya Ikemoto, Toshimitsu Ushio

2021 Volume 12 Issue 4 Pages 738-757

Abstract

A simulator that predicts the behavior of a real system is useful for reinforcement learning (RL) because experiences can be collected more efficiently than through interactions with the real system. However, when the simulator has an identification error, the experiences it generates may degrade the performance of the learned policy on the real system. Thus, we propose a practical two-stage RL algorithm that uses a simulator. In the first stage, we prepare multiple premised systems in the simulator and obtain approximated optimal Q-functions for these systems. In the second stage, we represent the Q-function for the real system as an approximated linear function whose basis functions are the approximated optimal Q-functions pre-trained in the simulator. This linear Q-function is then learned through interactions with the real system.
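The second stage described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: it assumes the pre-trained Q-functions are available as plain callables `q(s, a)`, and updates the linear combination weights with a standard TD(0) rule; the function name `td_update` and all parameters are hypothetical.

```python
import numpy as np

def td_update(weights, basis_qs, s, a, r, s_next, a_next, gamma=0.99, lr=0.01):
    """One TD(0) step on the weights of a linear Q-function.

    The real system's Q-function is approximated as
        Q(s, a) ~ sum_i weights[i] * basis_qs[i](s, a),
    where basis_qs are Q-functions pre-trained on premised systems
    in the simulator (hypothetical interface: each is callable as q(s, a)).
    """
    phi = np.array([q(s, a) for q in basis_qs])            # features at (s, a)
    phi_next = np.array([q(s_next, a_next) for q in basis_qs])
    td_error = r + gamma * (weights @ phi_next) - weights @ phi
    return weights + lr * td_error * phi                   # semi-gradient step

# Toy usage with two stand-in "pre-trained" Q-functions.
basis = [lambda s, a: -(s - 1.0) ** 2 - a ** 2,
         lambda s, a: -(s + 1.0) ** 2 - a ** 2]
w = td_update(np.zeros(2), basis, s=0.5, a=0.0, r=1.0, s_next=0.4, a_next=0.0)
```

Because only the low-dimensional weight vector is learned on the real system, far fewer real interactions are needed than for training a deep Q-network from scratch.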

© 2021 The Institute of Electronics, Information and Communication Engineers