2003 Volume 39 Issue 6 Pages 590-599
Dynamic power management (DPM) is one of the most effective techniques for reducing energy consumption. In particular, the auto-sleep function is the simplest yet an effective way to reduce electric power consumption. In this paper, we consider a stochastic model for determining the auto-sleep timing sequentially. More precisely, we develop an optimal control scheme for the auto-sleep timing based on Q-learning, a reinforcement learning algorithm that is closely related to the Markov decision process (MDP) and the semi-Markov decision process (SMDP). First, we reformulate the stochastic auto-sleep model as an SMDP. Second, we establish a control scheme that determines the optimal auto-sleep timing by applying the Q-learning algorithm. Finally, numerical examples with real data are presented to investigate the effectiveness of DPM.
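
As a rough illustration of the idea, and not the scheme developed in the paper, the following Python sketch applies a tabular Q-learning update of the SMDP type to the choice of an auto-sleep timeout. Everything specific here is an assumption made for the example: the candidate timeouts, the power and wake-up energy values, the two-state summary of the previous idle period, the synthetic idle-time distribution, and the simplification that the energy cost of an idle period is charged as a lump sum at the end of that period.

```python
import math
import random

# All numbers below are hypothetical, chosen only for illustration.
TIMEOUTS = [0.5, 1.0, 2.0, 4.0, 8.0]   # candidate auto-sleep timeouts (s)
P_ON, P_SLEEP, E_WAKE = 1.0, 0.1, 0.5  # active power, sleep power, wake-up energy
BETA = 0.05                            # continuous-time discount rate
N_STATES = 2                           # 0: previous idle period short, 1: long


def energy_cost(idle, timeout):
    """Energy spent over one idle period of length `idle` under `timeout`."""
    if idle <= timeout:
        return P_ON * idle  # the next request arrives before the device sleeps
    return P_ON * timeout + P_SLEEP * (idle - timeout) + E_WAKE


def bucket(idle):
    """Map an idle length to a coarse state index (assumed threshold: 2 s)."""
    return 0 if idle < 2.0 else 1


def sample_idle(rng):
    """Synthetic workload: a mix of short and long exponential idle periods."""
    mean = 0.5 if rng.random() < 0.6 else 10.0
    return rng.expovariate(1.0 / mean)


def q_learning(episodes=20000, alpha=0.05, epsilon=0.1, seed=0):
    """Learn a cost table q[state][action] over successive idle periods."""
    rng = random.Random(seed)
    q = [[0.0] * len(TIMEOUTS) for _ in range(N_STATES)]
    state = 0
    for _ in range(episodes):
        # Epsilon-greedy choice over timeouts (costs are minimized).
        if rng.random() < epsilon:
            action = rng.randrange(len(TIMEOUTS))
        else:
            action = min(range(len(TIMEOUTS)), key=lambda a: q[state][a])
        idle = sample_idle(rng)
        cost = energy_cost(idle, TIMEOUTS[action])
        next_state = bucket(idle)
        # SMDP-style update: the discount depends on the random sojourn time.
        target = cost + math.exp(-BETA * idle) * min(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the timeout with the lowest learned cost for each state."""
    return [TIMEOUTS[min(range(len(TIMEOUTS)), key=lambda a: row[a])] for row in q]
```

Calling `greedy_policy(q_learning())` returns one timeout per state. Which timeout comes out on top depends entirely on the assumed cost parameters and idle-time distribution, so the result should not be read as a finding about real workloads.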