Journal of Japan Industrial Management Association
Online ISSN : 2187-9079
Print ISSN : 1342-2618
ISSN-L : 1342-2618
Neuro-dynamic programming algorithms for computing optimal control of production lines
Katsuhisa OHNO, Kenji YASHIMA, Takahiro ITO

2003, Volume 54, Issue 5, Pages 316-325

Abstract
This paper discusses an optimal control problem for a failure-prone multi-stage production line that minimizes the expected total cost per unit time. The problem is formulated as an undiscounted Markov decision process (UMDP), but it is difficult to solve exactly because of the curse of dimensionality. In recent years, however, several algorithms in the field of reinforcement learning, or neuro-dynamic programming (NDP), have been devised to overcome the curse of dimensionality. In this paper, the simulation-based modified policy iteration method (SBMPIM) is proposed as a new NDP algorithm. The SBMPIM carries out by simulation the value-approximation routine of the modified policy iteration method (MPIM), an algorithm for UMDPs that is applicable to relatively large-scale problems. The SBMPIM and existing NDP algorithms such as SMART, RELAXED-SMART, and SBPI are numerically compared with an optimal control policy for a single-stage production line computed by the MPIM. Moreover, the performance of JIT production lines with optimal numbers of kanbans is numerically compared with that of optimal or near-optimal control policies computed by the SBMPIM. It is shown that all NDP algorithms except the SBMPIM fail to converge, and that the expected total cost per unit time of JIT production lines with optimal numbers of kanbans is more than 7 percent higher than the expected total cost per unit time computed by the SBMPIM.
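To make the abstract's algorithmic idea concrete, the following Python sketch illustrates simulation-based modified policy iteration on a toy average-cost MDP. It is not the paper's SBMPIM: the single-stage, failure-prone line, all parameter values, and the use of tabular average-cost TD(0) for the partial-evaluation step are illustrative assumptions, with greedy improvement done by one-step lookahead on a known transition model.

```python
import random

# Hypothetical toy model of a single-stage, failure-prone production line.
# State: (inventory level i, machine status m), m = 1 up / 0 down.
# Action a = 1 means "produce one unit this period" (only when the machine is up).
# All names and parameter values are illustrative assumptions, not from the paper.
I_MIN, I_MAX = -5, 5                      # backlog / inventory bounds
P_DEMAND, P_FAIL, P_REPAIR = 0.4, 0.1, 0.5
H_COST, B_COST = 1.0, 5.0                 # holding and backlog cost per period

def states():
    return [(i, m) for i in range(I_MIN, I_MAX + 1) for m in (0, 1)]

def actions(s):
    return (0, 1) if s[1] == 1 else (0,)

def cost(s):
    i, _ = s
    return H_COST * max(i, 0) + B_COST * max(-i, 0)

def transitions(s, a):
    """One-step model: list of (probability, next_state) pairs."""
    i, m = s
    out = []
    for dem, p_d in ((1, P_DEMAND), (0, 1 - P_DEMAND)):
        m2_opts = ((0, P_FAIL), (1, 1 - P_FAIL)) if m == 1 else ((1, P_REPAIR), (0, 1 - P_REPAIR))
        for m2, p_m in m2_opts:
            i2 = min(max(i + a - dem, I_MIN), I_MAX)
            out.append((p_d * p_m, (i2, m2)))
    return out

def simulate_step(s, a):
    """Sample a successor state from the one-step model."""
    r, acc = random.random(), 0.0
    for p, s2 in transitions(s, a):
        acc += p
        if r <= acc:
            return s2
    return s2  # guard against floating-point rounding

def sbmpim_sketch(n_iters=30, eval_steps=20000, alpha=0.05, beta=0.01, seed=0):
    """Simulation-based modified policy iteration (illustrative sketch).

    The exact partial-evaluation sweep of MPIM is replaced by simulated
    TD(0) updates of the differential values h(s) and of the average-cost
    estimate rho under the current policy; improvement is a one-step
    greedy lookahead using the known model.
    """
    random.seed(seed)
    policy = {s: actions(s)[-1] for s in states()}   # start: produce whenever up
    h = {s: 0.0 for s in states()}                   # differential (relative) values
    rho = 0.0                                        # average cost per period
    for _ in range(n_iters):
        # Partial policy evaluation by simulation (average-cost TD(0)).
        s = (0, 1)
        for _ in range(eval_steps):
            c, s2 = cost(s), simulate_step(s, policy[s])
            h[s] += alpha * (c - rho + h[s2] - h[s])
            rho += beta * (c - rho)
            s = s2
        # Greedy policy improvement via one-step lookahead on the model.
        for s in states():
            policy[s] = min(actions(s),
                            key=lambda a: cost(s) + sum(p * h[s2] for p, s2 in transitions(s, a)))
    return policy, rho

if __name__ == "__main__":
    policy, avg_cost = sbmpim_sketch()
    print("estimated average cost per period:", round(avg_cost, 3))
```

In this sketch the simulated evaluation phase plays the role that the abstract attributes to simulation inside the MPIM's value-approximation routine; for multi-stage lines the tabular value dictionary would have to be replaced by a parametric approximator, which is precisely where the curse of dimensionality enters.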
© 2003 Japan Industrial Management Association