Stochastic Shortest Path Problem on Borel Space Considering Dead-ends and Undesired Terminal States

Ritsusamuel Otsubo

doi:10.5687/sss.2024.10

Abstract

The stochastic shortest path problem (SSP) is a standard model for sequential decision making in uncertain environments. The standard model cannot handle cases where a catastrophic event occurs in the middle of an episode, or cases where a transition to a terminal state may not occur. We therefore define an SSP that takes dead-ends and undesired terminal states into account. The optimal policy for this problem is a stochastic semi-Markov policy. Because this SSP is difficult to solve directly, we propose an approximate problem. The optimal policy for the approximate problem is expressed as a probability distribution over a set of at most three deterministic policies. The deterministic policies are derived by considering a Bayesian-Adaptive MDP (BAMDP) over three Markov decision processes (MDPs) corresponding to the objective function and to the constraints on dead-ends and undesired terminal states. The probability distribution over the set of deterministic policies can be obtained by solving a two-person zero-sum game between the deterministic policies and the three MDPs.
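To make the final step concrete, the following is a minimal sketch of how a mixed strategy over a few deterministic policies might be computed from a zero-sum matrix game. It is not taken from the paper: the function name `solve_zero_sum`, the cost matrix values, and the choice of solver (Hedge, i.e. multiplicative weights, played against a best-responding opponent) are all assumptions made for illustration. Rows stand for deterministic policies, columns for the three MDPs, and each entry is a hypothetical cost of running that policy in that MDP.

```python
import math


def solve_zero_sum(cost, iters=5000):
    """Approximate the row player's minimax mixed strategy.

    cost[i][j] is the cost when the row player (minimizer) uses pure
    strategy i and the column player (maximizer) picks column j.
    Returns (mixed_strategy, worst_case_cost_of_that_strategy).
    Generic Hedge-vs-best-response sketch, not the paper's algorithm.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    lo = min(min(row) for row in cost)
    hi = max(max(row) for row in cost)
    span = (hi - lo) or 1.0  # avoid division by zero for constant matrices
    # Standard Hedge step size for losses scaled to [0, 1].
    eta = math.sqrt(8.0 * math.log(n_rows) / iters)
    cum_loss = [0.0] * n_rows  # cumulative scaled loss of each row
    avg = [0.0] * n_rows       # time-averaged row strategy

    for _ in range(iters):
        # Hedge weights; subtracting the minimum keeps exponents <= 0.
        base = min(cum_loss)
        weights = [math.exp(-eta * (c - base)) for c in cum_loss]
        total = sum(weights)
        p = [w / total for w in weights]

        # Column player best-responds to the current mixed strategy.
        col_cost = [sum(p[i] * cost[i][j] for i in range(n_rows))
                    for j in range(n_cols)]
        j_star = max(range(n_cols), key=col_cost.__getitem__)

        for i in range(n_rows):
            cum_loss[i] += (cost[i][j_star] - lo) / span
            avg[i] += p[i] / iters

    worst = max(sum(avg[i] * cost[i][j] for i in range(n_rows))
                for j in range(n_cols))
    return avg, worst


# Hypothetical costs: 3 deterministic policies (rows) x 3 MDPs (columns).
example_cost = [
    [1.0, 4.0, 2.0],
    [3.0, 1.0, 2.5],
    [2.0, 2.0, 3.0],
]
mix, worst_case = solve_zero_sum(example_cost)
print("policy distribution:", [round(x, 3) for x in mix])
print("worst-case cost:", round(worst_case, 3))
```

The averaged strategy is only approximately optimal, with an error that shrinks roughly as the inverse square root of `iters`. An exact solution would typically come from a linear program instead.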
