1996, Vol. 11, No. 3, pp. 411-419
The capability of learning is one of the salient features of realtime search algorithms such as LRTA*. These algorithms repeatedly perform problem solving trials so that the heuristic values eventually converge to exact values along every optimal path to the goal. The major impediment, however, is the instability of the solution quality (the length of the solution path) during convergence. This instability is due to two properties of the search algorithms: (1) they try to find all optimal solutions even after obtaining fairly good solutions, and (2) they tend to move towards unexplored areas, thus failing to balance exploration and exploitation. In this paper, we propose and analyze two new realtime search algorithms that stabilize the convergence process.

・ε-search (weighted realtime search) relaxes the condition of searching for optimal solutions, allowing suboptimal solutions with ε error. As a result, ε-search significantly reduces the total amount of learning performed.

・δ-search (realtime search with upper bounds) utilizes the upper bounds of estimated costs, which become available after the problem is solved once. Guided by the upper bounds, δ-search can better control the tradeoff between exploration and exploitation.

The ε- and δ-search algorithms can be combined easily. The effectiveness of these algorithms is demonstrated by solving randomly generated mazes.
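The convergence process described above can be illustrated with a small sketch. The code below runs repeated LRTA*-style trials on a tiny grid maze and stops once a trial's cost is within a factor (1 + ε) of the admissible lower bound h(start), mimicking the idea of accepting suboptimal solutions with ε error. This is an illustrative assumption, not the paper's exact formulation: the function names, the grid encoding, and the (1 + ε) stopping rule are all my own choices, and the δ-search upper-bound mechanism is not shown.

```python
def manhattan(a, b):
    # Admissible heuristic for a 4-connected grid with unit move costs
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def neighbors(cell, free):
    x, y = cell
    for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if nxt in free:
            yield nxt

def lrta_trial(free, start, goal, h):
    """One problem-solving trial: move greedily on f = cost + h,
    raising h(s) by one-step lookahead (the learning step).
    Mutates h in place and returns the trial's path cost."""
    s, cost = start, 0
    while s != goal:
        best_next = min(neighbors(s, free), key=lambda n: 1 + h[n])
        h[s] = max(h[s], 1 + h[best_next])  # learning update
        s, cost = best_next, cost + 1
    return cost

def epsilon_search(free, start, goal, epsilon=0.0, max_trials=1000):
    """Repeat trials until the solution cost is within (1 + epsilon)
    of the learned lower bound h(start). epsilon = 0 demands a
    provably optimal path; epsilon > 0 tolerates bounded suboptimality
    and typically stops (and hence learns) earlier."""
    h = {c: manhattan(c, goal) for c in free}
    cost = None
    for trial in range(1, max_trials + 1):
        cost = lrta_trial(free, start, goal, h)
        if cost <= (1 + epsilon) * h[start]:
            return cost, trial
    return cost, max_trials

# A 4x4 maze with a wall blocking column x=1 except at y=0,
# so the initial Manhattan estimate is badly misleading.
free = {(x, y) for x in range(4) for y in range(4)} - {(1, 1), (1, 2), (1, 3)}
optimal_cost, trials_opt = epsilon_search(free, (0, 3), (3, 3), epsilon=0.0)
relaxed_cost, trials_rel = epsilon_search(free, (0, 3), (3, 3), epsilon=0.5)
```

Because the stopping test with ε > 0 is strictly looser while the trial dynamics are identical, the relaxed run never needs more trials than the optimal one; this is the sense in which tolerating ε error stabilizes and shortens the convergence process.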