1998 Volume 13 Issue 4 Pages 631-638
The real-time search algorithm LRTA* enjoys an attractive property called convergence: through repeated problem-solving trials, the problem solver eventually identifies (or learns) an optimal path to the nearest goal. In his original LRTA* paper, Korf presented a proof of convergence, but only under the assumption that the initial heuristic estimates are consistent; it was not made clear whether convergence is retained under inconsistent heuristics, and extending his proof to this case is nontrivial. In this article, we establish the convergence of LRTA* by a novel technique that does not rest on the consistency assumption at all. Since the technique is a natural extension of the proof of completeness, it connects these two fundamental properties of LRTA*, which have so far been discussed somewhat independently. We also compare our technique with that of Barto et al., who proved general convergence by restating LRTA* as an instance of asynchronous dynamic programming.
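To make the convergence property concrete, the following is a minimal Python sketch of repeated LRTA* trials on a small hand-made graph. It is an illustration under stated assumptions, not the proof technique of the article: the graph, the initial heuristic values, and the names `lrta_star_trial` and `run_until_stable` are all hypothetical. The sketch assumes the plain one-step backup h(s) := min over neighbors of c(s, s') + h(s'), with ties broken by neighbor name. The initial heuristic is admissible but deliberately inconsistent, since h(B) = 2 exceeds c(B, C) + h(C) = 1.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, int]]]


def lrta_star_trial(graph: Graph, h: Dict[str, int], start: str, goal: str,
                    max_steps: int = 1000):
    """Run one trial from start to goal, updating h in place.

    Returns (path, cost, changed), where changed reports whether any
    heuristic value was modified during the trial.
    """
    state, path, cost, changed, steps = start, [start], 0, False, 0
    while state != goal:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before the goal")
        # One-step lookahead: f = edge cost + stored estimate of the neighbor.
        # Tuples compare by f first, then by neighbor name (tie-breaking).
        best_f, best_next, best_cost = min(
            (c + h[n], n, c) for n, c in graph[state]
        )
        # Assumed update rule: plain backup of the minimum f value.
        if h[state] != best_f:
            h[state] = best_f
            changed = True
        state = best_next
        path.append(state)
        cost += best_cost
        steps += 1
    return path, cost, changed


def run_until_stable(graph: Graph, h: Dict[str, int], start: str, goal: str,
                     max_trials: int = 100):
    """Repeat trials, keeping h between them, until a trial changes nothing."""
    costs = []
    for _ in range(max_trials):
        path, cost, changed = lrta_star_trial(graph, h, start, goal)
        costs.append(cost)
        if not changed:
            return path, costs
    raise RuntimeError("heuristic values did not stabilize")


# Hypothetical undirected graph; the optimal path S-B-C-G costs 3.
graph: Graph = {
    "S": [("A", 1), ("B", 1)],
    "A": [("S", 1), ("G", 3)],
    "B": [("S", 1), ("C", 1)],
    "C": [("B", 1), ("G", 1)],
    "G": [("A", 3), ("C", 1)],
}
# Admissible but inconsistent initial estimates (assumed for illustration).
h = {"S": 0, "A": 0, "B": 2, "C": 0, "G": 0}

final_path, costs = run_until_stable(graph, h, "S", "G")
print(final_path, costs, h)
```

In this example the first trial wanders (S, A, S, A, G) before later trials settle on the optimal path, and the loop stops at the first trial that leaves every stored estimate unchanged. A common variant instead updates h(s) to the maximum of its old value and the backed-up minimum, which keeps the estimates from decreasing; which rule a given analysis assumes should be checked against the paper in question.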