As computer technology has developed remarkably, multiple processors have come to be employed to control the functions of a system. This paper considers a system whose control mechanisms are realized through communication among several processors, and studies the problem of improving its reliability: when either a processor failure or a communication error occurs, the processors associated with the event are rolled back to the preceding checkpoint, so that a consistent state of the whole system is maintained through recovery. The expected cost is derived, and the optimal checkpoint interval that minimizes it is discussed analytically.