Note on Complete Proof of Axelrod’s Theorem

This note gives a complete proof of Axelrod’s theorem, which characterizes the advantage of the Tit-for-Tat (TFT) strategy in the repeated prisoner’s dilemma. Despite its importance in Axelrod’s study, the published proof of the theorem is incomplete. First, the gap in the proof is described and two approaches for filling it are outlined. Then, we provide complete proofs using these two approaches.

In the repeated prisoner's dilemma game, the advantage of the Tit-for-Tat (TFT) strategy is characterized by the following theorem (Axelrod, 1981, Theorem 2; see also Axelrod & Hamilton, 1981):
Axelrod's Theorem. For any strategy A,

V(A|TFT) ≤ V(TFT|TFT) if and only if w ≥ max[(T-R)/(T-P), (T-R)/(R-S)],

where V(A|B) is the payoff that strategy A gets when playing against strategy B; w is the discount factor; and T, R, P, and S are the payoffs of the prisoner's dilemma game, which satisfy T>R>P>S and 2R>T+S (see Table 1).
This theorem provides the condition under which no strategy can get a higher payoff than TFT. It is a very important theorem in Axelrod's study and was reprinted in his famous book (Axelrod, 1984, Appendix B). Over the last two decades, scholars have debated the theorem and Axelrod's "collective stability" concept (see Bendor & Swistak, 1997, for a recent example). However, the sufficiency part of the theorem was not proved completely by Axelrod (1981, 1984), as Taylor (1987) suggested.
In this note, we give a complete proof of Axelrod's theorem. We start with a brief discussion of the incompleteness of Axelrod's proof, and then show that there are two approaches to completing the proof of sufficiency.
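The condition in the theorem can be explored numerically. The following is a minimal sketch, not part of the original proof: the payoff values, the truncation horizon, and the helper names `payoff_vs_tft` and `threshold` are all assumptions made for illustration. It compares the discounted payoffs of ALL D and DCDC against TFT with the payoff TFT obtains against itself.

```python
from itertools import cycle, islice

# Illustrative payoffs (assumed); they satisfy T > R > P > S and 2R > T + S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# Player 2's stage payoff, indexed by (own move, TFT's move).
PAYOFF = {("C", "C"): R, ("D", "C"): T, ("C", "D"): S, ("D", "D"): P}


def payoff_vs_tft(pattern, w, horizon=500):
    """Truncated discounted payoff of a player who repeats `pattern` against TFT."""
    total, tft_move = 0.0, "C"  # TFT cooperates on the first move
    for t, move in enumerate(islice(cycle(pattern), horizon)):
        total += (w ** t) * PAYOFF[(move, tft_move)]
        tft_move = move  # TFT repeats the opponent's previous move
    return total


def threshold():
    """Right-hand side of the theorem's condition on w."""
    return max((T - R) / (T - P), (T - R) / (R - S))


for w in (0.6, 0.7):
    tft = payoff_vs_tft("C", w)  # against TFT, TFT itself plays C throughout
    rivals = {name: payoff_vs_tft(p, w) for name, p in (("ALL D", "D"), ("DCDC", "DC"))}
    print(w, w >= threshold(), all(v <= tft for v in rivals.values()))
```

With these assumed payoffs the threshold is 2/3, so DCDC does better than TFT at w = 0.6 but neither rival does at w = 0.7. The sketch only samples two rival strategies; it illustrates the statement and does not replace the proof of sufficiency.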

Proof:
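(I) First we prove that the alternative formulation mentioned above is equivalent to the condition of the theorem. Comparing V(TFT|TFT) with V(ALL D|TFT) gives the bound w ≥ (T-R)/(T-P), and comparing it with V(DCDC|TFT) gives the bound w ≥ (T-R)/(R-S). Thus, these two formulations are equivalent.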
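The equivalence in (I) can be checked from closed-form payoffs. The derivation below is a reconstruction under the usual assumptions (0 < w < 1, payoffs summed with discount factor w, TFT opening with C); it is offered as a sketch and is not quoted from the original proof.

```latex
\begin{align*}
V(\mathrm{TFT}\mid\mathrm{TFT}) &= R + wR + w^{2}R + \cdots = \frac{R}{1-w},\\
V(\mathrm{ALL\ D}\mid\mathrm{TFT}) &= T + wP + w^{2}P + \cdots = T + \frac{wP}{1-w},\\
V(\mathrm{DCDC}\mid\mathrm{TFT}) &= T + wS + w^{2}T + w^{3}S + \cdots = \frac{T + wS}{1-w^{2}}.
\end{align*}
Multiplying through by $1-w$ and by $1-w^{2}$ respectively gives
\begin{align*}
V(\mathrm{ALL\ D}\mid\mathrm{TFT}) \le V(\mathrm{TFT}\mid\mathrm{TFT})
  &\iff T - R \le w\,(T-P) \iff w \ge \frac{T-R}{T-P},\\
V(\mathrm{DCDC}\mid\mathrm{TFT}) \le V(\mathrm{TFT}\mid\mathrm{TFT})
  &\iff T - R \le w\,(R-S) \iff w \ge \frac{T-R}{R-S}.
\end{align*}
```

Both inequalities hold together exactly when w is at least the larger of the two ratios.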
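(II) From (I), we can paraphrase the theorem as follows: for any strategy A, V(A|TFT) ≤ V(TFT|TFT) if and only if V(ALL D|TFT) ≤ V(TFT|TFT) and V(DCDC|TFT) ≤ V(TFT|TFT). The necessity of this is trivial: if no strategy can get a higher payoff than TFT, then neither ALL D nor DCDC can get a higher payoff than TFT.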
We therefore turn to sufficiency. TFT has only two states depending on what the other player did on the previous move. We call them state 1 and state 2.
When the other player (hereafter player 2) chose C on the previous move, TFT is in state 1 and chooses C on the next move. Conversely, when player 2 chose D on the previous move, TFT is in state 2 and chooses D. We can consider TFT to be in state 1 on the first move. Therefore, we can conclude that the proof of sufficiency by Axelrod (1981) is incomplete.
This criticism of Axelrod (1981), however, might not be fair. Judging from Axelrod and Hamilton (1981), the proof he had in mind may have been similar to the proof we introduce in the next section: the proof using the concept of subgame.
However, the proof suggested in Axelrod and Hamilton (1981) is also incomplete. In the next section, we will consider this type of proof and provide a complete proof.
Then we will provide another approach to proving the theorem. There we will prove that any strategy that defects on any move (whether or not it acts dependently on TFT's state) cannot get a higher payoff than TFT.
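The two-state description of TFT given above can be written down directly. The following is a small sketch; the function names `tft_state`, `tft_move`, and `play_tft` are hypothetical and chosen only for this illustration.

```python
def tft_state(previous_opponent_move):
    """State 1 on the first move or after the opponent's C; state 2 after D."""
    return 1 if previous_opponent_move in (None, "C") else 2


def tft_move(state):
    """TFT cooperates in state 1 and defects in state 2."""
    return "C" if state == 1 else "D"


def play_tft(opponent_moves):
    """Return the (state, TFT move) pair on each move for a given opponent sequence."""
    history, previous = [], None  # None marks the first move
    for move in opponent_moves:
        state = tft_state(previous)
        history.append((state, tft_move(state)))
        previous = move
    return history


print(play_tft("DCD"))  # [(1, 'C'), (2, 'D'), (1, 'C')]
```

The state on each move depends only on player 2's previous move, which is the property both proofs rely on.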

Proof 1: Use of Subgame
Let us now consider the first approach to proving Axelrod's theorem, which uses the concept of subgame. This type of proof was already suggested by Axelrod and Hamilton (1981) and Maynard Smith (1982, Appendix K), and then provided by Taylor (1987). However, all of these proofs are also incomplete: they establish only section (I) of the following proof and do not examine the case that we discuss in section (II). We fill this gap here.
In this note, we define a subgame as the part of the game beginning from the nth move, for any n ≥ 1. By definition, the game itself is a subgame. Moreover, any subgame in the repeated prisoner's dilemma is the same as the whole game. Hereafter, the subgame beginning from the nth move is denoted by g_n.
We provide a proof of sufficiency by using the concept of subgame.

Proof:
(I) From the point of view of player 2 (the player who plays against the TFT player), all subgames can be divided into two types: (1) subgames beginning with state 1, and (2) those beginning with state 2. Type 1 includes the game itself, since TFT is in state 1 on the first move.
Let us now consider the best strategy for each type of subgame. We state another lemma about the best strategies (in place of Lemma 1).
Lemma 2: There are one or more best strategies against TFT for each type of subgame.
Unlike Lemma 1, this lemma is trivial.
Since the choice of TFT depends solely on the choice of the strategy interacting with it, the payoff of that strategy is determined by its own sequence of moves. Thus there must be a best strategy for each type of subgame. First, we consider the case where there is exactly one best strategy for each type.
(i) If S_1, the best strategy for type 1 (that is, the best strategy for the game), chooses C on the first move, then the second move of TFT is C and g_2 is also type 1. Thus the second move of S_1 must be C.
Similarly, S_1 must be the repeated sequence of C (CCCCCC…).
(ii) Conversely, if S_1 chooses D on the first move, then g_2 is type 2. Here we have to consider two cases: the case in which S_2, the best strategy for type 2, chooses C on its first move, and the case in which S_2 chooses D on its first move.
In the former case, S_1 chooses D at first and then chooses C, since the best strategy for g_2 begins with C. Thus, g_3 is type 1 and the third move must be D. Similarly, S_1 must be the repeated sequence of DC (DCDCDC…).
In the latter case, S_1 chooses D at first and then chooses D. Thus g_3 is also type 2 and the third move must be D. Similarly, S_1 must be the repeated sequence of D (DDDDDD…).
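The case analysis above says that a unique best reply to TFT must be one of CCCCCC…, DCDCDC…, or DDDDDD…. One way to see this numerically is to treat the two subgame types as the two states of a small dynamic program. The sketch below uses value iteration, which is a technique swapped in for illustration and not the argument of the original proof; the payoff values and the names `best_reply_to_tft` and `opening_moves` are assumptions.

```python
def best_reply_to_tft(T, R, P, S, w, sweeps=5000):
    """Approximate the best choice in each subgame type by value iteration.

    v1 and v2 are the best attainable payoffs in subgames of type 1 and type 2.
    Ties are resolved in favour of C.
    """
    v1 = v2 = 0.0
    for _ in range(sweeps):
        # Choosing C leads to a type 1 subgame, choosing D to a type 2 subgame.
        v1, v2 = max(R + w * v1, T + w * v2), max(S + w * v1, P + w * v2)
    in_type_1 = "C" if R + w * v1 >= T + w * v2 else "D"
    in_type_2 = "C" if S + w * v1 >= P + w * v2 else "D"
    return {1: in_type_1, 2: in_type_2}


def opening_moves(policy, n=6):
    """First n moves of the strategy that follows `policy`, starting in type 1."""
    state, moves = 1, []
    for _ in range(n):
        move = policy[state]
        moves.append(move)
        state = 1 if move == "C" else 2
    return "".join(moves)


for w in (0.9, 0.3, 0.1):
    print(w, opening_moves(best_reply_to_tft(5.0, 3.0, 1.0, 0.0, w)))
```

With the assumed payoffs T=5, R=3, P=1, S=0, the three discount factors produce CCCCCC, DCDCDC, and DDDDDD respectively, matching the three sequences identified in (i) and (ii).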

beginning with C and beginning with D.
For the S_1 beginning with C, g_2 is type 1. It follows that if S_1 continues choosing C until the nth move, then g_{n+1} is equal to the game as a whole. Hence, in the case above, we can deal with the choice of D on the (n+1)th move in the same way as the choice of D on the first move. It also follows that the repeated sequence of C is included in the set of S_1, as in (I).
For the S_1 beginning with D, g_2 is type 2. If S_2 chooses D at first, then g_3 is also type 2, and thus g_n is also type 2 for all n ≥ 3. This indicates that the S_1 beginning with D is the repeated sequence of D. This implies that the payoff of CCCCCC… and that of DDDDDD… are the same. This is satisfied when

w=(T-R)/(T-P).
It must be noted that any strategy that changes its choice from the repetition of C to that of D on any move is also included in the set of S_1. On the other hand, if S_2 chooses C at first, then the second move must be C and g_3 returns to type 1.
This implies that the payoff of DCCCCC… and that of CCCCCC… are the same. This is satisfied when

w=(T-R)/(R-S).
In this case, any strategy that chooses D on any move and chooses C on the next move is also a best strategy (see also (I) of the next section).
(ii) Now, we consider the case in which there are several S_1 and several S_2. By methods similar to those of (i), we can conclude that the sequences CCCCCC…, DCCCCC…, and DDDDDD… are all best strategies and that their payoffs are equal. This condition is satisfied when

w=(T-R)/(T-P)=(T-R)/(R-S).
It is noteworthy that, in this case, defection on any move cannot change the payoff, so that any strategy is a best strategy against TFT (see also (II) of the next section). Thus the proof is completed.
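The claim that every strategy is a best strategy when w=(T-R)/(T-P)=(T-R)/(R-S) can be checked with exact arithmetic. The sketch below is an illustration under assumed payoffs T=5, R=3, P=2, S=0, chosen so that T-P equals R-S; the function name `one_step_values` is hypothetical. It compares, in each state of TFT, the payoff from choosing C with the payoff from choosing D when play afterwards continues optimally.

```python
from fractions import Fraction


def one_step_values(T, R, P, S, w):
    """Payoff of each choice in each TFT state, given continuation values v1, v2."""
    v1 = R / (1 - w)   # value of a type 1 subgame under CCCCCC...
    v2 = S + w * v1    # value of a type 2 subgame when returning to C at once
    return {
        (1, "C"): R + w * v1,
        (1, "D"): T + w * v2,
        (2, "C"): S + w * v1,
        (2, "D"): P + w * v2,
    }


# Assumed payoffs with T - P == R - S, so both critical ratios coincide.
T, R, P, S = (Fraction(x) for x in (5, 3, 2, 0))
w = (T - R) / (T - P)
values = one_step_values(T, R, P, S, w)
print(w, values[(1, "C")] == values[(1, "D")], values[(2, "C")] == values[(2, "D")])
```

For these values w is 2/3 and both comparisons come out equal, so player 2 is indifferent between C and D in either state, which is the indifference the text describes.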
Proof 2: Consideration of All Types of Defection
In the previous section we provided a complete proof of Axelrod's theorem by using the concept of subgame. In this section we take another approach and prove that any strategy that defects on any move cannot get a higher payoff than TFT. Though this approach is essentially the same as the proof that a pair of TFT strategies is a Nash equilibrium, our proof is related to the proof by Axelrod and to the proof in the previous section. An earlier version of this proof is presented in Shimizu (1997).

Proof:
(I) A game of TFT versus TFT has the sequence of moves indicated in Table 2. We consider a strategy D_1(k) of player 2 that has the sequence of moves in Table 3.
The difference between Table 2 and Table 3 is only the kth and (k+1)th moves within the boxes.
Then we define U_2(k) as the total payoff except for the kth and (k+1)th moves. We obtain

V(TFT|TFT) = U_2(k) + w^(k-1)(R + wR),
V(D_1(k)|TFT) = U_2(k) + w^(k-1)(T + wS).

Therefore, for any k, k=1,2,3,...,

V(TFT|TFT) - V(D_1(k)|TFT) = w^(k-1)[(R-S)w - (T-R)].

We can see from this that one-time defection on any move cannot change the payoff of player 2 if w=(T-R)/(R-S). This is the case we mentioned in (II) of the previous section.
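The effect of a single defection can be checked by direct simulation. The sketch below assumes that D_1(k) denotes the strategy that defects only on the kth move and cooperates otherwise, so that against TFT it earns T on move k and S on move k+1 in place of R on both; under that assumption the payoff gap is w^(k-1)[(R-S)w - (T-R)]. The payoff values, the horizon, and the names `payoff_vs_tft`, `d1`, and `predicted_gap` are illustrative assumptions.

```python
# Illustrative payoffs (assumed); they satisfy T > R > P > S and 2R > T + S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {("C", "C"): R, ("D", "C"): T, ("C", "D"): S, ("D", "D"): P}


def payoff_vs_tft(moves, w):
    """Discounted payoff of player 2 for a finite move sequence against TFT."""
    total, tft_move = 0.0, "C"  # TFT is in state 1 on the first move
    for t, move in enumerate(moves):
        total += (w ** t) * PAYOFF[(move, tft_move)]
        tft_move = move
    return total


def d1(k, horizon=400):
    """Assumed form of D_1(k): defect on move k (1-indexed), cooperate otherwise."""
    return ["D" if t == k else "C" for t in range(1, horizon + 1)]


def predicted_gap(k, w):
    """V(TFT|TFT) - V(D_1(k)|TFT) under the assumption stated above."""
    return w ** (k - 1) * ((R - S) * w - (T - R))


all_c = ["C"] * 400
for k in (1, 3, 7):
    for w in (0.5, 2.0 / 3.0, 0.9):
        gap = payoff_vs_tft(all_c, w) - payoff_vs_tft(d1(k), w)
        print(k, round(w, 3), abs(gap - predicted_gap(k, w)) < 1e-9)
```

Because both sequences are truncated at the same horizon and differ only on moves k and k+1, the truncation does not affect the gap. At w=(T-R)/(R-S), which is 2/3 for the assumed payoffs, the gap vanishes up to rounding for every k.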
(II) Now, we consider strategies D_n(k) and D_{n+1}(k) of player 2, which are defined in Tables 4 and 5, respectively. The case of n=1 has been considered in (I), so it is assumed that n ≥ 2.
The difference between Table 4 and Table 5 is only the (k+n)th and (k+n+1)th moves within the boxes. Then we define U_2(k+n) as the total payoff except for the (k+n)th and (k+n+1)th moves. We obtain

V(D_n(k)|TFT) = U_2(k+n) + w^(k+n-1)(S + wR),
V(D_{n+1}(k)|TFT) = U_2(k+n) + w^(k+n-1)(P + wS).

Therefore, for any k, k=1,2,3,...,

V(D_n(k)|TFT) - V(D_{n+1}(k)|TFT) = w^(k+n-1)[(R-S)w - (P-S)].

(i) If w ≥ (P-S)/(R-S), then V(D_n(k)|TFT) ≥ V(D_{n+1}(k)|TFT) for any k.
Therefore, player 2 cannot increase his or her payoff by increasing the number of defections. Thus all we have to do is compare V(TFT|TFT) with V(D_1(k)|TFT).
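From (I) we obtain

V(TFT|TFT) - V(D_1(k)|TFT) = w^(k-1)[(R-S)w - (T-R)]

and

V(TFT|TFT) - V(DCDC|TFT) = [(R-S)w - (T-R)]/(1-w^2),

and thus the comparison between V(TFT|TFT) and V(D_1(k)|TFT) is equivalent to that between V(TFT|TFT) and V(DCDC|TFT).
(ii) If w < (P-S)/(R-S), then for any k, V(D_n(k)|TFT) < V(D_{n+1}(k)|TFT). In this case, player 2 can increase his or her payoff by increasing the number of defections.
However, he or she cannot do better than TFT or ALL D. Thus, the point is whether or not TFT can get a higher payoff than ALL D, that is, whether V(ALL D|TFT) ≤ V(TFT|TFT). From (i) and (ii), we obtain, for both cases, that no strategy D_n(k) can get a higher payoff than TFT if neither DCDC nor ALL D can.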

w=(T-R)/(R-S)=(T-R)/(T-P) implies w=(P-S)/(R-S).
We can conclude that any number of defections on any moves cannot change the payoff of player 2 if the above condition is satisfied.
This is what we mentioned in (II) of the previous section.
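The implication stated above follows in three short steps. The lines below are a sketch of that algebra, using only the assumption T>R>P>S from the theorem.

```latex
\begin{align*}
\frac{T-R}{R-S} = \frac{T-R}{T-P}
  &\;\Longrightarrow\; R - S = T - P \qquad (\text{since } T - R > 0)\\
  &\;\Longrightarrow\; P - S = T - R\\
  &\;\Longrightarrow\; w = \frac{T-R}{R-S} = \frac{P-S}{R-S}.
\end{align*}
```

So when the two critical ratios of the theorem coincide, w also sits exactly at the boundary between cases (i) and (ii).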
(III) From (I) and (II), when confronted with TFT, no defection can increase player 2's payoff.
Thus, any strategy having more than one D in its sequence of moves cannot get a higher payoff than TFT.
Now, we define D_0'(k) as a strategy having more than one D in its sequence of moves except for the (k-1)th move through the (k+n+1)th move. And