Nationaløkonomisk Tidsskrift, Bind 130 (1992) Festskrift til Sven Danø og R Nørregaard Rasmussen (II)Fictitious Play and the Interpretation of Mixed Strategy EquilibriaInstitute of Economics, University of Copenhagen Ebbe Hendon, Hans Jørgen Jacobsen, and Birgitte Sloth ResuméSUMMARY: We present a bounded rationality model of the thinking done by players before acting in a one shot game, the method of fictitious play. Mixed strategy equilibria be interpreted as the expectation of an outside observer concerning the players actions rather than as actual randomizations. 1. IntroductionGame theory deals with strategic situations, i.e., with situations where the actions of several players have consequences for the joint outcome. The applications to economic theory are evidently enormous. But game theory is marred by basic and conceptual problems. This paper is a contribution to the solution of one of these problems. For simplicity we deal with the simple case of 2 X 2 strategic form games, i.e., games where two players each have two possible (pure) strategies. Such a game can be illustrated as the game Gx of Figure 1, where player 1 has the strategies U and D, and player 2 has the strategies L and R. The entries in the matrix are the outcomes in terms of utility combinations arising from different strategy combinations. If, for example, player 1 chooses U and player 2 chooses L then an outcome occurs in which player 1 has utility 2 and player 2 utility 1. The game is only played once, and the players must choose their strategies without communication, and without knowing the other player's strategy. Players are assumed to be individually rational: For any theory they hold on the other player, they will seek to maximize utility. The problem of
game theory is to predict or explain what will be played
in strategic The without competition most prominent solution concept in game theory is the concept of a Nash equilibrium. A Nash equilibrium is a strategy combination where each player cannot gain from deviating given the other players' strategies. In a Nash equilibrium each player plays a best reply against the other players' strategy. It is easy to see that G( has a unique Nash equilibrium, namely the strategy combination (U, L). This paper reports some of the results of Hendon el al. (1990). Ebbe Hendon and Hans Jørgen Jacobsen were supported by the Danish Social Sciences Research Council. Side 299
E.g., the
strategy combination (D, R) is not a Nash equilibrium,
since it is not best reply Even though the outcome (U, L) may seem reasonable for Gx, in general there are serious conceptual problems with the concept of Nash equilibrium: Why, at all, should players coordinate on a Nash equilibrium strategy profile? There are no straightforward to this question and recently a lot of research has been directed to the foundations of Nash equilibrium. The analysis of fictitious play is one line in this research. analysis has, however, also provided one possible explanation of an older game theoretic puzzle: The interpretation of a mixed strategy Nash equilibrium. In games like G2G2 there is no Nash equilibrium. This problem is usually solved by allowing mixed strategies, i.e., probability measures over the pure strategies of the players. Then one further assumes that the payoffs satisfy the expected utility hypothesis, the utility to a player receiving some lottery over outcomes, is the expected value the utilities of the outcomes. With this extension any finite form game has an equilibrium. For example, in G2G2 the strategy combination where player 1 plays U with probability \ and D with probability \, and player 2 plays L with probability § and R with probability | is a Nash equilibrium. But mixed
strategy Nash equilibria seem difficult to defend: In
G2, (I, 1) is a best The concept of
mixed strategies is rather controversial, and several
critical points Side 300
The first objection is that the real situation modeled may not include a possibility to randomize. Indeed, if it does, then the set of pure strategies of the players should include probability distributions from the beginning. From this point of view G2G2 may indeed no Nash equilibrium. Secondly, while a randomization may seem easy to perform in the example above by flipping a coin or throwing a dice, more complex probabilities (for instance involving non-rational numbers) can hardly be constructed without considerable effort. This could then decrease utility, and as such make it more attractive to choose a pure best reply. Thirdly, as stated above, it appears strange that a player should choose a particular mixed strategy in order to guarantee equilibrium. This is, however, not a critique restricted mixed strategies, but applies whenever a player has several possible best replies equilibrium. So this critique should rightly be directed against the Nash equilibrium Students of game
theory are then easily led to suspect that the
introduction of mixed To sum up, it is undoubtedly the case, that many people in applied as well as theoretical theory are skeptical towards the use of mixed strategies. The present paper provides one possible justification for the use of mixed strategies, highly inspired by Binmore(1988). We aim at showing how a reasonable modeling of the preplay thought process performed players may indeed lead to the use of mixed strategy equilibrium as a prediction the play of the game. We think of the method of fictitious play, originally from Brown (1951), viewed as a bounded rationality model of preplay thinking. Initially, before any reasoning has taken place, each player holds a theory of what will be played in the game. This theory is a probability distribution over the pure strategiesof player of the game, including himself, and thus has the structure of a combination of mixed strategies. It is assumed that this initial theory is common to all players, a serious but widely used assumption. It seems reasonable that each player would test his theory by asking: If this were the theory held by all players, what would they play? Individual rationality implies that they would all play a best reply, and this combination of best replies can then be compared to the theory. If there is a difference this is a motivation for changing the theory. It seems reasonable to change it in the directionof best reply. We will assume that the former theory is changed to a new, which is a simple average of the former theory and the combination of best replies to this theory. The procedure is repeated for the new theory and so on. In this way, the 1. This suspicion may be further nutured by the common folklore among some Danish economists, that the contributions of mathematical economists to economics at times lack intuition and relevance! Side 301
theory is gradually adjusted by in each round changing it in the direction of a best reply to itself. As the number of rounds increase, the currently held theory will be based on more reasoning, which should imply that the weight put on it, when adjusting it towardsa reply, should be increased. The resulting sequence of theories is then examined.In it converges, the convergence point is our prediction of the players' behavior.A argument for this use is given in section 4. While this model is very specific and primitive, forcing us to adopt a modest view on the general applicability of the results, it does however seem to have a core of reasonableness it as a model of preplay reasoning. In particular we have not presupposed Nash equilibrium in the construction. 2. Notation and PreliminariesConsider games
with two players, 1 and 2, where each player has two
available pure Let xE [o,l] denote the mixed strategy of 1, where he uses U with probability x and D with probability 1 -x, and let similarly yE [o,l] denote the mixed strategy of player 2 where she uses L with probability y. So, a combination of mixed strategies is a pair z - (x, y) E [o,l] X [o,l]. The set of mixed strategy combinations is denoted by Z. Assuming the preferences of each player / satisfy the expected utility hypothesis, we extend the domain of w, to Zby: Let Bx(y)Bx(y)
denote the set of best replies of 1 against^, i.e., the
set of mixed strategies and define B2(x)B2(x) similarly. As an example consider the game G2. We can calculate Bx to the following: If y > | then the unique best reply of 1 is pure strategy U, corresponding jc = 1. For y<| , D is the best reply. For y=| both U and Das well as any mixed strategy are best replies. We can thus write down B{ as follows: Construct the
vector best reply correspondence B, by B(x, y) :=
Bx(y)Bx(y) X B2(x).B2(x). Side 302
A Nash equilibrium is a mixed strategy combination z*, such that each player uses a best reply given the mixed strategies of the other player, i.e., z*E B(z*). Interpreting mixed strategies as expectations, a Nash equilibrium is a consistent set of theories on the players. 3. Fictitious PlayFor the definition of fictitious play we need a specific tie-breaking rule to take care of situations with multiple best replies: Either there is a unique best reply or there is a tie; both pure strategies as well as all convex combinations of these are best replies. A tie-breaking rule for player 1 is thus a function bx such that bx(y) EB\(y) for all y. An example of tie-breaking rule is b\ defined by: (1) That is, in case of a
tie, b\ assigns equal probability to the optimal pure
strategies. For all arguments
given below the specific choice of tie-breaking rule is
irrelevant. Fictitious play
furthermore rests on a common, initial theory z° = (x°,
y°). Players Now, z1 is the
current theory. Players then use the best reply b(zx) to
update zl,z1, but In general zs is
defined from b and z° by: (2) or equivalently,
(3) Side 303
This construction
is called fictitious play, and the associated sequence
of theories So, b(zsA) - zvl
is the direction in which the theory changes in step s.
Futhermore We now offer a geometric impression of fictitious play. Figure 3 illustrates Z; the set of mixed strategy combinations. Choosing x along the vertical andy along the horizontal makes the corners of Z correspond to the matrix of pure strategies: The upperright of Z is the combination (x, y) = (1,0), corresponding to the pure strategy combination (U, R), and so on. Figure 3 is constructed for the game G\. The arrows point from mixed strategy combinations z, to vector best replies bi(z) . For G, the best reply correspondences take the form 2. Sincc S —» x as ,v —> =c, the decrcasing step size does not imply convcrgencc of thc scqucncc. (r=l ]+<r Side 304
For the
particular tie-breaking rule of (1) we then get bc: Z-^Z
given by From Figure 3 it
is obvious that from any initial theory, corresponding
to an element For the game G2G2 the situation is rather more complicated. Figure 4 shows the dynamics. 5 shows the first 250 steps of the particular fictitious play sequence generated by the initial theory z° =(J 1) and the tie-breaking rule bc. This gives an idea how the sequence converges to (^, |), the unique Nash equilibrium of G2G2- Side 305
For the games G\ and G2G2 we have argued that 1) fictitious play converges and 2) the convergence point is a Nash equilibrium. Results on fictitious play fall into these two categories, convergence results and characterization results. It can in fact be shown that for any 2x2 game fictitious play converges for all choices of initial theory and tiebreaking see Hendon et.al. (1990, 1991 a). By an example due to Shapley (1964), fictitious play does not always converge. We do however have Theorem. For any
finite n-person game: If a fictitious play sequence (zs)
converges Proof3, z* is a
Nash equilibrium if every player / uses a best reply.
The (possibly) It must be the case that st was a best reply infinitely often during the fictitious play sequence (zs). Otherwise, there would only be a finite number of steps at which s, was used with positive probability, and then z* (sj = 0 from (2), contradicting z* (st)> 0. So, there is a subsequence (zs) of (zs), such that st is a best reply st EBj (zs) for all s'. Consider the set of theories S(Sj) for which the pure strategy s, is a best reply. By continuity of the utility function uh S(s,) is closed. Since zs is in S(st) for all s', and zv converges to z*, we must have z* in S(st), i.e., s, is a best reply against z*. From the above then z* E Bj (z*). ? This result gives
foundation for the Nash equilibrium concept in games
where fictitious Further results
on fictitious play may be found in Robinson (1951),
Thorlund-Petersen 4. The Interpretation of Convergence PointsAssume that each player does his preplay iterations mechanically. He cannot continue and each step takes some (small) amount of time. At some point the player has to act. At this point the player has updated for some (large) number of steps a and holds a theory ziT. By individual rationality the player will then play a best reply to z'T. To be specific,
assume that all stopping stages 0, 1, ...., 5-1 up to
some maximal 3. Although we use the notation introduced for 2 X 2 games, the proof is valid for any finite game. Side 306
(4) 1 s Increasing the maximal number of iterations, we can investigate expected behavior of/ by studying the sequence (es). From the calculations above it is clear that xs converges (x*) if and only if es converges to z*. Then a convergence point z* of (zs) can be interpreted as the expected action of player / (the best prediction of an outside observer not knowing the stage at which i is asked to play) as the maximal number of iterations goes to infinity. This yields a nice interpretation of mixed strategies. lfz* = (x*,y*) has some players using mixed strategies, we need not assume that actual randomizations are carried out. Rather the mixture signifies the following: As shown above, x* is the proportion of steps in which the strategy U of player 1 is the best reply (assuming that the set of steps in which ties occur is of measure zero), and if the reasoning of player / stops at some random step, with all steps being equally likely, x* is the probability that U is chosen. Note that when making his move, player i will typically play a pure strategy. Figure 5, which show convergence to the mixed strategy equilibrium of G2, illustrates this point. There is no randomization at the individual level. It is only due to the lack of knowledge as to which step the opposing players have reached, that the probability distribution pure strategies arises. According to this interpretation, a mixed strategy equilibrium is a statistical prediction of the outcome of the game. Side 307
5. Concluding RemarksFudenberg and
Kreps (1991) raises a critique against the use of
fictitious play for For a symmetric initial theory like (x°, y°) = (|, ~), the fictitious play sequence will evolve in the following way: For a number of steps the best replies of 1 and 2 will be D and L respectively. Through the updating xs is decreased andy is increased until at some stage both players simultaneously switch to U and R respectively. The theories will ultimately converge to (x*,y*) = (f,~), a mixed strategy Nash equilibrium of G3. But the sequence will everywhere exhibit the best replies (D, L) or (U, R). So, the strategies for updating will be coordinated in the "bad" outcomes. While we find the critique valid for the use of fictitious play as a model of learning in a repeated game, where the updating strategies are actually played in each round, we find it less convincing for the interpretation of fictitious play as a preplay thought process, each player thinks independently of the others. With uncorrelated stopping stages, the prediction will indeed be the product of the marginal distributions. To sum up we have presented a bounded rationality model of the thought process of players such that mixed strategy equilibria arise naturally as the expectation concerning the players' actions rather than expressing actual random actions. In particular the mixed of player / is not only the expectation of other players, but also the prediction fs action made by an outside observer. In this interpretation both pure and mixed equilibria are predictions of the outcome of the game, for the mixed strategy given as a non-trivial probability distribution over pure strategy combinations. ReferencesBinmore, K. 1988.
Modeling Rational Players, Brown, G. W. 1951. Iterative Solutions of Games by Fictitious Play. In Activity Analysis Production and Allocation (T. C. Koopmanns, ed.), pp. 374-376. New York. Fudenberg, D. and D. Kreps. 1991. A Theory of Learning, Experimentation, and Equilibrium Games. Department of Economics, and Graduate School of Business, Hendon, E., H. J. Jacobsen, M. T. Nielsen, and B. Sloth. 1990. A Learning Process for Games. Discussion Paper 90-20, Institute of Economics, University of Copenhagen. Hendon, E., H. J.
Jacobsen, and B. Sloth. Institute of
Economics, University of Copenhagen. Hendon, E., H. J. Jacobsen, and B. Sloth. 19916. Fictitious Play in Extensive Form Games and Sequential Equilibrium. Mimeo, of Economics, University of Copenhagen. Krishna, V 1991.
Learning in Games with Milgrom, P. and J. Roberts. Adaptive and Sophisticated in Normal Form Games. Games and Economic Behavior 3, 82-100. Miyasawa, K. 1961. On the Convergence of the Learning Process in a 2 x 2 Non-zerosum Game. Research Memorandum Princeton University. Monderer, D. and
L. S. Shapley. 1991. Poten- tial Games.
Mimeo, The Technicon, Haifa Pearce, D. G.
1984. Rationalizable Strategic Robinson, J.
1951. An Iterative Method of Shapley, L. S. 1964. Some Topics in Two-person In Advances in Game Theory. M. Dresner, L. S. Shapley and A. W. Tucker, eds. pp. 1-28. Princeton. Thorlund-Petersen, L. 1990.
Iterative Computation |