Nationaløkonomisk Tidsskrift, Bind 130 (1992). Festskrift til Sven Danø og P. Nørregaard Rasmussen (II)

Fictitious Play and the Interpretation of Mixed Strategy Equilibria

Institute of Economics, University of Copenhagen

Ebbe Hendon, Hans Jørgen Jacobsen, and Birgitte Sloth

Resumé

SUMMARY: We present a bounded rationality model of the thinking done by players before acting in a one-shot game, the method of fictitious play. Mixed strategy equilibria can then be interpreted as the expectation of an outside observer concerning the players' actions, rather than as actual randomizations.

1. Introduction

Game theory deals with strategic situations, i.e., with situations where the actions of several players have consequences for the joint outcome. The applications to economic theory are evidently enormous. But game theory is marred by basic and conceptual problems. This paper is a contribution to the solution of one of these problems.

For simplicity we deal with the simple case of 2 × 2 strategic form games, i.e., games where two players each have two possible (pure) strategies. Such a game can be illustrated as the game G1 of Figure 1, where player 1 has the strategies U and D, and player 2 has the strategies L and R. The entries in the matrix are the outcomes in terms of utility combinations arising from different strategy combinations. If, for example, player 1 chooses U and player 2 chooses L, then an outcome occurs in which player 1 has utility 2 and player 2 utility 1. The game is only played once, and the players must choose their strategies without communication, and without knowing the other player's strategy. Players are assumed to be individually rational: For any theory they hold on the other player, they will seek to maximize utility.

The problem of game theory is to predict or explain what will be played in strategic
situations; in particular a game theorist should be able to give some prediction for the
game G1. What will be played here?

By far the most prominent solution concept in game theory is the concept of a Nash equilibrium. A Nash equilibrium is a strategy combination where no player can gain from deviating given the other players' strategies. In a Nash equilibrium each player plays a best reply against the other players' strategies. It is easy to see that G1 has a unique Nash equilibrium, namely the strategy combination (U, L).



This paper reports some of the results of Hendon et al. (1990). Ebbe Hendon and Hans Jørgen Jacobsen were supported by the Danish Social Sciences Research Council.


Figure 1. G1: A game with a unique Nash equilibrium in pure strategies.


Figure 2. G2: A game with no Nash equilibrium in pure strategies.

E.g., the strategy combination (D, R) is not a Nash equilibrium, since it is not a best reply
for player 1 to play D if player 2 plays R.

Even though the outcome (U, L) may seem reasonable for G1, in general there are serious conceptual problems with the concept of Nash equilibrium: Why, at all, should players coordinate on a Nash equilibrium strategy profile? There are no straightforward answers to this question, and recently a lot of research has been directed to the foundations of Nash equilibrium. The analysis of fictitious play is one line in this research. This analysis has, however, also provided one possible explanation of an older game theoretic puzzle: The interpretation of a mixed strategy Nash equilibrium.

In games like G2 there is no Nash equilibrium. This problem is usually solved by allowing mixed strategies, i.e., probability measures over the pure strategies of the players. One then further assumes that the payoffs satisfy the expected utility hypothesis: the utility to a player of receiving some lottery over outcomes is the expected value of the utilities of the outcomes. With this extension any finite strategic form game has a Nash equilibrium. For example, in G2 there is a Nash equilibrium in which both players play non-trivial mixtures of their two pure strategies.

But mixed strategy Nash equilibria seem difficult to defend: In G2, the equilibrium mixture is a best
reply for player 2 given that player 1 plays his equilibrium mixture, but so is L, R, and any other mixture
that player 2 can choose. Why should he choose exactly the only mixture which makes
player 1 indifferent between his pure strategies? He has to do that for the players to end
up in a Nash equilibrium! So, what is a mixed strategy? Do we really predict that players
flip coins in order to make the strategy of the opposing player optimal?

The concept of mixed strategies is rather controversial, and several critical points
can be raised. These points stem from the naive interpretation of a mixed strategy: A
player actually randomizes according to the probabilities of the mixed strategy.

Side 300

The first objection is that the real situation modeled may not include a possibility to randomize. Indeed, if it does, then the set of pure strategies of the players should include probability distributions from the beginning. From this point of view G2 may indeed have no Nash equilibrium.

Secondly, while a randomization may seem easy to perform in the example above by flipping a coin or throwing a die, more complex probabilities (for instance involving non-rational numbers) can hardly be constructed without considerable effort. This could then decrease utility, and as such make it more attractive to choose a pure best reply.

Thirdly, as stated above, it appears strange that a player should choose a particular mixed strategy in order to guarantee equilibrium. This is, however, not a critique restricted to mixed strategies, but applies whenever a player has several possible best replies in equilibrium. So this critique should rightly be directed against the Nash equilibrium concept itself.

Students of game theory are then easily led to suspect that the introduction of mixed
strategies is a mathematical trick, a mere technical device introduced to ensure existence
of Nash equilibrium¹.

To sum up, it is undoubtedly the case that many people in applied as well as theoretical game theory are skeptical towards the use of mixed strategies. The present paper provides one possible justification for the use of mixed strategies, highly inspired by Binmore (1988).

We aim at showing how a reasonable modeling of the preplay thought process performed by players may indeed lead to the use of a mixed strategy equilibrium as a prediction of the play of the game. We think of the method of fictitious play, originally due to Brown (1951), as a bounded rationality model of preplay thinking.

Initially, before any reasoning has taken place, each player holds a theory of what will be played in the game. This theory is a probability distribution over the pure strategies of each player of the game, including himself, and thus has the structure of a combination of mixed strategies. It is assumed that this initial theory is common to all players, a serious but widely used assumption. It seems reasonable that each player would test his theory by asking: If this were the theory held by all players, what would they play? Individual rationality implies that they would all play a best reply, and this combination of best replies can then be compared to the theory. If there is a difference, this is a motivation for changing the theory. It seems reasonable to change it in the direction of the best reply. We will assume that the former theory is changed to a new one, which is a simple average of the former theory and the combination of best replies to this theory. The procedure is repeated for the new theory and so on. In this way, the



1. This suspicion may be further nurtured by the common folklore among some Danish economists, that the contributions of mathematical economists to economics at times lack intuition and relevance!

Side 301

theory is gradually adjusted by changing it in each round in the direction of a best reply to itself. As the number of rounds increases, the currently held theory will be based on more reasoning, which should imply that the weight put on it, when adjusting it towards a best reply, should be increased. The resulting sequence of theories is then examined. If it converges, the convergence point is our prediction of the players' behavior. An argument for this use is given in section 4.

While this model is very specific and primitive, forcing us to adopt a modest view on the general applicability of the results, it does seem to have a core of reasonableness as a model of preplay reasoning. In particular, we have not presupposed Nash equilibrium in the construction.
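The reasoning procedure described above can be sketched in a few lines of Python. The function best_reply is a hypothetical placeholder for whatever map from theories to best replies the game at hand induces:

```python
from fractions import Fraction

def preplay_reasoning(theory, best_reply, rounds):
    """Iteratively average the common theory with the best reply to it,
    putting ever more weight on the theory as it embodies more reasoning."""
    for s in range(1, rounds + 1):
        reply = best_reply(theory)
        theory = [(s * t + r) / (s + 1) for t, r in zip(theory, reply)]
    return theory

# A toy run: if the best reply is always (1, 0), the theory drifts there.
z = preplay_reasoning([Fraction(0), Fraction(1)],
                      lambda th: [Fraction(1), Fraction(0)], 3)
# z == [Fraction(3, 4), Fraction(1, 4)]
```

Exact rational arithmetic is used so that the averaging steps can be followed by hand.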

2. Notation and Preliminaries

Consider games with two players, 1 and 2, where each player has two available pure
strategies, U and D for player 1, and L and R for player 2. The payoff functions u1 and u2
assign a payoff to each player for each combination of pure strategies.

Let x ∈ [0,1] denote the mixed strategy of player 1, where he uses U with probability x and D with probability 1 − x, and similarly let y ∈ [0,1] denote the mixed strategy of player 2, where she uses L with probability y. So, a combination of mixed strategies is a pair z = (x, y) ∈ [0,1] × [0,1]. The set of mixed strategy combinations is denoted by Z. Assuming the preferences of each player i satisfy the expected utility hypothesis, we extend the domain of u_i to Z by:


u_i(x, y) = xy u_i(U, L) + x(1 − y) u_i(U, R) + (1 − x)y u_i(D, L) + (1 − x)(1 − y) u_i(D, R)

Let B1(y) denote the set of best replies of player 1 against y, i.e., the set of mixed strategies
which maximize 1's utility given the theory y on the action of 2:


B1(y) = { x ∈ [0,1] | u1(x, y) ≥ u1(x′, y) for all x′ ∈ [0,1] }

and define B2(x) similarly. As an example consider the game G2. We can calculate B1 to be the following: If y > ȳ, where ȳ denotes the critical value of y determined by the payoffs of G2, then the unique best reply of 1 is the pure strategy U, corresponding to x = 1. For y < ȳ, D is the unique best reply. For y = ȳ both U and D, as well as any mixed strategy, are best replies. We can thus write down B1 as follows:


B1(y) = {1} if y > ȳ;  [0,1] if y = ȳ;  {0} if y < ȳ

Construct the vector best reply correspondence B by B(x, y) := B1(y) × B2(x).
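Best replies of this threshold form are easy to compute in code. The payoff matrix below is hypothetical (the payoffs of G1 and G2 are given only in the figures, which are not reproduced here); with these numbers the critical value is ȳ = 1/3:

```python
from fractions import Fraction

# Hypothetical payoffs for player 1 (rows U, D; columns L, R).
U1 = [[2, 0], [0, 1]]

def B1(y):
    """Best replies of player 1, as probabilities of U, against y = Pr(L).
    Returns a set for a unique reply; None signals a tie, in which case
    every x in [0,1] is a best reply."""
    payoff_U = y * U1[0][0] + (1 - y) * U1[0][1]
    payoff_D = y * U1[1][0] + (1 - y) * U1[1][1]
    if payoff_U > payoff_D:
        return {1}      # pure strategy U is the unique best reply
    if payoff_U < payoff_D:
        return {0}      # pure strategy D is the unique best reply
    return None         # tie: B1(y) is all of [0,1]

# B1(Fraction(1, 2)) == {1}, B1(0) == {0}, B1(Fraction(1, 3)) is None
```

Exact rationals are used so that the tie at ȳ = 1/3 is detected exactly; with floating point the comparison could miss it.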


A Nash equilibrium is a mixed strategy combination z* such that each player uses a best reply given the mixed strategy of the other player, i.e., z* ∈ B(z*). Interpreting mixed strategies as expectations, a Nash equilibrium is a consistent set of theories on the players' behavior.

3. Fictitious Play

For the definition of fictitious play we need a specific tie-breaking rule to take care of situations with multiple best replies: Either there is a unique best reply, or there is a tie, in which case both pure strategies as well as all convex combinations of these are best replies. A tie-breaking rule for player 1 is thus a function b1 such that b1(y) ∈ B1(y) for all y. An example of a tie-breaking rule is b1^c defined by:


b1^c(y) = 1 if B1(y) = {1};  0 if B1(y) = {0};  1/2 if B1(y) = [0,1]     (1)

That is, in case of a tie, b1^c assigns equal probability to the optimal pure strategies.

For all arguments given below the specific choice of tie-breaking rule is irrelevant.
Denote by b = (b1, b2) a vector of tie-breaking rules. So, this is a function b: Z → Z, assigning
to each theory a particular combination of best replies.

Fictitious play furthermore rests on a common, initial theory z^0 = (x^0, y^0). Players
use b(z^0) to update z^0 by calculating a simple average of z^0 and b(z^0):


z^1 = (1/2) z^0 + (1/2) b(z^0)

Now, z^1 is the current theory. Players then use the best reply b(z^1) to update z^1, but
are assumed to increase the weight on the current theory z^1, reflecting the fact that it is
based on one more step of reasoning than was z^0, so


DIVL6147

In general z^s is defined from b and z^0 by:


z^s = (s/(s+1)) z^{s−1} + (1/(s+1)) b(z^{s−1})     (2)

or equivalently,


z^s = (1/(s+1)) ( z^0 + Σ_{r=1}^{s} b(z^{r−1}) )     (3)


Figure 3. The function z ↦ b^c(z), giving the dynamics of G1.


Figure 4. The function z ↦ b^c(z), giving the dynamics of G2.

This construction is called fictitious play, and the associated sequence of theories
(z^s) the fictitious play sequence generated by z^0 and b.
Note from (2) that


z^s − z^{s−1} = (1/(s+1)) ( b(z^{s−1}) − z^{s−1} )

So, b(z^{s−1}) − z^{s−1} is the direction in which the theory changes in step s. Furthermore,
we see that 1/(s+1) is an upper bound on the change of any one coordinate in step s, i.e., that
the changes in the theory become increasingly smaller².

We now offer a geometric impression of fictitious play. Figure 3 illustrates Z, the set of mixed strategy combinations. Choosing x along the vertical and y along the horizontal axis makes the corners of Z correspond to the matrix of pure strategies: The upper-right corner of Z is the combination (x, y) = (1, 0), corresponding to the pure strategy combination (U, R), and so on. Figure 3 is constructed for the game G1. The arrows point from mixed strategy combinations z to the vector best replies b^c(z). For G1 the best reply correspondences take the form


[B1 and B2 for G1, written out from the payoffs of Figure 1 in the same form as B1 above.]


2. Since Σ_{r=1}^{s} 1/(1+r) → ∞ as s → ∞, the decreasing step size does not imply convergence of the sequence.


For the particular tie-breaking rule of (1) we then get b^c: Z → Z given by


[b^c for G1, obtained from B1 and B2 by the tie-breaking rule of (1).]

From Figure 3 it is obvious that from any initial theory, corresponding to an element
of the square, the fictitious play sequence ultimately converges to the point (x, y) =
(1, 1), corresponding to the unique Nash equilibrium (U, L) of G1.

For the game G2 the situation is rather more complicated. Figure 4 shows the dynamics. Figure 5 shows the first 250 steps of the particular fictitious play sequence generated by an initial theory z^0 and the tie-breaking rule b^c. This gives an idea of how the sequence converges to the unique mixed strategy Nash equilibrium of G2.
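Since the payoff matrices of G1 and G2 are given only in the figures, the sketch below instead runs fictitious play on a different, hypothetical 2 × 2 game with no pure strategy equilibrium: matching pennies, whose unique mixed equilibrium is (1/2, 1/2):

```python
def b_c(x, y):
    """Tie-breaking best reply b^c(z) for matching pennies.
    Player 1 wants to match: U is best iff y = Pr(L) > 1/2.
    Player 2 wants to mismatch: L is best iff x = Pr(U) < 1/2."""
    bx = 1.0 if y > 0.5 else (0.0 if y < 0.5 else 0.5)
    by = 1.0 if x < 0.5 else (0.0 if x > 0.5 else 0.5)
    return bx, by

def fictitious_play(z0, steps):
    x, y = z0
    for s in range(1, steps + 1):       # the update rule (2)
        bx, by = b_c(x, y)
        x = (s * x + bx) / (s + 1)
        y = (s * y + by) / (s + 1)
    return x, y

x, y = fictitious_play((0.9, 0.2), 20000)
# (x, y) spirals in towards the mixed equilibrium (1/2, 1/2).
```

The convergence is slow, which matches the picture suggested by Figure 5: the sequence circles the equilibrium with shrinking amplitude.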


For the games G1 and G2 we have argued that 1) fictitious play converges and 2) the convergence point is a Nash equilibrium. Results on fictitious play fall into these two categories, convergence results and characterization results. It can in fact be shown that for any 2 × 2 game fictitious play converges for all choices of initial theory and tie-breaking rule; see Hendon et al. (1990, 1991a). By an example due to Shapley (1964), fictitious play does not always converge. We do, however, have:

Theorem. For any finite n-person game: If a fictitious play sequence (z^s) converges
to z*, then z* is a Nash equilibrium.

Proof³. z* is a Nash equilibrium if every player i uses a best reply. The (possibly)
mixed strategy z*_i of player i is a best reply if and only if all pure strategies s_i used with
positive probability are best replies. So, consider a pure strategy s_i with z*_i(s_i) > 0.

It must be the case that s_i was a best reply infinitely often during the fictitious play sequence (z^s). Otherwise, there would only be a finite number of steps at which s_i was used with positive probability, and then z*_i(s_i) = 0 from (2), contradicting z*_i(s_i) > 0. So, there is a subsequence (z^{s′}) of (z^s) such that s_i is a best reply, s_i ∈ B_i(z^{s′}), for all s′.

Consider the set of theories S(s_i) for which the pure strategy s_i is a best reply. By continuity of the utility function u_i, S(s_i) is closed. Since z^{s′} is in S(s_i) for all s′, and z^{s′} converges to z*, we must have z* in S(s_i), i.e., s_i is a best reply against z*. From the above, then, z*_i ∈ B_i(z*). □

This result gives a foundation for the Nash equilibrium concept in games where fictitious
play converges. In games where fictitious play does not converge, the concept of
rationalizability, Pearce (1984), is supported; see Hendon et al. (1990).

Further results on fictitious play may be found in Robinson (1951), Miyasawa (1961), Thorlund-Petersen
(1990), Milgrom and Roberts (1991), Krishna (1991), Monderer and Shapley
(1991), and Hendon et al. (1990, 1991b).

4. The Interpretation of Convergence Points

Assume that each player does his preplay iterations mechanically. He cannot continue forever, and each step takes some (small) amount of time. At some point the player has to act. At this point the player has updated for some (large) number of steps σ and holds a theory z^σ. By individual rationality the player will then play a best reply to z^σ.

To be specific, assume that all stopping stages 0, 1, …, s−1 up to some maximal
number s of iterations are equally likely, each having probability 1/s. Given these assumptions
the expected action of player i will be:



3. Although we use the notation introduced for 2 × 2 games, the proof is valid for any finite game.


Figure 6. G3.


e_i^s := (1/s) Σ_{σ=0}^{s−1} b_i(z^σ)     (4)

From (3) we then get, with e^s = (e_1^s, e_2^s), that z^s = (1/(1+s)) z^0 + (s/(1+s)) e^s, or equivalently:


e^s = ((1+s)/s) z^s − (1/s) z^0

Increasing the maximal number of iterations, we can investigate the expected behavior of i by studying the sequence (e^s). From the calculations above it is clear that (z^s) converges to z* if and only if (e^s) converges to z*. Then a convergence point z* of (z^s) can be interpreted as the expected action of player i (the best prediction of an outside observer not knowing the stage at which i is asked to play) as the maximal number of iterations goes to infinity.
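The identity relating the theory z^s to the expected action e^s can be verified exactly with rational arithmetic; the best reply map is again a hypothetical matching-pennies one, not the paper's G2:

```python
from fractions import Fraction

half = Fraction(1, 2)

def b_c(x, y):
    """Hypothetical tie-breaking best reply for matching pennies."""
    bx = Fraction(1) if y > half else (Fraction(0) if y < half else half)
    by = Fraction(1) if x < half else (Fraction(0) if x > half else half)
    return bx, by

x0, y0 = Fraction(9, 10), Fraction(1, 5)
x, y = x0, y0
sum_bx = sum_by = Fraction(0)
S = 50
for s in range(1, S + 1):
    bx, by = b_c(x, y)
    sum_bx, sum_by = sum_bx + bx, sum_by + by
    x = (s * x + bx) / (s + 1)      # update rule (2)
    y = (s * y + by) / (s + 1)

ex, ey = sum_bx / S, sum_by / S     # e^S: the average best reply, as in (4)
# The identity z^S = 1/(1+S) z^0 + S/(1+S) e^S holds exactly:
# x == x0/(1+S) + S*ex/(1+S) and y == y0/(1+S) + S*ey/(1+S)
```

So the theory after S steps is, up to the vanishing weight on z^0, precisely the expected action of a player who stops at a uniformly random stage.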

This yields a nice interpretation of mixed strategies. If z* = (x*, y*) has some players using mixed strategies, we need not assume that actual randomizations are carried out. Rather the mixture signifies the following: As shown above, x* is the limit proportion of steps in which the strategy U of player 1 is the best reply (assuming that the set of steps in which ties occur is of measure zero), and if the reasoning of player 1 stops at some random step, with all steps being equally likely, x* is the probability that U is chosen.

Note that when making his move, player i will typically play a pure strategy. Figure 5, which shows convergence to the mixed strategy equilibrium of G2, illustrates this point. There is no randomization at the individual level. It is only due to the lack of knowledge as to which step the opposing players have reached that the probability distribution over pure strategies arises. According to this interpretation, a mixed strategy equilibrium is a statistical prediction of the outcome of the game.


5. Concluding Remarks

Fudenberg and Kreps (1991) raise a critique against the use of fictitious play for
interpreting mixed strategy equilibria. Consider the game G3 of Figure 6.

For a symmetric initial theory (x^0, y^0), the fictitious play sequence will evolve in the following way: For a number of steps the best replies of 1 and 2 will be D and L respectively. Through the updating, x^s is decreased and y^s is increased until at some stage both players switch to U and R respectively. The theories will ultimately converge to a mixed strategy Nash equilibrium (x*, y*) of G3. But the sequence will everywhere exhibit the best replies (D, L) or (U, R). So the updating strategies are coordinated on the "bad" outcomes.
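Since G3 itself appears only in Figure 6, the effect can be demonstrated on a hypothetical symmetric coordination game in which both players get 1 at (U, L) and at (D, R) and 0 otherwise, so that (U, L) and (D, R) are the "good" outcomes:

```python
from fractions import Fraction

half = Fraction(1, 2)

def b_c(x, y):
    """Tie-breaking best replies for the hypothetical coordination game:
    player 1 plays U iff y > 1/2, player 2 plays L iff x > 1/2."""
    bx = Fraction(1) if y > half else (Fraction(0) if y < half else half)
    by = Fraction(1) if x > half else (Fraction(0) if x < half else half)
    return bx, by

x, y = Fraction(3, 5), Fraction(2, 5)   # a near-symmetric initial theory
seen = set()
for s in range(1, 501):
    bx, by = b_c(x, y)
    seen.add((bx, by))
    x = (s * x + bx) / (s + 1)
    y = (s * y + by) / (s + 1)

# The theories approach the mixed equilibrium (1/2, 1/2), yet every best
# reply along the way is a miscoordinated pair (D, L) or (U, R):
# seen == {(0, 1), (1, 0)}
```

Here (0, 1) encodes (D, L) and (1, 0) encodes (U, R): the coordinated pairs (U, L) and (D, R) never occur as best replies along the sequence, which is exactly the Fudenberg-Kreps point.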

While we find the critique valid for the use of fictitious play as a model of learning in a repeated game, where the updating strategies are actually played in each round, we find it less convincing for the interpretation of fictitious play as a preplay thought process, where each player thinks independently of the others. With uncorrelated stopping stages, the prediction will indeed be the product of the marginal distributions.

To sum up, we have presented a bounded rationality model of the thought process of players such that mixed strategy equilibria arise naturally as the expectation concerning the players' actions rather than expressing actual random actions. In particular, the mixed strategy of player i is not only the expectation of the other players, but also the prediction of i's action made by an outside observer. In this interpretation both pure and mixed equilibria are predictions of the outcome of the game, for the mixed strategy case given as a non-trivial probability distribution over pure strategy combinations.

References

Binmore, K. 1988. Modeling Rational Players, Part II. Economics and Philosophy 4, 9-55.

Brown, G. W. 1951. Iterative Solutions of Games by Fictitious Play. In Activity Analysis of Production and Allocation (T. C. Koopmans, ed.), pp. 374-376. New York.

Fudenberg, D. and D. Kreps. 1991. A Theory of Learning, Experimentation, and Equilibrium in Games. Mimeo, Department of Economics, and Graduate School of Business.

Hendon, E., H. J. Jacobsen, M. T. Nielsen, and B. Sloth. 1990. A Learning Process for Games. Discussion Paper 90-20, Institute of Economics, University of Copenhagen.

Hendon, E., H. J. Jacobsen, and B. Sloth. 1991a. Notes on Convergence of the Learning Process in 2 × 2 Games. Mimeo, Institute of Economics, University of Copenhagen.

Hendon, E., H. J. Jacobsen, and B. Sloth. 1991b. Fictitious Play in Extensive Form Games and Sequential Equilibrium. Mimeo, Institute of Economics, University of Copenhagen.

Krishna, V. 1991. Learning in Games with Strategic Complementarities. Mimeo, Harvard Business School.

Milgrom, P. and J. Roberts. 1991. Adaptive and Sophisticated Learning in Normal Form Games. Games and Economic Behavior 3, 82-100.

Miyasawa, K. 1961. On the Convergence of the Learning Process in a 2 × 2 Non-zero-sum Game. Research Memorandum, Princeton University.

Monderer, D. and L. S. Shapley. 1991. Potential Games. Mimeo, The Technion, Haifa, and University of California, Los Angeles.

Pearce, D. G. 1984. Rationalizable Strategic
Behavior and the Problem of Perfection.
Econometrica 52, 1029-1051.

Robinson, J. 1951. An Iterative Method of Solving a Game. Annals of Mathematics 54, 296-301.

Shapley, L. S. 1964. Some Topics in Two-Person Games. In Advances in Game Theory (M. Dresher, L. S. Shapley and A. W. Tucker, eds.), pp. 1-28. Princeton.

Thorlund-Petersen, L. 1990. Iterative Computation of Cournot Equilibrium. Games and Economic Behavior 2, 61-75.