Nationaløkonomisk Tidsskrift, Bind 130 (1992). Festskrift til Sven Danø og P. Nørregaard Rasmussen (II)

An Introduction to Decision Making When Uncertainty Is Not Just Risk

Ebbe Hendon, Hans Jørgen Jacobsen, Birgitte Sloth, and Torben Tranæs
Institute of Economics, University of Copenhagen

SUMMARY: An uncertain, and not just risky, situation may be modeled using so-called belief functions, which assign lower probabilities to events, that is, to sets of possible outcomes. In this paper we give an introduction to this concept and a first contribution to modeling decision making under such uncertainty. We extend the von Neumann-Morgenstern expected utility theory to belief functions.

1. Introduction

An economic decision is a choice among a set of possible alternatives. In some cases the alternatives may be identical to the final outcomes, but very often the final outcome depends upon some state of nature which may be unknown when the decision is made. For example, consider a decision maker contemplating which share to buy in the stock market. The final outcome is what is earned on the transaction (in, e.g., DKK), but this may depend upon future oil prices, on who will be the future manager of the firm, and on other matters not known when the share is to be bought. Each decision can thus be viewed as a mapping assigning to each state of nature a particular outcome. In the von Neumann-Morgenstern (1947) expected utility theory, the decision maker holds an exogenous probability distribution over the states of nature, and thereby, for each possible decision, a probability distribution over the final outcomes (a lottery). It is assumed that the individual has a preference relation ordering all lotteries, and that this preference relation fulfills some basic and appealing postulates - axioms.
Then it is shown that the preference relation can be represented by a utility function fulfilling the expected utility hypothesis, i.e., the utility of any lottery is the expectation of the utilities of the final outcomes with respect to the probabilities. In the von Neumann-Morgenstern theory the probabilities over states are exogenous.¹

This paper reports part of the material in Hendon, Jacobsen, Sloth, and Tranæs (1991). We are very grateful to David Schmeidler for help and encouragement. Financial support from the Danish Social Sciences Research Council and the Economics Department, Tel Aviv University, is gratefully acknowledged.

1. This is what distinguishes it from the contributions of Savage (1954) and Anscombe-Aumann (1963), which are concerned with deriving subjective probabilities over states from preferences over decisions.
The probabilities express what the decision maker believes about the plausibility of different states. Sometimes he is naturally led to a probability distribution over states, e.g. at the roulette table, but in other cases the information is not sufficiently rich for this. For instance, let there be three possible future managers r, s and t of the firm. No evidence speaks for any particular one of them. It is known that r and s, but not t, graduated from the Harvard Business School. The majority of shareholders have faith in candidates from Harvard, etc. In this and many other situations of interest, it seems highly unlikely that the decision maker, from the present evidence, should formulate his beliefs as precisely as a probability distribution over states. To phrase it in classic terms: Uncertainty may be of a more fundamental nature than risk, that is, than what is captured by a probability distribution, as advocated by Knight and Keynes. A prominent example is the so-called Ellsberg paradox, Ellsberg (1961). The objective evidence is this: There is an urn containing 150 balls; 50 of these are red, and 100 are either black or green; there is no further information. Individuals are first asked to choose between two bets, A and B, governed by a random draw of one ball. A: DKK 100 are won if the ball is red. B: DKK 100 are won if the ball is black. They are then given two other bets. C: DKK 100 are won if the ball is red or green. D: DKK 100 are won if the ball is black or green. A non-negligible proportion of decision makers strictly prefers A to B, and D to C. Assuming that bet I is strictly preferred to bet II if and only if bet I gives DKK 100 with strictly higher probability than bet II, these preferences are inconsistent with any additive probability over colors.
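The inconsistency can be verified mechanically. The sketch below (Python; entirely our own illustration, not part of the original text) searches a grid of additive probabilities over the three colors for one that rationalizes both strict preferences, and finds none: A preferred to B forces π(red) > π(black), while D preferred to C forces π(black) > π(red).

```python
from fractions import Fraction

# Enumerate additive probabilities over {red, black, green} on a fine grid
# and look for one that rationalizes both A > B and D > C.
found = []
N = 60
for r in range(N + 1):
    for b in range(N + 1 - r):
        g = N - r - b
        pr, pb, pg = (Fraction(x, N) for x in (r, b, g))
        a_over_b = pr > pb            # A wins iff red; B wins iff black
        d_over_c = pb + pg > pr + pg  # D: black or green; C: red or green
        if a_over_b and d_over_c:
            found.append((pr, pb, pg))

print(found)  # [] -- no additive probability supports both strict preferences
```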
Generally, it seems relevant to consider decision environments where an individual views each possible decision as giving rise to a mapping from states to outcomes, but where the individual's assessment of which state will occur is uncertain in a fundamental way that cannot be captured by a probability distribution over the states, and where the consequence of each possible decision is therefore something more fundamentally uncertain than a probability distribution over outcomes. As the objects capturing this more basic uncertainty, we use a generalization of probability distributions, namely belief functions, Shafer (1976), or lower probability functions, Dempster (1967). Belief functions are formally introduced and interpreted in Section 2 below. The primitives of the theory presented here are the set of belief functions over a finite set of outcomes and a preference relation on this set. The task will be to derive a utility representation of this preference relation from axioms on it, thereby obtaining some first insight into the way decision makers may deal with uncertainty. Our theory is von Neumann-Morgenstern-like in the sense that the belief functions over states (outcomes) are exogenous. There is also a literature concerned with deriving subjective, non-additive probabilities over states, Schmeidler (1989), Gilboa (1987) and Wakker (1986).² However, the contribution closest to the present one is the work of Jaffray (1989). The axioms we use here are as standard as possible in order not to confuse things. In fact, we are going to use the von Neumann-Morgenstern axioms and then add one which is supposed to be weak. The present paper is thus a first step in representing preferences over belief functions.

2. Belief functions and the problem considered

Consider a finite
set of outcomes, X = {x1, ..., xn}, and denote by 𝒳 the set of all events, i.e., subsets of X. A probability measure π: 𝒳 → [0,1] assigns to each event E a number π(E), the probability of E, satisfying:

(1) π(∅) = 0 and π(X) = 1,

(2) π(E ∪ F) = π(E) + π(F), whenever E ∩ F = ∅.

Property (2) is called additivity. The set of probability measures on X is denoted V_A. A belief function v: 𝒳 → [0,1] is a generalization of a probability measure obtained by weakening (2). The interpretation of a belief function is that it assigns to each event E a lower bound v(E) on the likelihood of E, see also Shafer (1976). The weight of the evidence in support of E is v(E), while the plausibility of E is 1 − v(Ē), where Ē is the complement of E. So, in this interpretation a belief function embodies both a lower and an upper bound on the likelihood of each event, and in this sense it may contain uncertainty in addition to risk: It assigns to each event not just a single number, the probability of that event, but an interval, the range of possible probabilities of the event. As an example consider again the share/manager example of the introduction and assume that our decision maker, to the three possible states r, s, and t, associates the three rates of return x1, x2, and x3 respectively. The fact that he has little faith in any particular one of the potential future managers, but high faith in it being either r or s, could, for instance, be expressed by the following belief function over outcomes: v(∅) = 0, v({x1}) = v({x2}) = v({x3}) = 0.1, v({x1, x3}) = v({x2, x3}) = 0.2, v({x1, x2}) = 0.8, v({x1, x2, x3}) = 1.

Which properties should be attributed to a belief function v? One suggestion is that for any two events E and F, v(E ∪ F) ≥ v(E) + v(F) − v(E ∩ F). The idea behind this, which we will soon make precise, is that going to larger sets can only increase certainty.

2. Alternative approaches are Gilboa and Schmeidler (1989) and Vind (1991).

In the same spirit it could then be suggested that for any three events E, F and G, we should have v(E ∪ F ∪ G) ≥ v(E) + v(F) + v(G) − v(E ∩ F) − v(E ∩ G) − v(F ∩ G) + v(E ∩ F ∩ G), etc. In Shafer (1976), the following restriction, which is a generalization of the above for two and three events to k events, is introduced as the defining property of a belief function:

(3) v(E_1 ∪ ... ∪ E_k) ≥ Σ_{I ⊆ {1,...,k}, I ≠ ∅} (−1)^{#I+1} v(∩_{i∈I} E_i).

This property is called k-monotonicity. To understand the content of k-monotonicity, note that (3) with equality is the usual inclusion-exclusion rule for probability measures, which follows from (2) by induction. A similar implication does not hold for the inequalities, so the condition of k-monotonicity must be imposed for each k in order to have an analog to the usual inclusion-exclusion rule. We require k-monotonicity for all k, i.e., v: 𝒳 → [0,1] is a belief function if it satisfies (1) and (3) for any k ≥ 2. Denote by V_B the set of all belief functions. Since probability measures are k-monotone for all k, we have V_A ⊂ V_B. One can simply think of a belief function v as the vector (v(E))_{E∈𝒳} in ℝ^m, m := #𝒳. For any two belief functions v1, v2 ∈ V_B, and α ∈ [0,1], the convex combination v = αv1 + (1−α)v2 is defined by v(E) = αv1(E) + (1−α)v2(E), for all E ∈ 𝒳. It is easy to verify that V_A and V_B are convex sets. We have not yet justified the assumption of k-monotonicity for all k ≥ 2. However, one very natural way to think of an uncertain environment is as given by a so-called mass function, as in Shafer (1976). A function m: 𝒳 → [0,1] is a mass function if m(∅) = 0 and Σ_{E∈𝒳} m(E) = 1. The interpretation is that for any event E, m(E) is the weight of evidence in support of E which is additional to the weight already assigned to the proper subsets of E. The fact that m has non-negative values captures the idea that going to larger events can only increase certainty. Think of each possible decision as giving rise to a particular mass function m.
The belief in an event F is then naturally defined by v(F) := Σ_{E: E⊆F} m(E). Shafer (1976) shows that if v is defined from a mass function in this way, then v is a belief function, and conversely, any belief function is given by a uniquely determined mass function. Returning to the share/manager problem we see that v can be derived from the mass function m, where m({x1}) = m({x2}) = m({x3}) = m({x1, x2, x3}) = 0.1, m({x1, x2}) = 0.6, and m({x1, x3}) = m({x2, x3}) = 0. So most of the evidence weighs in favor of {x1, x2}, reflecting that the "Harvard argument" points to this set.

A particular class of belief functions is the unit belief functions. For any E ∈ 𝒳\{∅}, the unit belief function v_E is defined by v_E(F) = 1 if E ⊆ F, and v_E(F) = 0 otherwise. It expresses that the outcome will be an element in E for sure, and nothing else. Then v_E is indeed a belief function: its mass function assigns mass 1 to E itself. Let m_v be the unique mass function defining v. Then for any F, v(F) = Σ_{E: E⊆F} m_v(E) = Σ_{E∈𝒳\{∅}} m_v(E) v_E(F), that is:

(4) v = Σ_{E∈𝒳\{∅}} m_v(E) v_E.

Any belief function is thus a convex combination of unit belief functions.
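The relations above can be illustrated computationally. The sketch below (Python; events coded as frozensets, all names are our own, not the paper's) builds the share/manager belief function from its mass function via v(F) = Σ_{E⊆F} m(E), and checks the decomposition (4) of v into a mass-weighted sum of unit belief functions.

```python
from itertools import chain, combinations

X = {'x1', 'x2', 'x3'}

def events(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, k) for k in range(len(s) + 1))]

# Mass function of the share/manager example (all other events get mass 0).
m = {E: 0.0 for E in events(X)}
m[frozenset({'x1'})] = m[frozenset({'x2'})] = m[frozenset({'x3'})] = 0.1
m[frozenset(X)] = 0.1
m[frozenset({'x1', 'x2'})] = 0.6

def belief(m, F):
    """v(F) = sum of m(E) over events E contained in F."""
    return sum(w for E, w in m.items() if E <= F)

def unit_belief(E, F):
    """Unit belief function v_E: 1 if E is contained in F, else 0."""
    return 1.0 if E <= F else 0.0

# Decomposition (4): v(F) equals the m-weighted sum of unit belief functions.
for F in events(X):
    v_decomposed = sum(w * unit_belief(E, F) for E, w in m.items())
    assert abs(belief(m, F) - v_decomposed) < 1e-9

print(round(belief(m, frozenset({'x1', 'x2'})), 2))  # 0.8
print(round(belief(m, frozenset({'x1', 'x3'})), 2))  # 0.2
```

The printed values reproduce v({x1, x2}) = 0.8 and v({x1, x3}) = 0.2 from the text.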
*LELE.ECFmv (E) (4) The primitive of our theory is (VB, >), the set of belief functions over X with a preference > on it, where > should be read "is at least as good as".3 The problem is, from certain axioms on >, to find a representation; a function U: VVB -+ IR, such that for all v,wE VB:v >w <=> U(v) U(w), where Uhas an intuitive interpretation. From >, define > by: v > w if v > w but not w > v, and define ~ by: v ~ w if v>w and w>v. Given fl, >) there will also be given preferences over the probability in VA, in particular over the lotteries assigning probability 1 to one outcome, and zero to all other outcomes. We write x{ >x2 if the lottery giving xx for sure is at least as good as the lottery giving x2x2 for sure. Without loss of generality we assume that jc, > x2x2 > .... > xn. A simple version
of the so-called Mixture Set Theorem, Herstein and Milnor (1953), applies to any convex set M with a preference relation ≽ on it. Consider the following axioms:

A1. (Weak order). ≽ on M is complete and transitive.

A2. (Independence). For all m1, m2, m3 ∈ M, and α ∈ ]0,1]: If m1 ≻ m2, then αm1 + (1−α)m3 ≻ αm2 + (1−α)m3.

A3. (Continuity). For all m1, m2, m3 ∈ M such that m1 ≻ m2 ≻ m3, there are α, β ∈ ]0,1[, such that: αm1 + (1−α)m3 ≻ m2 ≻ βm1 + (1−β)m3.

Mixture Set Theorem, MST. The following two statements are equivalent:

(i) ≽ fulfills A1-A3.

(ii) There is an affine⁴ function U: M → ℝ, such that U represents ≽. Further, if U′ is another affine representation of ≽, then U′ = aU + b, for some a, b ∈ ℝ, a > 0.

Applying the MST to (V_A, ≽) yields the von Neumann-Morgenstern theory of expected utility:

3. Let a decision be a function f: S → X, from a finite set of states to X. Further, let uncertainty with respect to states be described by the belief function η: 𝒮 → [0,1], where 𝒮 is the set of subsets of S. Derive v: 𝒳 → [0,1] in the natural way: v(E) = η(f⁻¹(E)) for all E ∈ 𝒳. Then v is a belief function on X. This justifies that we start right off with belief functions on X.

4. The function U is affine if U(αm1 + (1−α)m2) = αU(m1) + (1−α)U(m2), for all m1, m2 ∈ M, α ∈ [0,1].
U(π) = U(Σ_{i=1}^n π(x_i) e_i) = Σ_{i=1}^n π(x_i) U(e_i), where e_i is the i'th unit vector, that is, the lottery giving x_i for sure. Defining u(x_i) := U(e_i), we have: U(π) = Σ_{x∈X} u(x)π(x). The utility assigned to a lottery π is the expected value, with respect to π, of the utilities assigned to the sure outcomes.
of a belief function vis the set of probability measures
which do (5) From Shapley
(1971), for any 2-monotone v, C(v) is non-empty. So, any
belief function A function/? from
{1, ... n} onto X, gives a particular sequence of the
outcomes; p (i) (6) To illustrate consider the simple permutation/?'^ ~ xi- Then tt*,v is the probability measure constructed from v by first giving to jc,, the best outcome, the least probability that can be given according to v. The second best outcome x2x2 is then given the least probability it can be given according to v and what has already been given to xh etc. Shapley (1971)
shows that for all 2-monotone v, the set of all vertices
(corners) of An example will clarify these concepts. Consider again the belief function of the share/manager example. Its core is illustrated in Figure 1. The triangle represents the set of all probability distributions over fjc,, x2, x3}, with ir(xx) measured along the TTfXi^-axis and so forth. From v^x-x}) = 0.1, we obtain from (5) the restriction tt(x3) > 0.1. From v({xx,x2}) = 0.8, tt(xx) + tt(x2)> 0.8, and hence tt(x3)< 0.2. Continuing like this the core of v appears as the shaded area. The points tt* v are easily computed. For instance, for/? = (x2, jc3, x{), it follows from (6), that tt* v(x2) = 0.1, then tt*pJxi) = 0.2 - 0.1 = 0.1, and finally rr^jxj = 1 - 0.2 = 0.8, so tt*1v = (0.8,0.1,0.1), which is one of the vertices of the core. For this particular v, there are different giving rise to the same it* v. With three outcomes, the core may have up to six vertices. 5. The core is a concept from cooperative game theory. Those familiar with this branch will realize that a belief function v: x —> [o>l]> has exactly the structure of a characteristic function of a game with side payment, X would then be the set of players, and x me set of coalitions. Side 282
3. Representation theorems: The nature of (V_B, ≽)

The set of belief functions is convex, so assuming A1-A3 for (V_B, ≽) would, directly from the MST, imply the existence of an affine representing function U: V_B → ℝ (and vice versa). Using (4) and affinity, U can be written: U(v) = U(Σ_{E∈𝒳\{∅}} m_v(E) v_E) = Σ_{E∈𝒳\{∅}} m_v(E) U(v_E). This observation is parallel to the von Neumann-Morgenstern representation for probability measures, where now U(v_E) is the counterpart of u(x). In so far as one has good intuition for U(v_E) - like one has in the von Neumann-Morgenstern theory for u(x) - we now have an interesting representation. However, we do not think that the utility of ending in set E for sure but knowing nothing else is as interpretable as the utility of getting outcome x for sure. Further, assuming A1-A3 for (V_B, ≽) implies assuming them for (V_A, ≽), so preferences over lotteries can still be represented using a von Neumann-Morgenstern utility on X. We would like to have the utility function U on V_B expressed in terms of u. To obtain this we must add an axiom.⁶

6. Jaffray (1989) also builds on the above observation. It is when it comes to "the additional axiom" that the present paper differs from that of Jaffray.
A4. (Non-extreme attitude towards uncertainty). For any E ∈ 𝒳\{∅}, there are probability measures π̲_E, π̄_E ∈ C(v_E) such that π̄_E ≽ v_E ≽ π̲_E.

We consider A4 to be very weak. First note that A4 is an axiom only on the unit belief functions. A unit belief function v_E contains the information that the outcome will be in E for sure and nothing else. The set of probability measures giving something in E for sure is {π ∈ V_A | π(E) = Σ_{x∈E} π(x) = 1}, which is exactly C(v_E). The elements of this set are ranked by ≽. The content of A4 is just that v_E itself is no better than the very best element and no worse than the very worst element of this set. In particular, since A4 is imposed only on unit belief functions, π̲_E and π̄_E may be taken to be the worst and the best pure outcome of E; that is, we could have π̲_E and π̄_E degenerate on elements of E, with π̄_E ≽ x ≽ π̲_E for all x ∈ E.
is a rather easy consequence of the MST and A4. For
formal Theorem 1. Let
(VB, >) be given. The following two statements are
equivalent: Theorem 1 is a qualitative statement on the nature of (VB,>). Given AI-A4, preferences the complicated objects, belief functions, are as (/each belief function is identified a lottery in its core, from which the expected von Neumann-Morgenstern utility in the usual sense is computed, and the so defined numbers rank all the belief functions in preference. Conversely, if the preference has such a representation, A1 -A4 are fulfilled. However, Theorem 1 says little about which element of C(v), v is identified We only know that the function tt from VBVB to VAVA is affine, and thereby continuous, v. It is, of course, warranted to say more about the variation of ttv with u Since Theorem 1 is an if-and-only-if-theorem, we know that this will require stronger axioms. We introduce now an axiom of concistency which is stronger than A4. Consider the belief function vx which contains absolutely no information: It says that the outcome will be in X and nothing else. Assume that Al-A2 are fulfilled. Then there is some subset FIX C VA, such that vx ~tt for all tt E FIx. A natural consistency requirement will now be that there is some ttx E I1x, with ttx(x) >0 for all x, such that for each subset E, vE - tte E Va, where irE is derived from irA- by Bayes' rule, i.e., Side 284
π_E(x) = π_X(x)/π_X(E) for x ∈ E.⁷ This is a requirement of internal consistency of the decision maker's preferences over unit belief functions. As the appropriate consistency concept we use Bayes' rule, which is possible since the requirement is only concerned with unit belief functions. Note that Bayes' rule is not used for updating the uncertain objects, the belief functions; it is only used as a way of expressing consistency of preferences.

A5. (Consistency). There is π_X ∈ V_A with π_X(x) > 0 for all x ∈ X, such that v_X ~ π_X, and for all E ∈ 𝒳\{∅}, v_E ~ π_E, where π_E is derived from π_X by Bayes' rule.

It is obvious that A5 implies A4: Just use the π_E of A5 as both π̲_E and π̄_E in A4. To prove Theorem 2 below, basically what is needed is Theorem 1 above and a result from cooperative game theory on Shapley values (see e.g. Kalai and Samet (1988)).

7. The assumption that π_X(x) > 0 for all x is restrictive, and is made for simplicity. Axiom A5 and Theorem 2 can be generalised to allow for π_X(x) = 0 for some x by using the notion of a weight system, see Hendon, Jacobsen, Sloth, and Tranæs (1991).
Theorem 2. Let (V_B, ≽) be given. The following two statements are equivalent:

(i) ≽ fulfills A1-A3 and A5.

(ii) There is a function u: X → ℝ, and there is π ∈ V_A, where π(x) > 0 for all x ∈ X, inducing weights (α_p)_{p∈P}, such that the function U: V_B → ℝ defined by U(v) := Σ_{x∈X} u(x) [Σ_{p∈P} α_p π*_{p,v}](x) represents ≽.

Note that for the α_p's of (ii), α_p ∈ [0,1] for all p ∈ P, and Σ_{p∈P} α_p = 1. So, Theorem 2 gives a high degree of regularity in the choice of the π_v from Theorem 1. A person fulfilling A1-A3 and A5 (and thereby A4) behaves as if he assigns to each belief function v a number, which is the expected von Neumann-Morgenstern utility of a π_v in the core of v, where π_v is computed as a weighted average of all the vertices π*_{p,v} of the core of v, and the weights α_p are independent of v. The regularity comes from the last part: The weighted average used is the same for all belief functions. As an illustration consider Figure 2, where the cores of two different belief functions v and w have been drawn, fully and dotted, respectively. The vertices of the two cores are connected two by two by arrows. Connected vertices are given the same weights in the computations of π_v and π_w. The lower, right vertex of C(w) is given weight equal to the sum of the weights of the two lower, right vertices of C(v). If, for instance, π_v is situated close to the vertex arising from the requirements given by v({x2}) and v({x1, x2}), then π_w is also close to the corresponding vertex. With a representation like that of Theorem 2, the Ellsberg paradox need no longer be a paradox. From the objective evidence, the belief function over colors (states) would naturally be: η({red}) = 1/3, η({black}) = η({green}) = 0, η({red, black}) = η({red, green}) = 1/3, η({black, green}) = 2/3. The derived belief functions over outcomes are then, for each alternative: A: v^a(100) = 1/3, v^a(0) = 2/3; B: v^b(100) = 0, v^b(0) = 1/3; C: v^c(100) = 1/3, v^c(0) = 0; D: v^d(100) = 2/3, v^d(0) = 1/3. The cores of these can be expressed in terms of the probability π of winning DKK 100: C(v^a) = {1/3}, C(v^b) = {π | π ≤ 2/3}, C(v^c) = {π | π ≥ 1/3}, C(v^d) = {2/3}. To each core the decision maker associates a specific probability measure.
Within the axioms of Theorem 2, this could very well be close to the minimal possible probability of winning DKK 100 (this corresponds to choosing α_{p′}, where p′ = (100, 0), close to 1, which is obtained by letting π_X(0) be close to 1 and π_X(100) be close to zero), i.e., close to π_{v^a} = 1/3, π_{v^b} = 0, π_{v^c} = 1/3, π_{v^d} = 2/3. Assuming that u(100) > u(0), calculation of expected utility with respect to these probabilities gives A ≻ B and D ≻ C.

References

Anscombe, F. J. and R. J. Aumann. 1963. A Definition of Subjective Probability. The Annals of Mathematical Statistics, 34, 199-205.

Dempster, A. P. 1967. Upper and Lower Probabilities Induced by a Multivalued Mapping. Annals of Mathematical Statistics, 38, 325-339.

Ellsberg, D. 1961. Risk, Ambiguity, and the Savage Axioms. Quarterly Journal of Economics, 75, 643-669.

Gilboa, I. 1987. Expected Utility with Purely Subjective Non-Additive Probabilities. Journal of Mathematical Economics, 16, 65-88.

Gilboa, I. and D. Schmeidler. 1989. Maxmin Expected Utility with Non-Unique Prior. Journal of Mathematical Economics, 18, 141-153.

Hendon, E., H. J. Jacobsen, B. Sloth and T. Tranæs. 1991. Expected Utility under Uncertainty. University of Copenhagen.

Herstein, I. N. and J. Milnor. 1953. An Axiomatic Approach to Measurable Utility. Econometrica, 21, 291-297.

Jaffray, J. Y. 1989. Application of Linear Utility Theory to Belief Functions.

Kalai, E. and D. Samet. 1988. Weighted Shapley Values. In A. E. Roth (ed.), The Shapley Value. Cambridge University Press.

Savage, L. J. 1954. The Foundations of Statistics. New York: Wiley.

Schmeidler, D. 1989. Subjective Probability and Expected Utility without Additivity. Econometrica, 57, 571-587.

Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton: Princeton University Press.

Shapley, L. S. 1971. Cores of Convex Games. International Journal of Game Theory, 1, 11-26.

Vind, K. 1991. Independent Preferences.

von Neumann, J. and O. Morgenstern. 1947. Theory of Games and Economic Behavior, 2nd ed. Princeton: Princeton University Press.

Wakker, P. P. 1986. Representations of Choice Situations.