Nationaløkonomisk Tidsskrift, Bind 130 (1992) Festskrift til Sven Danø og P. Nørregaard Rasmussen (II)

An Introduction to Decision Making When Uncertainty Is Not Just Risk

Institute of Economics, University of Copenhagen

Ebbe Hendon, Hans Jørgen Jacobsen, Birgitte Sloth, and Torben Tranæs

Resumé

SUMMARY: An uncertain and not just risky situation may be modeled using so-called belief functions assigning lower probabilities to events, that is, to sets of possible outcomes. In this paper we give an introduction to this concept and a first contribution to modeling decision making when there is uncertainty. We extend the von Neumann- Morgenstern expected utility theory to belief functions.

1. Introduction

An economic decision is a choice among a set of possible alternatives. In some cases the alternatives may be equal to the final outcomes, but very often the final outcome depends upon some state of nature which may be unknown when the decision is made. For example, a decision maker considers which share to buy in the stock market. The final outcome is what is earned on the transaction (in e.g. DKK), but this may depend upon future oil prices, on who will be the future manager of the firm, and on other matters not known when the share is to be bought. Each decision can thus be viewed as a mapping assigning to each state of nature a particular outcome.

In the von Neumann-Morgenstern (1947) expected utility theory, the decision maker holds an exogenous probability distribution over the states of nature, and thereby, for each possible decision, a probability distribution over the final outcomes (a lottery). It is assumed that the individual has a preference relation ordering all lotteries, and that this preference relation fulfills some basic and appealing postulates - axioms. Then it is shown that the preference relation can be represented by a utility function fulfilling the expected utility hypothesis, i.e., the utility of any lottery is the expectation of the utilities of the final outcomes with respect to the probabilities.

In the von Neumann-Morgenstern theory the probabilities over states are exogenous.1 However, in real-life decision situations the present evidence consists of certain statements known to be true, and from these the decision maker must make up what he believes about the plausibility of different states. Sometimes he is naturally led to a probability distribution over states, e.g. at the roulette table, but in other cases the information is not sufficiently rich for this. For instance, let there be three possible future managers r, s and t of the firm. No evidence speaks for any particular one of them. It is known that r and s, but not t, graduated from the Harvard Business School. The majority of shareholders have faith in candidates from Harvard, etc. In this and many other situations of interest, it seems highly unlikely that the decision maker, from the present evidence, should formulate his beliefs as precisely as a probability distribution over states. To phrase it in classic terms: Uncertainty may be of a more fundamental nature than risk, that is, than what is captured by a probability distribution, as advocated by Knight and Keynes.

This paper reports part of the material in Hendon, Jacobsen, Sloth, and Tranæs (1991). We are very grateful to David Schmeidler for help and encouragement. Financial support from the Danish Social Sciences Research Council and the Economics Department, Tel Aviv University is gratefully acknowledged.

1. This is what distinguishes it from the contributions of Savage (1954) and Anscombe-Aumann (1963), which are concerned with deriving subjective probabilities over states from preferences over decisions.

A prominent example is the so-called Ellsberg paradox, Ellsberg (1961). The objective evidence is this: There is an urn containing 150 balls; 50 of these are red, and 100 are either black or green; there is no further information. Individuals are first asked to choose between two bets, A and B, governed by a random draw of one ball. A: DKK 100 are won if the ball is red. B: DKK 100 are won if the ball is black. They are then asked to choose between two other bets. C: DKK 100 if the ball is red or green. D: DKK 100 if the ball is black or green. A non-negligible proportion of decision makers strictly prefers A to B, and D to C. Assuming that bet I is strictly preferred to bet II if and only if bet I gives DKK 100 with strictly higher probability than bet II, these preferences are inconsistent with any additive probability over colors.
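As a concrete check, the following small Python sketch (an illustration, not part of the original argument) searches over all additive probability distributions on the three colors compatible with the evidence and confirms that none of them ranks A strictly above B and, at the same time, D strictly above C, when bets are ranked by their probability of winning.

```python
# Sketch: no additive probability over {red, black, green} with P(red) = 1/3
# can rank the Ellsberg bets as A > B and D > C at the same time.

def win_prob(winning_colors, p):
    """Probability of winning DKK 100, given color probabilities p."""
    return sum(p[c] for c in winning_colors)

bets = {"A": {"red"}, "B": {"black"}, "C": {"red", "green"}, "D": {"black", "green"}}

found = False
steps = 1000
for k in range(steps + 1):
    p_black = (2.0 / 3.0) * k / steps   # black and green together have probability 2/3
    p = {"red": 1.0 / 3.0, "black": p_black, "green": 2.0 / 3.0 - p_black}
    if (win_prob(bets["A"], p) > win_prob(bets["B"], p)
            and win_prob(bets["D"], p) > win_prob(bets["C"], p)):
        found = True

print("additive probability rationalizing both strict preferences found:", found)  # False
```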

Generally, it seems relevant to consider decision environments where an individual views each possible decision as giving rise to a mapping from states to outcomes, but where the individual's assessment of which state will occur is uncertain in a fundamental way that cannot be captured by a probability distribution over the states, and where the consequence of each possible decision is therefore something more fundamentally uncertain than a probability distribution over outcomes. As the objects capturing this more basic uncertainty, we use a generalization of probability distributions, namely belief functions, Shafer (1976), or lower probability functions, Dempster (1967). Belief functions are formally introduced and interpreted in Section 2 below.

The primitives of the theory presented here are the set of belief functions over a finite set of outcomes and a preference relation on this set. The task will be to derive a utility representation of this preference relation from axioms on it, thereby obtaining some first insight into the way decision makers may deal with uncertainty. Our theory is von Neumann-Morgenstern-like in the sense that the belief functions over states (outcomes) are exogenous. There is also a literature concerned with deriving subjective, non-additive probabilities over states, Schmeidler (1989), Gilboa (1987) and Wakker (1986).2 However, the contribution closest to the present one is the work of Jaffray (1989).

The axioms we use here are as standard as possible in order not to confuse things. In fact, we are going to use the von Neumann-Morgenstern axioms and then add one which is supposed to be weak. The present paper is thus a first step in representing preferences over belief functions.

2. Belief functions and the problem considered

Consider a finite set of outcomes, $X = \{x_1, \ldots, x_n\}$, and denote by $2^X$ the set of all subsets of $X$. An element $E$ of $2^X$ is referred to as an event.

A probability measure $\pi$ assigns to each event $E$ a number $\pi(E)$, the probability of $E$. A probability measure is thus a function $\pi: 2^X \to [0,1]$ fulfilling:

$$\pi(\emptyset) = 0, \qquad \pi(X) = 1 \tag{1}$$

$$\pi(E \cup F) = \pi(E) + \pi(F) \quad \text{for all } E, F \text{ with } E \cap F = \emptyset \tag{2}$$

Property (2) is called additivity. The set of probability measures on $X$ is denoted $V_A$. For convenience, write $\pi(x)$ for $\pi(\{x\})$ for all $x \in X$. From (2), $\pi(E) = \sum_{x \in E} \pi(x)$ for any $E$.

A belief function $v: 2^X \to [0,1]$ is a generalization of a probability measure obtained by weakening (2). The interpretation of a belief function is that it assigns to each event $E$ a lower bound $v(E)$ on the likelihood of $E$, see also Shafer (1976). The weight of the evidence in support of $E$ is $v(E)$, while the plausibility of $E$ is $1 - v(\bar{E})$, where $\bar{E}$ is the complement of $E$. So, in this interpretation a belief function embodies both a lower and an upper bound on the likelihood of each event, and in this sense it may contain uncertainty beyond mere risk: It assigns to each event not just a single number, the probability of that event, but an interval, the range of possible probabilities of the event.

As an example, consider again the share/manager example of the introduction and assume that our decision maker associates with the three possible states $r$, $s$, and $t$ the three rates of return $x_1$, $x_2$, and $x_3$, respectively. The fact that he has little faith in any particular one of the potential future managers, but high faith in it being either $r$ or $s$, could, for instance, be expressed by the following belief function over outcomes: $v(\emptyset) = 0$, $v(\{x_1\}) = v(\{x_2\}) = v(\{x_3\}) = 0.1$, $v(\{x_1,x_3\}) = v(\{x_2,x_3\}) = 0.2$, $v(\{x_1,x_2\}) = 0.8$, $v(\{x_1,x_2,x_3\}) = 1$.
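To make the interval interpretation concrete, the following Python sketch (an illustration with hypothetical variable names) tabulates, for each event $E$ of this example, the lower bound $v(E)$ and the plausibility $1 - v(X \setminus E)$ that together bound the probability of $E$.

```python
from itertools import combinations

# Sketch: the probability interval [v(E), 1 - v(X \ E)] admitted by the
# share/manager belief function, event by event.

X = frozenset({"x1", "x2", "x3"})
v = {frozenset(): 0.0,
     frozenset({"x1"}): 0.1, frozenset({"x2"}): 0.1, frozenset({"x3"}): 0.1,
     frozenset({"x1", "x3"}): 0.2, frozenset({"x2", "x3"}): 0.2,
     frozenset({"x1", "x2"}): 0.8,
     X: 1.0}

def events(universe):
    """All subsets of the outcome set."""
    xs = sorted(universe)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

for E in events(X):
    lower = v[E]                 # weight of evidence supporting E
    upper = 1.0 - v[X - E]       # plausibility of E
    print(sorted(E), "->", (round(lower, 2), round(upper, 2)))
```

For the singleton $\{x_1\}$, for instance, the admissible interval is $[0.1, 0.8]$, while the pair $\{x_1, x_2\}$ gets the much tighter interval $[0.8, 0.9]$.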

2. Alternative approaches are Gilboa and Schmeidler (1989) and Vind (1991).

Which properties should be attributed to a belief function $v$? One suggestion is that for any two events $E$ and $F$, $v(E \cup F) \geq v(E) + v(F) - v(E \cap F)$. The idea, which we will soon make precise, is that going to larger sets can only increase certainty. In the same spirit it could then be suggested that for any three events $E$, $F$ and $G$, we should have $v(E \cup F \cup G) \geq v(E) + v(F) + v(G) - v(E \cap F) - v(E \cap G) - v(F \cap G) + v(E \cap F \cap G)$, etc. In Shafer (1976), the following restriction, which is a generalization of the above for two and three events to $k$ events, is introduced as the defining property of a belief function:


$$v\left(\bigcup_{i=1}^{k} E_i\right) \;\geq\; \sum_{\emptyset \neq I \subseteq \{1,\ldots,k\}} (-1)^{|I|+1}\, v\left(\bigcap_{i \in I} E_i\right) \quad \text{for all events } E_1, \ldots, E_k \tag{3}$$

This property is called $k$-monotonicity. To understand the content of $k$-monotonicity, note that (3) with equality is the usual inclusion-exclusion rule for probability measures, which follows from (2) by induction. As already noted, a similar implication does not hold for the inequalities. The condition of $k$-monotonicity must therefore be imposed in order to have an analog of the usual inclusion-exclusion rule.

We require $k$-monotonicity for all $k$, i.e., $v: 2^X \to [0,1]$ is a belief function if it satisfies (1) and (3) for any $k \geq 2$. Denote by $V_B$ the set of all belief functions. Since probability measures are $k$-monotone for all $k$, we have $V_A \subseteq V_B$. One can simply think of a belief function $v$ as the vector $(v(E))_{E \in 2^X}$ in $\mathbb{R}^m$, $m := \#2^X$. For any two belief functions $v_1, v_2 \in V_B$ and $\alpha \in [0,1]$, the convex combination $v = \alpha v_1 + (1-\alpha) v_2$ is defined by $v(E) = \alpha v_1(E) + (1-\alpha) v_2(E)$ for all $E \in 2^X$. It is easy to verify that $V_A$ and $V_B$ are convex sets.

We have not yet justified the assumption of $k$-monotonicity for all $k \geq 2$. However, one very natural way to think of an uncertain environment is as given by a so-called mass function, as in Shafer (1976). A function $m: 2^X \to [0,1]$ is a mass function if $m(\emptyset) = 0$ and $\sum_{E \in 2^X} m(E) = 1$. The interpretation is that for any event $E$, $m(E)$ is the weight of evidence in support of $E$ which is additional to the weight already assigned to the proper subsets of $E$. The fact that $m$ has non-negative values captures the idea that going to larger events can only increase certainty. Think of each possible decision as giving rise to a particular mass function $m$. The belief in an event $F$ is then naturally defined as $v(F) := \sum_{E: E \subseteq F} m(E)$. Shafer (1976) shows that if $v$ is defined from a mass function in this way, then $v$ is a belief function, and conversely, any belief function is given by a uniquely determined mass function.

Returning to the share/manager problem we see that $v$ can be derived from the mass function $m$, where $m(\{x_1\}) = m(\{x_2\}) = m(\{x_3\}) = m(\{x_1,x_2,x_3\}) = 0.1$, $m(\{x_1,x_2\}) = 0.6$, and $m(\{x_1,x_3\}) = m(\{x_2,x_3\}) = 0$. So most of the evidence weighs in favor of $\{x_1,x_2\}$, reflecting that the "Harvard argument" points to this set.
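The passage between belief functions and mass functions can be carried out mechanically. The sketch below (an illustration; it uses the standard Möbius inversion formula $m(E) = \sum_{F \subseteq E} (-1)^{\#(E \setminus F)} v(F)$, which is not spelled out in the text) recovers exactly the mass function just stated from the belief function $v$ of the example, and rebuilds $v$ from it.

```python
from itertools import combinations

# Sketch: belief function <-> mass function for the share/manager example.

X = frozenset({"x1", "x2", "x3"})
v = {frozenset(): 0.0,
     frozenset({"x1"}): 0.1, frozenset({"x2"}): 0.1, frozenset({"x3"}): 0.1,
     frozenset({"x1", "x3"}): 0.2, frozenset({"x2", "x3"}): 0.2,
     frozenset({"x1", "x2"}): 0.8,
     X: 1.0}

def subsets(s):
    xs = sorted(s)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# Moebius inversion: m(E) = sum over F subset of E of (-1)^{#(E \ F)} v(F).
m = {E: sum((-1) ** len(E - F) * v[F] for F in subsets(E)) for E in subsets(X)}

# Rebuild the belief function: v(F) = sum over E subset of F of m(E).
v_rebuilt = {F: sum(m[E] for E in subsets(F)) for F in subsets(X)}

print({tuple(sorted(E)): round(m[E], 3) for E in subsets(X) if abs(m[E]) > 1e-9})
print("v recovered:", all(abs(v[F] - v_rebuilt[F]) < 1e-9 for F in subsets(X)))
```

All the recovered masses are non-negative, in accordance with the correspondence between belief functions and mass functions stated above.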

A particular class of belief functions is the unit belief functions. For any $E \subseteq X$, $E \neq \emptyset$, define $v_E$ by: $v_E(F) = 1$ if $E \subseteq F$, and $v_E(F) = 0$ otherwise. This says that the outcome will be an element of $E$ for sure, and nothing else. Then $v_E$ is indeed a belief function: Its mass function is simply $m(F) = 0$ for all $F \neq E$, and $m(E) = 1$.

Let $m_v$ be the unique mass function defining $v$. Then for any $F$, $v(F) = \sum_{E: E \subseteq F} m_v(E) = \sum_{E \in 2^X \setminus \{\emptyset\}} m_v(E)\, v_E(F)$, so $v$ is a convex combination of the unit belief functions:

$$v = \sum_{E \in 2^X \setminus \{\emptyset\}} m_v(E)\, v_E \tag{4}$$

The primitive of our theory is $(V_B, \succeq)$, the set of belief functions over $X$ with a preference relation $\succeq$ on it, where $\succeq$ should be read "is at least as good as".3 The problem is, from certain axioms on $\succeq$, to find a representation: a function $U: V_B \to \mathbb{R}$ such that for all $v, w \in V_B$: $v \succeq w \Leftrightarrow U(v) \geq U(w)$, where $U$ has an intuitive interpretation.

From $\succeq$, define $\succ$ by: $v \succ w$ if $v \succeq w$ but not $w \succeq v$, and define $\sim$ by: $v \sim w$ if $v \succeq w$ and $w \succeq v$. Given $(V_B, \succeq)$ there will also be given preferences over the probability measures in $V_A$, in particular over the lotteries assigning probability 1 to one outcome and zero to all other outcomes. We write $x_1 \succeq x_2$ if the lottery giving $x_1$ for sure is at least as good as the lottery giving $x_2$ for sure. Without loss of generality we assume that $x_1 \succeq x_2 \succeq \cdots \succeq x_n$.

A simple version of the so-called Mixture Set Theorem, Herstein and Milnor (1953), will be of great use. Let $M$ be a convex subset of $\mathbb{R}^k$, $k \in \mathbb{N}$, and let $\succeq$ be a relation on $M$. Define $\succ$ and $\sim$ from $\succeq$ as above.

Al. (Weak order). >on Mis complete and transitive.

A2. (Independence). For all $m_1, m_2, m_3 \in M$ and $\alpha \in\, ]0,1]$: If $m_1 \succ m_2$, then $\alpha m_1 + (1-\alpha)m_3 \succ \alpha m_2 + (1-\alpha)m_3$.

A3. (Continuity). For all $m_1, m_2, m_3 \in M$ such that $m_1 \succ m_2 \succ m_3$, there are $\alpha, \beta \in [0,1]$ such that: $\alpha m_1 + (1-\alpha)m_3 \succ m_2 \succ \beta m_1 + (1-\beta)m_3$.

Mixture Set Theorem, MST. The following two statements are equivalent:
(i) $(M, \succeq)$ fulfills A1-A3.

(ii) There is an affine function4 $U: M \to \mathbb{R}$ such that $U$ represents $\succeq$. Further, if $U'$ is another affine representation of $\succeq$, then $U' = aU + b$ for some $a, b \in \mathbb{R}$, $a > 0$.

3. Let a decision be a function $f: S \to X$ from a finite set of states to $X$. Further, let uncertainty with respect to states be described by a belief function $\eta: 2^S \to [0,1]$, where $2^S$ is the set of subsets of $S$. Derive $v: 2^X \to [0,1]$ in the natural way: $v(E) = \eta(f^{-1}(E))$ for all $E \in 2^X$. Then $v$ is a belief function on $X$. This justifies that we start right off with belief functions on $X$.

4. The function $U$ is affine if $U(\alpha m_1 + (1-\alpha)m_2) = \alpha U(m_1) + (1-\alpha)U(m_2)$ for all $m_1, m_2 \in M$, $\alpha \in [0,1]$.

Applying the MST to $(V_A, \succeq)$ yields the von Neumann-Morgenstern theory of expected utility: By affinity of $U$, one gets that for any $\pi \in V_A$, $U(\pi) = U(\sum_{i=1}^n \pi_i e_i) = \sum_{i=1}^n \pi_i U(e_i)$, where $e_i$ is the $i$'th unit vector, that is, the lottery giving $x_i$ for sure. Defining $u(x_i) := U(e_i)$, we have: $U(\pi) = \sum_{x \in X} u(x)\pi(x)$. The utility assigned to a lottery $\pi$ is the expected value, with respect to $\pi$, of the utilities assigned to the sure outcomes.

The core5, $C(v)$, of a belief function $v$ is the set of probability measures which do not contradict $v$:


$$C(v) := \{\pi \in V_A \mid \pi(E) \geq v(E) \text{ for all } E \in 2^X\} \tag{5}$$

From Shapley (1971), for any 2-monotone $v$, $C(v)$ is non-empty. So, any belief function has a non-empty core, and the core of a probability measure $\pi$ is $\{\pi\}$.

A function $p$ from $\{1, \ldots, n\}$ onto $X$ gives a particular sequence of the outcomes; $p(i)$ is the outcome in place $i$, and $p^{-1}(x)$ is the place given to $x$. The set of all $n!$ permutations is $P$. For any $v$ in $V_B$ and any $p$ in $P$, we define the probability measure $\pi^*_{p,v}$ by:

$$\pi^*_{p,v}(p(i)) := v(\{p(1), \ldots, p(i)\}) - v(\{p(1), \ldots, p(i-1)\}), \qquad i = 1, \ldots, n \tag{6}$$
To illustrate, consider the simple permutation $p'$ given by $p'(i) = x_i$. Then $\pi^*_{p',v}$ is the probability measure constructed from $v$ by first giving to $x_1$, the best outcome, the least probability that it can be given according to $v$. The second best outcome $x_2$ is then given the least probability it can be given according to $v$ and to what has already been given to $x_1$, etc.

Shapley (1971) shows that for all 2-monotone $v$, the set of all vertices (corners) of $C(v)$ is $\{\pi^*_{p,v} \in V_A \mid p \in P\}$, and $C(v)$ is the convex hull of this set.

An example will clarify these concepts. Consider again the belief function of the share/manager example. Its core is illustrated in Figure 1. The triangle represents the set of all probability distributions over $\{x_1, x_2, x_3\}$, with $\pi(x_1)$ measured along the $\pi(x_1)$-axis and so forth. From $v(\{x_3\}) = 0.1$, we obtain from (5) the restriction $\pi(x_3) \geq 0.1$. From $v(\{x_1,x_2\}) = 0.8$, $\pi(x_1) + \pi(x_2) \geq 0.8$, and hence $\pi(x_3) \leq 0.2$. Continuing like this, the core of $v$ appears as the shaded area. The points $\pi^*_{p,v}$ are easily computed. For instance, for $p = (x_2, x_3, x_1)$, it follows from (6) that $\pi^*_{p,v}(x_2) = 0.1$, then $\pi^*_{p,v}(x_3) = 0.2 - 0.1 = 0.1$, and finally $\pi^*_{p,v}(x_1) = 1 - 0.2 = 0.8$, so $\pi^*_{p,v} = (0.8, 0.1, 0.1)$, which is one of the vertices of the core. For this particular $v$, there are different permutations giving rise to the same $\pi^*_{p,v}$. With three outcomes, the core may have up to six vertices.
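The vertex construction in (6) is mechanical enough to be computed directly. The sketch below (an illustration) runs through all six permutations for the share/manager belief function, reproduces the vertex $(0.8, 0.1, 0.1)$ found above, and shows that only four of the six marginal vectors are distinct.

```python
from itertools import permutations

# Sketch: the marginal vectors pi*_{p,v} of (6), whose distinct values are the
# vertices of the core of the share/manager belief function.

outcomes = ["x1", "x2", "x3"]
X = frozenset(outcomes)
v = {frozenset(): 0.0,
     frozenset({"x1"}): 0.1, frozenset({"x2"}): 0.1, frozenset({"x3"}): 0.1,
     frozenset({"x1", "x3"}): 0.2, frozenset({"x2", "x3"}): 0.2,
     frozenset({"x1", "x2"}): 0.8,
     X: 1.0}

def marginal_vector(p, v):
    """pi*_{p,v}(p(i)) = v({p(1),...,p(i)}) - v({p(1),...,p(i-1)})."""
    pi, so_far = {}, frozenset()
    for x in p:
        pi[x] = v[so_far | {x}] - v[so_far]
        so_far = so_far | {x}
    return pi

vertices = set()
for p in permutations(outcomes):
    pi = marginal_vector(p, v)
    vertices.add(tuple(round(pi[x], 3) for x in outcomes))
    print(p, "->", tuple(round(pi[x], 3) for x in outcomes))

print("distinct vertices of the core:", sorted(vertices))   # four distinct points
```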



5. The core is a concept from cooperative game theory. Those familiar with this branch will realize that a belief function $v: 2^X \to [0,1]$ has exactly the structure of a characteristic function of a game with side payments; $X$ would then be the set of players, and $2^X$ the set of coalitions.


Figure 1. The core in the share/manager example

3. Representation theorems: The nature of $(V_B, \succeq)$

The set of belief functions is convex, so assuming A1-A3 for $(V_B, \succeq)$ would, directly from the MST, imply the existence of an affine representing function $U: V_B \to \mathbb{R}$ (and vice versa). Using (4) and affinity, $U$ can be written: $U(v) = U(\sum_{E \in 2^X \setminus \{\emptyset\}} m_v(E) v_E) = \sum_{E \in 2^X \setminus \{\emptyset\}} m_v(E) U(v_E)$. This observation is parallel to the von Neumann-Morgenstern representation for probability measures, where now $U(v_E)$ is the counterpart of $u(x)$. In so far as one has good intuition for $U(v_E)$ - like one has in the von Neumann-Morgenstern theory for $u(x)$ - we now have an interesting representation. However, we do not think that the utility of ending in set $E$ for sure but knowing nothing else is as interpretable as the utility of getting outcome $x$ for sure. Further, assuming A1-A3 for $(V_B, \succeq)$ implies assuming them for $(V_A, \succeq)$, so preferences over lotteries can still be represented using a von Neumann-Morgenstern utility on $X$. We would like to have the utility function $U$ on $V_B$ expressed in terms of $u$. To obtain this we must add an axiom.6



6. Jaffray (1989) also builds on the above observation. It is when it comes to "the additional axiom" that the present paper differs from that of Jaffray.


A4. (Non-extreme attitude towards uncertainty). For any $E \in 2^X \setminus \{\emptyset\}$, there are $\overline{\pi}_E, \underline{\pi}_E \in V_A$, such that $\overline{\pi}_E(E) = \underline{\pi}_E(E) = 1$, and $\overline{\pi}_E \succeq v_E \succeq \underline{\pi}_E$.

We consider A4 to be very weak. First note that A4 is an axiom only on the unit belief functions. A unit belief function $v_E$ contains the information that the outcome will be in $E$ for sure and nothing else. The set of probability measures giving something in $E$ for sure is $\{\pi \in V_A \mid \pi(E) = \sum_{x \in E} \pi(x) = 1\}$, which is exactly $C(v_E)$. The elements of this set are ranked by $\succeq$. The content of A4 is just that $v_E$ itself is no better than the very best element and no worse than the very worst element of this set. In particular, since A4 is imposed only on unit belief functions, $\underline{\pi}_E$ and $\overline{\pi}_E$ may be the worst and the best pure outcome of $E$; that is, we could have $\overline{\pi}_E, \underline{\pi}_E \in E$, and $\overline{\pi}_E \succeq x \succeq \underline{\pi}_E$ for all $x \in E$.

Theorem 1 below is a rather easy consequence of the MST and A4. For formal
proofs the reader is referred to Hendon, Jacobsen, Sloth, and Tranæs (1991).

Theorem 1. Let $(V_B, \succeq)$ be given. The following two statements are equivalent:
(i) $(V_B, \succeq)$ fulfills A1-A4.
(ii) There is a function $u: X \to \mathbb{R}$, and for each $v \in V_B$ there is a probability measure $\pi_v \in C(v)$, where $\pi_v$ considered as a function from $V_B$ to $V_A$ is affine, such that $U: V_B \to \mathbb{R}$, defined by $U(v) := \sum_{x \in X} u(x)\pi_v(x)$, represents $\succeq$. Further, the function $u$ is unique up to a positive affine transformation.

Theorem 1 is a qualitative statement about the nature of $(V_B, \succeq)$. Given A1-A4, preferences over the complicated objects, belief functions, are as if each belief function is identified with a lottery in its core, from which the expected von Neumann-Morgenstern utility in the usual sense is computed, and the so defined numbers rank all the belief functions in preference. Conversely, if the preference has such a representation, A1-A4 are fulfilled. However, Theorem 1 says little about which element of $C(v)$, $v$ is identified with. We only know that the function $\pi_v$ from $V_B$ to $V_A$ is affine, and thereby continuous, in $v$. It is, of course, warranted to say more about the variation of $\pi_v$ with $v$. Since Theorem 1 is an if-and-only-if theorem, we know that this will require stronger axioms. We now introduce an axiom of consistency which is stronger than A4.

Consider the belief function $v_X$ which contains absolutely no information: It says that the outcome will be in $X$ and nothing else. Assume that A1-A2 are fulfilled. Then there is some subset $\Pi_X \subseteq V_A$ such that $v_X \sim \pi$ for all $\pi \in \Pi_X$. A natural consistency requirement will now be that there is some $\pi_X \in \Pi_X$, with $\pi_X(x) > 0$ for all $x$, such that for each subset $E$, $v_E \sim \pi_E \in V_A$, where $\pi_E$ is derived from $\pi_X$ by Bayes' rule, i.e., $\pi_E(x) = \pi_X(x)/\pi_X(E)$ for $x \in E$.7 This is a requirement of internal consistency of the decision maker's preference over unit belief functions. As the appropriate consistency concept we use Bayes' rule, which is possible since the requirement is only concerned with unit belief functions. Note that Bayes' rule is not used for updating the uncertain objects, the belief functions; it is only used as a way of expressing consistency of preferences.

Figure 2.

A5. (Consistency). There is $\pi_X \in V_A$ with $\pi_X(x) > 0$ for all $x \in X$, such that $v_X \sim \pi_X$, and for each $E \in 2^X \setminus \{\emptyset\}$, $v_E \sim \pi_E$, where $\pi_E$ is given by $\pi_E(x) := \pi_X(x)/\pi_X(E)$ for all $x \in E$, and $\pi_E(x) = 0$ for $x \notin E$.
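As a small numerical illustration of A5 (with a hypothetical $\pi_X$; the numbers are not from the text), the measures $\pi_E$ are obtained from $\pi_X$ by ordinary Bayesian conditioning, and each of them puts all probability on $E$, so it lies in $C(v_E)$:

```python
# Sketch: deriving the measures pi_E of A5 from a hypothetical pi_X by Bayes' rule.

pi_X = {"x1": 0.5, "x2": 0.3, "x3": 0.2}       # hypothetical, strictly positive

def bayes_conditional(pi, E):
    """pi_E(x) = pi(x) / pi(E) for x in E, and 0 outside E."""
    pE = sum(pi[x] for x in E)
    return {x: (pi[x] / pE if x in E else 0.0) for x in pi}

E = {"x1", "x2"}
pi_E = bayes_conditional(pi_X, E)
print(pi_E)                                     # {'x1': 0.625, 'x2': 0.375, 'x3': 0.0}
print(sum(pi_E[x] for x in E))                  # 1.0, so pi_E lies in C(v_E)
```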

It is obvious that A5 is a special case of A4: Just use the $\pi_E$ of A5 as both $\overline{\pi}_E$ and $\underline{\pi}_E$ in A4. To prove Theorem 2 below, basically what is needed is Theorem 1 above and a result from cooperative game theory on Shapley values (see e.g. Kalai and Samet (1988)).



7. The assumption is restrictive, and is made for simplicity. Axiom A5 and Theorem 2 can be generalised to allow for ttx(x) = 0 for some x by using the notion of a weight system, see Hendon, Jacobsen, Sloth, andTranaes(l99l).


Theorem 2. Let $(V_B, \succeq)$ be given. The following two statements are equivalent:
(i) $(V_B, \succeq)$ fulfills A1-A3 and A5.

(ii) There is a function $u: X \to \mathbb{R}$, and there is $\pi \in V_A$ with $\pi(x) > 0$ for all $x \in X$, such that if for each $p \in P$:


$$\alpha_p := \prod_{i=1}^{n} \frac{\pi(p(i))}{\pi(\{p(1), \ldots, p(i)\})}$$

then the function $U: V_B \to \mathbb{R}$ defined by $U(v) := \sum_{x \in X} u(x) \left[\sum_{p \in P} \alpha_p \pi^*_{p,v}\right](x)$ represents $\succeq$. Further, the function $u$ is unique up to a positive affine transformation.

Note that for the $\alpha_p$'s of (ii), $\alpha_p \in [0,1]$ for all $p \in P$, and $\sum_{p \in P} \alpha_p = 1$. So, Theorem 2 gives a high degree of regularity in the choice of $\pi_v$ from Theorem 1. A person fulfilling A1-A3 and A5 (and thereby A4) behaves as if he assigns to each belief function $v$ a number, which is the expected von Neumann-Morgenstern utility of a $\pi_v$ in the core of $v$, where $\pi_v$ is computed as a weighted average of all the vertices $\pi^*_{p,v}$ of the core of $v$, and the weights $\alpha_p$ are independent of $v$. The regularity comes from the last part: The weighted average used is the same for all belief functions.
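The structure of this representation is easy to trace in a short sketch (an illustration; the utilities $u$ and the weights $\alpha_p$ below are hypothetical placeholders, whereas in Theorem 2 the $\alpha_p$ are determined by the measure $\pi$ of (ii)). Since the core is the convex hull of the vertices, any non-negative weights summing to one produce in this way a $\pi_v$ in the core of $v$ and a utility number $U(v)$.

```python
from itertools import permutations

# Sketch of Theorem 2's representation:
#   U(v) = sum_x u(x) * [ sum_p alpha_p * pi*_{p,v} ](x)
# with hypothetical utilities u and hypothetical weights alpha_p.

outcomes = ["x1", "x2", "x3"]
X = frozenset(outcomes)
v = {frozenset(): 0.0,
     frozenset({"x1"}): 0.1, frozenset({"x2"}): 0.1, frozenset({"x3"}): 0.1,
     frozenset({"x1", "x3"}): 0.2, frozenset({"x2", "x3"}): 0.2,
     frozenset({"x1", "x2"}): 0.8,
     X: 1.0}
u = {"x1": 1.0, "x2": 0.6, "x3": 0.0}            # hypothetical vN-M utilities, x1 best

def marginal_vector(p, v):
    pi, so_far = {}, frozenset()
    for x in p:
        pi[x] = v[so_far | {x}] - v[so_far]
        so_far = so_far | {x}
    return pi

perms = list(permutations(outcomes))
alpha = {p: 1.0 / len(perms) for p in perms}      # hypothetical weights, sum to one

pi_v = {x: sum(alpha[p] * marginal_vector(p, v)[x] for p in perms) for x in outcomes}
U_v = sum(u[x] * pi_v[x] for x in outcomes)
print({x: round(pi_v[x], 3) for x in outcomes}, "U(v) =", round(U_v, 3))
```

With the equal weights used here, $\pi_v$ is the ordinary Shapley value of $v$ viewed as a game (cf. footnote 5); other choices of weights tilt $\pi_v$ towards other vertices of the core.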

As an illustration consider Figure 2, where the cores of two different belief functions $v$ and $w$ have been drawn, in full and dotted lines, respectively. The vertices of the two cores are connected two by two by arrows. Connected vertices are given the same weights in the computations of $\pi_v$ and $\pi_w$. The lower, right vertex of $C(w)$ is given a weight equal to the sum of the weights of the two lower, right vertices of $C(v)$. If, for instance, $\pi_v$ is situated close to the vertex arising from the requirements given by $v(\{x_2\})$ and $v(\{x_1,x_2\})$, then $\pi_w$ is also close to the corresponding vertex.

With a representation like that of Theorem 2, the Ellsberg paradox need no longer be a paradox. From the objective evidence, the belief function over colors (states) would naturally be: $\eta(\{red\}) = 1/3$, $\eta(\{black\}) = \eta(\{green\}) = 0$, $\eta(\{red, black\}) = \eta(\{red, green\}) = 1/3$, $\eta(\{black, green\}) = 2/3$. The derived belief functions over outcomes are then, for each alternative: A: $v^a(100) = 1/3$, $v^a(0) = 2/3$; B: $v^b(100) = 0$, $v^b(0) = 1/3$; C: $v^c(100) = 1/3$, $v^c(0) = 0$; D: $v^d(100) = 2/3$, $v^d(0) = 1/3$. The cores of these can be expressed in terms of the probability of winning DKK 100: $C(v^a) = \{1/3\}$, $C(v^b) = \{\pi \mid \pi \leq 2/3\}$, $C(v^c) = \{\pi \mid \pi \geq 1/3\}$, $C(v^d) = \{2/3\}$. To each core the decision maker associates a specific probability measure. Within the axioms of Theorem 2, this could very well be close to the minimal possible probability of winning DKK 100 (this corresponds to choosing $\alpha_{p'}$ close to 1 for the permutation $p'$ that places the outcome 100 first, which is obtained by letting $\pi_X(0)$ be close to 1 and $\pi_X(100)$ close to zero), i.e., close to $\pi_{v^a} = 1/3$, $\pi_{v^b} = 0$, $\pi_{v^c} = 1/3$, $\pi_{v^d} = 2/3$. Assuming that $u(100) > u(0)$, calculation of expected utility with respect to these probabilities gives $A \succ B$ and $D \succ C$.
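The numbers of this paragraph are reproduced by the following sketch (an illustration; the utilities are hypothetical, and for simplicity each belief function is evaluated at exactly the minimal probability of winning in its core rather than close to it):

```python
# Sketch: the Ellsberg choices under the pessimistic selection from each core.

u = {100: 1.0, 0: 0.0}                 # hypothetical utilities with u(100) > u(0)

# v({100}) and v({0}) for each bet, as derived in the text from the urn evidence.
beliefs = {
    "A": {100: 1/3, 0: 2/3},
    "B": {100: 0.0, 0: 1/3},
    "C": {100: 1/3, 0: 0.0},
    "D": {100: 2/3, 0: 1/3},
}

def pessimistic_utility(v):
    """Expected utility at the core element giving outcome 100 its lower probability v({100})."""
    p_win = v[100]
    return p_win * u[100] + (1.0 - p_win) * u[0]

U = {bet: round(pessimistic_utility(vb), 3) for bet, vb in beliefs.items()}
print(U)                                           # A: 0.333, B: 0.0, C: 0.333, D: 0.667
print("A > B:", U["A"] > U["B"], "  D > C:", U["D"] > U["C"])   # both True
```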

References

Anscombe, F. J. and R. J. Aumann. 1963. A Definition of Subjective Probability. The Annals of Mathematical Statistics, 34, 199-205.

Dempster, A. P. 1967. Upper and Lower Probabilities Induced by a Multivalued Mapping. Annals of Mathematical Statistics, 325-339.

Ellsberg, D. 1961. Risk, Ambiguity, and the Savage Axioms. Quarterly Journal of Economics, 75, 643-669.

Gilboa, I. 1987. Expected Utility with Purely Subjective Non-Additive Probabilities. Journal of Mathematical Economics, 16, 65-88.

Gilboa, I. and D. Schmeidler. 1989. Maxmin Expected Utility with Non-Unique Prior. Journal of Mathematical Economics, 18, 141-153.

Hendon, E., H. J. Jacobsen, B. Sloth and T. Tranæs. 1991. Expected Utility under Uncertainty. University of Copenhagen.

Herstein, I. N. and J. Milnor. 1953. An Axiomatic Approach to Measurable Utility. Econometrica, 21, 291-297.

Jaffray, J. Y. 1989. Application of Linear Utility Theory to Belief Functions. Operations Research Letters, 8, 107-112.

Kalai, E. and D. Samet. 1988. Weighted Shapley Values. In A. E. Roth (ed.): The Shapley Value, Cambridge.

Savage, L. J. 1954. The Foundations of Statistics. New York.

Schmeidler, D. 1989. Subjective Probability and Expected Utility without Additivity. Econometrica, 57, 571-587.

Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton.

Shapley, L. S. 1971. Cores of Convex Games. International Journal of Game Theory, 1, 11-26.

Vind, K. 1991. Independent Preferences. Journal of Mathematical Economics, 20, 119-135.

von Neumann, J. and O. Morgenstern. 1947. Theory of Games and Economic Behavior, 2nd ed. Princeton.

Wakker, P. P. 1986. Representations of Choice Situations. PhD thesis, University of Tilburg.