TY - JOUR AU - Dubhashi, Devdatt P. AU - Ranjan, Desh PY - 1996/01/25 Y2 - 2024/03/29 TI - Balls and Bins: A Study in Negative Dependence JF - BRICS Report Series JA - BRICS VL - 3 IS - 25 SE - Articles DO - 10.7146/brics.v3i25.20006 UR - https://tidsskrift.dk/brics/article/view/20006 SP - AB - This paper investigates the notion of negative dependence amongst random variables and attempts to advocate its use as a simple and unifying paradigm for the analysis of random structures and algorithms.<br />The assumption of independence between random variables is often very convenient for several reasons. Firstly, it makes analyses and calculations much simpler. Secondly, one has at hand a whole array of powerful mathematical concepts and tools from classical probability theory for the analysis, such as laws of large numbers, central limit theorems and large deviation bounds, which are usually derived under the assumption of independence. Unfortunately, the analysis of most randomized algorithms involves random variables that are not independent. In this case, classical tools from standard probability theory such as large deviation theorems, which are valid under the assumption of independence between the random variables involved, cannot be used as such. It is then necessary to determine under what conditions of dependence one can still use the classical tools.<br />It has been observed before [32, 33, 38, 8] that in some situations, even though the variables involved are not independent, one can still apply some of the standard tools that are valid for independent variables (directly or in suitably modified form), provided that the variables are dependent in specific ways. Unfortunately, it appears that in most cases somewhat ad hoc stratagems have been devised, tailored to the specific situation at hand, and that a unifying underlying theory that delves deeper into the nature of dependence amongst the variables involved is lacking. 
A frequently occurring scenario underlying the analysis of many randomised algorithms and processes involves random variables that are, intuitively, dependent in the following negative way: if one subset of the variables is "high" then a disjoint subset of the variables is "low". In this paper, we bring to the forefront and systematize some precise notions of negative dependence in the literature, analyse their properties, compare them relative to each other, and illustrate them with several applications.<br />One specific paradigm involving negative dependence is the classical "balls and bins" experiment. Suppose we throw $m$ balls into $n$ bins independently at random. For $i \in [n]$, let $B_i$ be the random variable denoting the number of balls in the $i$th bin. We will often refer to these variables as occupancy numbers. This is a classical probabilistic paradigm [16, 22, 26] (see also [31, sec. 3.1]) that underlies the analysis of many probabilistic algorithms and processes. In the case when the balls are identical, this gives rise to the well-known multinomial distribution [16, sec. VI.9]: there are $m$ repeated independent trials (balls) where each trial (ball) can result in one of the outcomes $E_1, \ldots, E_n$ (bins). The probability of the realisation of event $E_i$ is $p_i$ for $i \in [n]$ for each trial. (Of course the probabilities are subject to the condition $\sum_i p_i = 1$.) Under the multinomial distribution, for any integers $m_1, \ldots, m_n$ such that $\sum_i m_i = m$, the probability that for each $i \in [n]$, event $E_i$ occurs $m_i$ times is $\frac{m!}{m_1! \cdots m_n!} \, p_1^{m_1} \cdots p_n^{m_n}$.<br />The balls and bins experiment is a generalisation of the multinomial distribution: in the general case, one can have an arbitrary set of probabilities for each ball: the probability that ball $k$ goes into bin $i$ is $p_{i,k}$, subject only to the natural restriction that for each ball $k$, $\sum_i p_{i,k} = 1$. 
The joint distribution function correspondingly has a more complicated form.<br />A fundamental natural question of interest is: how are these $B_i$ related? Note that even though the balls are thrown independently of each other, the $B_i$ variables are not independent; in particular, their sum is fixed to $m$. Intuitively, the $B_i$'s are negatively dependent on each other in the manner described above: if one set of variables is "high", a disjoint set is "low". However, establishing such assertions precisely by a direct calculation from the joint distribution function, though possible in principle, appears to be quite a formidable task, even in the case where the balls are assumed to be identical.<br />One of the major contributions of this paper is establishing that the $B_i$ are negatively dependent in a very strong sense. In particular, we show that the $B_i$ variables satisfy negative association and negative regression, two strong notions of negative dependence that we define precisely below. All the intuitively obvious assertions of negative dependence in the balls and bins experiment follow as easy corollaries. We illustrate the usefulness of these results by showing how to streamline and simplify many existing probabilistic analyses in the literature. ER -