Home | Current | Past volumes | About | Login | Notify | Contact | Search
 Probability Surveys > Vol. 4 (2007) open journal systems 


Notes on the occupancy problem with infinitely many boxes: General asymptotics and power laws

Alexander V. Gnedin, University Utrecht
Ben Hansen, University of Michigan
Jim Pitman, University of California, Berkeley


Abstract
This paper collects facts about the number of occupied boxes in the classical balls-in-boxes occupancy scheme with infinitely many positive frequencies: equivalently, about the number of species represented in samples from populations with infinitely many species. We present moments of this random variable, discuss asymptotic relations among them and with related random variables, and draw connections with regular variation, which appears in various manifestations.

AMS 2000 subject classifications: Primary 60F05, 60F15; secondary 60C05.

Keywords: occupancy problem, regular variation, asymptotics, poissonization, species sampling.

Creative Common LOGO

Full Text: PDF


Gnedin, Alexander V., Hansen, Ben, Pitman, Jim, Notes on the occupancy problem with infinitely many boxes: General asymptotics and power laws, Probability Surveys, 4, (2007), 146-171 (electronic). DOI: 10.1214/07-PS092.

References

[1]   R. Arratia, A. Barbour and S. Tavaré, Logarithmic combinatorial structures: a probabilistic approach, Eur. Math. Soc. Publ. House, Zürich, 2003. MR2032426

[2]   M. Archibald, A. Knopfmacher, H. Prodinger, The number of distinct values in a geometrically distributed sample, Eur. J. Combin. 27 1059-1081, 2006. MR2259938

[3]   R.R. Bahadur, On the number of distinct values in a large sample from an infinite discrete distribution, Proc. Nat. Inst. Sci. India 26A Supp. II: 67-75, 1960. MR0137256

[4]   A.D. Barbour and A.V. Gnedin, Regenerative compositions in the case of slow variation, Stoch. Proc. Appl. 116: 1012-1047, 2006. MR2238612

[5]   J. Berestycki, N. Berestycki and J. Schweinsberg, Beta-coalescents and continuum stable random trees, 2006 (available at arXiv).

[6]   J. Bertoin, Random fragmentation and coagulation processes, Cambridge University Press, 2006. MR2253162

[7]   N.H. Bingham, C.M. Goldie and J.L. Teugels, Regular variation, Cambridge University Press, 1987. MR0898871

[8]   L. Bogachev, A. Gnedin and Y. Yakubovich, On the variance of the number of occupied boxes, Adv. Appl. Math. 2007 (available at arXiv).

[9]   J. Bunge and M. Fitzpatrick, Estimating the number of species: a review, J. Am. Statist. Assoc. 88: 364–373, 1993.

[10]   S. Chaudhuri, R. Motwani and V. Narasayya, Random sampling for histogram construction: how much is enough?, Proc. 1998 ACM SIGMOD Conf. on Management of Data 436–447, ACM Press, N.Y., 1998.

[11]   E. Csaki and A. Foldes, On the first emtpy cell, Studia Sci. Math. Hungar, 11: 373-382, 1976. MR0554583

[12]   M. Dutko, Central limit theorems for infinite urn models, Ann. Probab. 17: 1255-1263, 1989. MR1009456

[13]   B. Eisenberg, G. Stengle and G. Strang, The asymptotic probability of a tie for first place, Ann. Appl. Probab. 3: 731-745, 1993. MR1233622

[14]   W.E.Y. Elliott and R.J. Valenza, And then there were none: winnowing the Shakespeare claimants, Computers and the Humanities 30:191–245, 1996.

[15]   W. Feller, An introduction to probability theory and its applications, vol. II, Wiley, NY, 1971. MR0270403

[16]   D. Freedman, Another note on the Borel-Cantelli lemma and the strong law, with the Poisson approximation as a by-product, Ann. Probab. 1: 910-925, 1973. MR0370711

[17]   S. K. Formanov and A. Asimov, A limit theorem for the separable statistic in a random assignement scheme, J. Math. Sci. 38: 2405-2411, 1987.

[18]   A. Gnedin, The representation of composition structures, Ann. Probab. 25: 1437-1450, 1997. MR1457625

[19]   A. Gnedin, The Bernoulli sieve, Bernoulli 10: 79-96, 2004. MR2044594

[20]   A. Gnedin and J. Pitman, Moments of convex distributions and completely alternating sequences, IMS Lecture Notes (Freedman’s Festschrift), 2007 (available at arXiv).

[21]   A. Gnedin and J. Pitman, Regenerative composition structures, Ann. Probab. 33: 445-479, 2005. MR2122798

[22]   A. Gnedin, J. Pitman and M. Yor, Asymptotic laws for compositions derived from transformed subordinators, Ann. Probab. 34: 468-492, 2006. MR2223948

[23]   A. Gnedin, J. Pitman and M. Yor, Asymptotic laws for regenerative compositions: gamma subordinators and the like, Probab. Theory Related Fields 135: 576-602, 2006. MR2240701

[24]   A. Gnedin and Y. Yakubovich, Recursive partition structures, Ann. Probab. 34: 2203-2218, 2006.

[25]   H.-K. Hwang and S. Janson, Local limit theorems for finite and infinite urn models, 2006 (available at arXiv).

[26]   Ivanov, V.A., Ivchenko, G.I. and Medvedev, Yu.I. Discrete problems in probability theory, J. Math. Sci 31: 2759-2795, 1985. MR0778384

[27]   N.L. Johnson and S. Kotz (1977), Urn models and their application: an approach to modern discrete probability theory. Wiley, New York. MR0488211

[28]   S. Karlin, Central limit theorems for certain infinite urn schemes, J. Math. Mech., 17: 373-401, 1967. MR0216548

[29]   V.F. Kolchin, B.A. Sevast’yanov and V.P. Chistyakov, Random allocations, V.H. Winston & Sons, Washington, D.C.; distributed by Halsted Press (John Wiley & Sons), New York, 1978. MR0471016

[30]   G. Louchard, H. Prodinger and M.D. Ward, The number of distinct values of some multiplicity in sequences of geometrically distributed random variables. 2005 Int. Conf. Anal. Algorithms, DMTCS proc. AD: 231-256, 2005. MR2193122

[31]   V. G. Mikhailov, The central limit theorem for the scheme of independent placements of particles among cells, Trudy Steklov Math. Inst. (MIAN) 157: 138-152, 1981. MR0651763

[32]   J. Pitman, Partition structures derived from Brownian motion and stable subordinators, Bernoulli 3: 79-96, 1997. MR1466546

[33]   J. Pitman, Combinatorial stochastic processes, Springer L. Notes Math. vol. 1875, 2006. MR2245368

[34]   C.J. Skinner and M.J. Elliot, A measure of disclosure risk for microdata, J. Roy. Statist. Soc. B 64:855–867, 2002. MR1979391




Home | Current | Past volumes | About | Login | Notify | Contact | Search

Probability Surveys. ISSN: 1549-5787