Consistent estimation of small masses in feature sampling

School Of Mathematical Sciences

Associated organisational unit

Statistical Artificial Intelligence

Electronic data

JMLR-18-534-3
Accepted author manuscript, 488 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Consistent estimation of small masses in feature sampling. / Battiston, Marco; Ayed, Fadhel; Camerlenghi, Federico et al.
In: Journal of Machine Learning Research, Vol. 22, No. 6, 31.01.2021, p. 1-28.

Harvard

Battiston, M, Ayed, F, Camerlenghi, F & Favaro, S 2021, 'Consistent estimation of small masses in feature sampling', Journal of Machine Learning Research, vol. 22, no. 6, pp. 1-28. <https://www.jmlr.org/papers/v22/18-534.html>

APA

Battiston, M., Ayed, F., Camerlenghi, F., & Favaro, S. (2021). Consistent estimation of small masses in feature sampling. Journal of Machine Learning Research, 22(6), 1-28. https://www.jmlr.org/papers/v22/18-534.html

Vancouver

Battiston M, Ayed F, Camerlenghi F, Favaro S. Consistent estimation of small masses in feature sampling. Journal of Machine Learning Research. 2021 Jan 31;22(6):1-28.

Author

Battiston, Marco ; Ayed, Fadhel ; Camerlenghi, Federico et al. / Consistent estimation of small masses in feature sampling. In: Journal of Machine Learning Research. 2021 ; Vol. 22, No. 6. pp. 1-28.

Bibtex

@article{f9a8d240ae424bbeaf3e5fbdb6397b01,

title = "Consistent estimation of small masses in feature sampling",

abstract = "Consider an (observable) random sample of size $n$ from an infinite population of individuals, each individual being endowed with a finite set of features from a collection of features $(F_j)_{j \geq 1}$ with unknown probabilities $(p_j)_{j \geq 1}$, i.e., $p_j$ is the probability that an individual displays feature $F_j$. Under this feature sampling framework, in recent years there has been a growing interest in estimating the sum of the probability masses $p_j$'s of features observed with frequency $r \geq 0$ in the sample, here denoted by $M_{n,r}$. This is the natural feature sampling counterpart of the classical problem of estimating small probabilities in the species sampling framework, where each individual is endowed with only one feature (or ``species''). In this paper we study the problem of consistent estimation of the small mass $M_{n,r}$. We first show that there do not exist universally consistent estimators, in the multiplicative sense, of the missing mass $M_{n,0}$. Then, we introduce an estimator of $M_{n,r}$ and identify sufficient conditions under which the estimator is consistent. In particular, we propose a nonparametric estimator $\hat{M}_{n,r}$ of $M_{n,r}$ which has the same analytic form of the celebrated Good--Turing estimator for small probabilities, with the sole difference that the two estimators have different ranges (supports). Then, we show that $\hat{M}_{n,r}$ is strongly consistent, in the multiplicative sense, under the assumption that $(p_j)_{j \geq 1}$ has regularly varying heavy tails.",

author = "Marco Battiston and Fadhel Ayed and Federico Camerlenghi and Stefano Favaro",

year = "2021",

month = jan,

day = "31",

language = "English",

volume = "22",

pages = "1--28",

journal = "Journal of Machine Learning Research",

issn = "1532-4435",

publisher = "Microtome Publishing",

number = "6",

}

RIS

TY - JOUR

T1 - Consistent estimation of small masses in feature sampling

AU - Battiston, Marco

AU - Ayed, Fadhel

AU - Camerlenghi, Federico

AU - Favaro, Stefano

PY - 2021/1/31

Y1 - 2021/1/31

AB - Consider an (observable) random sample of size n from an infinite population of individuals, each individual being endowed with a finite set of features from a collection of features (Fj)j≥1 with unknown probabilities (pj)j≥1, i.e., pj is the probability that an individual displays feature Fj. Under this feature sampling framework, in recent years there has been a growing interest in estimating the sum of the probability masses pj's of features observed with frequency r≥0 in the sample, here denoted by Mn,r. This is the natural feature sampling counterpart of the classical problem of estimating small probabilities in the species sampling framework, where each individual is endowed with only one feature (or “species”). In this paper we study the problem of consistent estimation of the small mass Mn,r. We first show that there do not exist universally consistent estimators, in the multiplicative sense, of the missing mass Mn,0. Then, we introduce an estimator of Mn,r and identify sufficient conditions under which the estimator is consistent. In particular, we propose a nonparametric estimator M^n,r of Mn,r which has the same analytic form of the celebrated Good-Turing estimator for small probabilities, with the sole difference that the two estimators have different ranges (supports). Then, we show that M^n,r is strongly consistent, in the multiplicative sense, under the assumption that (pj)j≥1 has regularly varying heavy tails.

M3 - Journal article

VL - 22

SP - 1

EP - 28

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

IS - 6

ER -
