References
Textbooks
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- High-Dimensional Probability: An Introduction with Applications in Data
Science, by R. Vershynin, Cambridge University Press, 2018, available here.
Mon Aug 30
A classical reference on the concentration of well-behaved functions of independent random variables is
- A new look at independence, by M. Talagrand, Annals of Probability, 24(1): 1-34, 1996.
References for improved Chernoff bounds for Bernoulli random variables (the relative-entropy form of the bound is recalled after the list):
- Check out the Wikipedia page.
- A guided tour of Chernoff bounds, by T. Hagerup and C. Rüb, Information
Processing Letters, 33(6), 305-308, 1990.
- Chapter 4 of the book Probability and Computing: Randomized Algorithms and
Probabilistic Analysis, by M. Mitzenmacher and E. Upfal, Cambridge University
Press, 2005.
- The Probabilistic Method, 3rd Edition, by N. Alon and J. H. Spencer, Wiley,
2008, Appendix A.1.
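For orientation, one sharpened form that appears in these references is the relative-entropy (Chernoff-Hoeffding) bound: for i.i.d. $X_1,\dots,X_n \sim \mathrm{Bernoulli}(p)$ and $p < q < 1$,
\[
\mathbb{P}\Big( \tfrac{1}{n}\textstyle\sum_{i=1}^n X_i \ge q \Big) \le e^{-n\,\mathrm{kl}(q\|p)},
\qquad
\mathrm{kl}(q\|p) = q\log\tfrac{q}{p} + (1-q)\log\tfrac{1-q}{1-p},
\]
which recovers the usual $e^{-2n(q-p)^2}$ bound via Pinsker's inequality.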
Improvement of Hoeffding's inequality by Berend and Kontorovich:
- On the concentration of the missing mass, by D. Berend and A.
Kontorovich, Electron. Commun. Probab. 18 (3), 1–7, 2013.
- Section 2.2.4 of the monograph on concentration inequalities by M. Raginsky.
The original proof of Hoeffding's inequality is here. Compare it to the modern, slick proof in Lemma 2.2 of [BLM].
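For reference, the inequality states that if $X_1,\dots,X_n$ are independent with $a_i \le X_i \le b_i$ almost surely and $S_n = \sum_{i=1}^n X_i$, then for all $t > 0$
\[
\mathbb{P}\big( |S_n - \mathbb{E} S_n| \ge t \big) \le 2\exp\Big( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Big).
\]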
Wed Sep 1
For an example of the improvement afforded by Bernstein over Hoeffding (the shape of the gain is recalled after the reference), see
Theorem 7.1 of
- Laszlo Gyorfi, Michael Kohler, Adam Krzyzak and Harro Walk (2002). A
Distribution-Free Theory of Nonparametric Regression, Springer, available here.
By the way, this is an excellent book.
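To recall the shape of the gain: for independent, zero-mean $X_i$ with $|X_i| \le M$ almost surely and $v = \sum_{i=1}^n \mathbb{E}[X_i^2]$, Bernstein's inequality gives, for all $t > 0$,
\[
\mathbb{P}\Big( \sum_{i=1}^n X_i \ge t \Big) \le \exp\Big( -\frac{t^2}{2(v + Mt/3)} \Big),
\]
whereas Hoeffding's inequality only sees the range and yields $\exp(-t^2/(2nM^2))$; when $v \ll nM^2$ the variance-sensitive bound is substantially sharper.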
Wed Sep 8
The book Concentration Inequalities for Sums and Martingales by B. Bercu, B. Delyon and E. Rio (2015) contains many sharp calculations and bounds.
References on the JL Lemma (a toy sketch of the binary-coin projection follows the list):
- Database-friendly random projections: Johnson-Lindenstrauss with binary coins, by D. Achlioptas,
Journal of Computer and System Sciences, 66 (2003), 671-687.
- An Elementary Proof of a Theorem of Johnson and Lindenstrauss, by S. Dasgupta and A. Gupta, 2002.
- Section 1.2 of the book The Random Projection Method by S. Vempala, AMS, 2004
- Simple Analysis of Sparse, Sign-Consistent JL, by M. Jagadeesan, 2017, arXiv:1708.02966
- Optimality of the Johnson-Lindenstrauss Lemma, by K.G. Larsen and J. Nelson (2016), arXiv:1609.02094
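As a toy illustration of the binary-coin construction studied in the Achlioptas paper, here is a minimal Python sketch (the data, the target dimension k and the function name are placeholders, not a tuned recipe):

    import numpy as np

    def binary_coin_projection(X, k, seed=None):
        # Project the rows of X (n x d) to k dimensions using a d x k matrix
        # of i.i.d. +/-1 entries, rescaled by 1/sqrt(k).
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        R = rng.choice([-1.0, 1.0], size=(d, k))
        return X @ R / np.sqrt(k)

    # Sanity check: pairwise distances should be roughly preserved.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2000))
    Y = binary_coin_projection(X, k=500, seed=1)
    print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))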
Mon Sep 13, 15 and 20
The book Metric Characterization of Random Variables and Random Processes (2000), by
V. V. Buldygin and Yu. V. Kozachenko, provides a great deal of information about sub-Gaussian and sub-exponential random variables (as well as random processes) and Orlicz norms.
To see the equivalent characterizations of sub-Gaussian and sub-exponential random variables, see Theorems 2.5.2 and 2.7.1 in Vershynin's book. In particular, Theorem 2.8.1 therein is yet another version of Bernstein's inequality, using Orlicz norms.
The Bernstein-Orlicz norm was introduced in the paper The Bernstein-Orlicz norm and deviation inequalities, by S. van de Geer and J. Lederer. See also the follow-up paper by J. Wellner, The Bennett-Orlicz norm.
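In the normalization used in Vershynin's book, the Orlicz norms in question are
\[
\|X\|_{\psi_\alpha} = \inf\big\{ t > 0 : \mathbb{E}\exp\big( |X|^\alpha / t^\alpha \big) \le 2 \big\},
\]
with $\alpha = 2$ corresponding to sub-Gaussian and $\alpha = 1$ to sub-exponential random variables.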
The properties of sub-Weibull random variables are described in the papers:
Mon Sep 29
The use of the Donsker-Varadhan variational formula for the KL divergence in adaptive data analysis was introduced in the paper below (the formula itself is recalled after the reference):
- Controlling Bias in Adaptive Data Analysis Using Information Theory, by
Daniel Russo and James Zou, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1232-1240, 2016.
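The variational formula in question states that, for probability measures $P \ll Q$,
\[
\mathrm{KL}(P \,\|\, Q) = \sup_{f} \Big\{ \mathbb{E}_{P}[f(X)] - \log \mathbb{E}_{Q}\big[ e^{f(X)} \big] \Big\},
\]
where the supremum runs over, say, bounded measurable functions $f$.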
Follow-up contributions mentioned in class:
- Information-theoretic analysis of generalization capability of learning algorithms, by Aolin Xu, Maxim Raginsky,
NIPS 2017
- Dependence Measures Bounding the Exploration Bias for General Measurements, by Jiantao Jiao, Yanjun Han, Tsachy Weissman, 2016, arXiv:1612.05845
These arguments were further developed to yield tighter generalization bounds based on chaining mutual information here:
- Chaining Mutual Information and Tightening Generalization Bounds, by Amir R. Asadi, Emmanuel Abbe and Sergio Verdú, NeurIPS 2018
To see the effect of data adaptivity on something as simple as the sample mean, check out the papers below (a toy simulation follows the list):
- Are sample means in multi-armed bandits positively or negatively biased? by Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo, NeurIPS 2018
- On the bias, risk and consistency of sample means in multi-armed bandits, by Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo, 2019, arXiv:1902.00746
- On conditional versus marginal bias in multi-armed bandits, by Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo, ICML 2020
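As a quick numerical illustration of the phenomenon, here is a hypothetical toy setup (not the experiments from these papers): a greedy two-armed bandit in which both arms are Bernoulli(0.5), after which each arm's sample mean is compared to its true mean.

    import numpy as np

    # Toy simulation: under greedy (adaptive) sampling, the per-arm sample means
    # come out negatively biased even though both arms are identical.
    rng = np.random.default_rng(0)
    p = [0.5, 0.5]                      # true means (hypothetical values)
    n_rounds, n_reps = 30, 10000
    bias = np.zeros(2)
    for _ in range(n_reps):
        pulls = [[rng.binomial(1, p[a])] for a in range(2)]   # one forced pull per arm
        for _ in range(n_rounds - 2):
            a = int(np.mean(pulls[1]) > np.mean(pulls[0]))    # greedy arm choice
            pulls[a].append(rng.binomial(1, p[a]))
        bias += [np.mean(pulls[a]) - p[a] for a in range(2)]
    print(bias / n_reps)                # both entries are typically negative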