References
Recommended Books
- Statistics for High-Dimensional Data: Methods, Theory and
Applications, by P. Bühlmann and S. van de Geer, Springer, 2011.
- Statistical Learning with Sparsity: The Lasso and Generalizations, by
T. Hastie, R. Tibshirani and M. Wainwright, Chapman & Hall, 2015.
- Introduction to High-Dimensional Statistics, by C. Giraud, Chapman &
Hall, 2015.
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Rigollet, P. (2015). High-Dimensional Statistics, lecture notes for the
MIT course 18.S997.
- High-Dimensional Probability, An Introduction with Applications in Data
Science, by R. Vershynin, 2018, available here.
- Probability in High Dimension, by R. van Handel, 2016, available here.
Mon Aug 27
|
To read more about what I referred to as the "master theorem on the asymptotics
of parametric models," see these notes by Jon Wellner. In particular, I highly
recommend the notes he prepared for the sequence of three classes on
theoretical statistics that he has been teaching at the University of
Washington. Also, look at the lectures of April 24 and April 26 of the course
36-752, from Spring 2018, where this "master theorem on the asymptotics of
parametric models" is proved correctly.
Parameter consistency and central limit theorems for models with increasing
dimension d (but still d < n):
- Rinaldo, A., G'Sell, M. and Wasserman, L. (2016+).
Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free
Inference, arxiv
- Wasserman, L., Kolar, M. and Rinaldo, A. (2014). Berry-Esseen bounds for
estimating undirected graphs, Electronic Journal of Statistics, 8(1),
1188-1224.
- Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a
diverging number of parameters, the Annals of Statistics, 32(3),
928-961.
- Portnoy, S. (1984). Asymptotic Behavior of M-Estimators of p Regression
Parameters when p^2/n is Large. I. Consistency, the Annals of
Statistics, 12(4), 1298-1309.
- Portnoy, S. (1985). Asymptotic Behavior of M Estimators of p Regression Parameters
when p^2/n is Large; II. Normal Approximation, the Annals of
Statistics, 13(4), 1403-1417.
- Portnoy, S. (1988). Asymptotic Behavior of Likelihood Methods
for Exponential Families when the Number of Parameters Tends to
Infinity, the Annals of Statistics, 16(1), 356-366.
Some central limit theorem results in increasing dimension:
- Chernozhukov, V., Chetverikov, D. and Kato, K. (2016). Central
Limit Theorems and Bootstrap in High Dimensions, arxiv
- Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on
dimension, Journal of Statistical Planning and Inference, 113,
385-402.
- Portnoy, S. (1986). On the central limit theorem in $R^p$ when
$p \rightarrow \infty$, Probability Theory and Related Fields,
73(4), 571-583.
Wed Aug 31
|
Some references to concentration inequalities:
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Concentration Inequalities and Model Selection, by P. Massart, Springer Lecture
Notes in Mathematics, vol. 1896, 2007.
- The Concentration of Measure Phenomenon, by M. Ledoux, 2005, AMS.
- Concentration of Measure for the Analysis of Randomized Algorithms, by D.P.
Dubhashi and A. Panconesi, Cambridge University Press, 2012.
- High-Dimensional Probability, An Introduction with Applications in Data
Science, by R. Vershynin, 2018, available here.
- Chapter 2 of a draft monograph by David Pollard on empirical processes.
For a comprehensive treatment of sub-gaussian variables and processes (and more)
see:
- Metric Characterization of Random Variables and Random Processes, by V. V.
Buldygin, AMS, 2000.
Mon Sep 5
|
Good resources for the properties of sub-Gaussian variables are:
- Omar Rivasplata, Subgaussian random variables: An expository note,
Sections 1, 2 and 3. pdf
- Lecture 6 of the course "Machine learning and applications", by Z.
Harchaoui, J. Mairal and J. Salmon, pdf
Here is the traditional bound on the mgf of a centered bounded random
variable (due to Hoeffding), implying that bounded centered variables are
sub-Gaussian. It should be compared to the proof given in class.
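For concreteness, the statement in question (a standard form of Hoeffding's
lemma, stated here for reference): if $X$ is a random variable with
$\mathbb{E}[X] = 0$ and $a \le X \le b$ almost surely, then for all
$\lambda \in \mathbb{R}$,

```latex
\mathbb{E}\left[e^{\lambda X}\right]
  \;\le\; \exp\!\left(\frac{\lambda^2 (b-a)^2}{8}\right),
```

i.e., $X$ is sub-Gaussian with variance proxy $(b-a)^2/4$.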
References for Chernoff bounds for Bernoulli (and their multiplicative forms):
- Check out the Wikipedia page.
- A guided tour of Chernoff bounds, by T. Hagerup and C. Rüb, Information
Processing Letters, 33(6), 305-308, 1990.
- Chapter 4 of the book Probability and Computing: Randomized Algorithms and
Probabilistic Analysis, by M. Mitzenmacher and E. Upfal, Cambridge University
Press, 2005.
- The Probabilistic Method, 3rd Edition, by N. Alon and J. H. Spencer, Wiley,
2008, Appendix A.1.
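For reference, a standard multiplicative form (as found in the references
above): if $S = \sum_{i=1}^n X_i$ with the $X_i$ independent Bernoulli and
$\mu = \mathbb{E}[S]$, then for every $\delta > 0$,

```latex
\mathbb{P}\left(S \ge (1+\delta)\mu\right)
  \;\le\; \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},
\qquad
\mathbb{P}\left(S \le (1-\delta)\mu\right)
  \;\le\; e^{-\mu\delta^2/2} \quad (0 < \delta < 1).
```

Note that the exponent scales with $\mu$ rather than with $n$, which is the
source of the improvement over Hoeffding when $\mu \ll n$.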
Improvement of Hoeffding's inequality by Berend and Kontorovich:
- On the concentration of the missing mass, by D. Berend and A.
Kontorovich, Electron. Commun. Probab. 18(3), 1-7, 2013.
- Section 2.2.4 in Raginsky's monograph (see references at the
top).
Example of how the relative or multiplicative version of Chernoff
bounds can lead to substantial improvements:
- Minimax-optimal classification with dyadic decision trees, by
C. Scott and R. Nowak, IEEE Transactions on Information Theory,
52(4), 1335-1353.
Mon Sep 17
|
For an example of the improvement afforded by Bernstein versus Hoeffding, see
Theorem 7.1 of
- Laszlo Gyorfi, Michael Kohler, Adam Krzyzak, Harro Walk (2002). A
Distribution-Free Theory of Nonparametric Regression, Springer,
available here.
By the way, this is an excellent book.
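A quick numerical illustration of the phenomenon (my own sketch, with made-up
parameters, not taken from the book): for sums of bounded variables with small
variance, Bernstein's bound can be orders of magnitude smaller than Hoeffding's,
because it exploits the variance rather than only the range.

```python
import math

def hoeffding_bound(n, t):
    """Hoeffding: P(S_n - E[S_n] >= t) <= exp(-2 t^2 / n) for n summands in [0, 1]."""
    return math.exp(-2 * t**2 / n)

def bernstein_bound(var_sum, M, t):
    """Bernstein: P(S_n - E[S_n] >= t) <= exp(-t^2 / (2 (var_sum + M t / 3)))
    for centered summands bounded by M, where var_sum is the sum of variances."""
    return math.exp(-t**2 / (2 * (var_sum + M * t / 3)))

# n Bernoulli(p) variables with small p: range 1, but tiny variance p(1 - p) each.
n, p, t = 1000, 0.01, 20.0
h = hoeffding_bound(n, t)                      # uses only the range
b = bernstein_bound(n * p * (1 - p), 1.0, t)   # uses the small variance
print(f"Hoeffding: {h:.3g}, Bernstein: {b:.3g}")
```

Here Hoeffding gives roughly 0.45 while Bernstein gives a bound smaller than
$10^{-4}$, for the same deviation.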
For sharp tail bounds for chi-squared see:
- Lemma 1 in Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic
functional by model selection, Annals of Statistics, 28(5), 1302-1338.
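In its most commonly used special case, the Laurent-Massart bound reads: if
$Y \sim \chi^2_d$, then for all $x > 0$,

```latex
\mathbb{P}\left(Y \ge d + 2\sqrt{dx} + 2x\right) \;\le\; e^{-x},
\qquad
\mathbb{P}\left(Y \le d - 2\sqrt{dx}\right) \;\le\; e^{-x}.
```

(The lemma itself is stated for weighted sums of squared Gaussians; the display
above is the unweighted case.)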
For a more detailed treatment of sub-exponential variables and sharp
calculations for the corresponding tail bounds see:
- Section 2.3 and exercise 2.8 in Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
For a detailed treatment of Chernoff bounds, see:
- Section 2.3 in Concentration Inequalities and Model Selection, by P. Massart, Springer Lecture
Notes in Mathematics, vol. 1896, 2007.
Mon Sep 24
|
For some refinement of the bounded difference inequality and applications,
see:
- Sason, I. (2011). On Refined Versions of the Azuma-Hoeffding
Inequality with Applications in Information Theory,
arXiv:1111.1977
A good reference on U-statistics:
- Lee, A.J. (1990). U-Statistics: Theory and Practice, CRC Press.
For a comprehensive treatment of density estimation under the L1 norm, see
the book:
- Devroye, L. and Lugosi, G. (2001). Combinatorial Methods in Density
Estimation. Springer.
Wed Oct 3
|
For matrix estimation in the operator norm depending on the effective dimension,
see
- Florentina Bunea and Luo Xiao (2015). On the sample covariance matrix
estimator of reduced effective rank population matrices, with
applications to fPCA, Bernoulli 21(2), 1200–1230.
For a treatment of the matrix calculus concepts needed for proving matrix
concentration inequalities (namely operator monotone and convex matrix
functions), see:
- R. Bhatia. Matrix Analysis. Number 169 in Graduate Texts in Mathematics. Springer, Berlin, 1997.
- R. Bhatia. Positive Definite Matrices. Princeton Univ. Press, Princeton, NJ, 2007.
To read up about matrix concentration inequalities, I recommend:
- Tropp, J. (2012). User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Vol. 12, num. 4, pp. 389-434, 2012.
- Tropp, J. (2015). An Introduction to Matrix Concentration Inequalities, Found. Trends Mach. Learning, Vol. 8, num. 1-2, pp. 1-230
- Daniel Hsu, Sham M. Kakade, Tong Zhang (2011).
Dimension-free tail inequalities for sums of random
matrices, Electron. Commun. Probab. 17(14), 1–13.
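For orientation, the basic form of the matrix Bernstein inequality (as in
Tropp's monograph): if $X_1, \dots, X_n$ are independent, centered, symmetric
$d \times d$ random matrices with $\|X_i\| \le L$ almost surely, and
$\sigma^2 = \left\| \sum_i \mathbb{E}[X_i^2] \right\|$, then for all $t \ge 0$,

```latex
\mathbb{P}\left( \left\| \sum_{i=1}^n X_i \right\| \ge t \right)
  \;\le\; 2d \, \exp\!\left( \frac{-t^2/2}{\sigma^2 + Lt/3} \right).
```

The dimensional factor $d$ in front is exactly what the Hsu-Kakade-Zhang paper
above works to remove.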
Mon Oct 8
|
To see how the matrix Bernstein inequality can be used in the study of random
graphs, see Tropp's monograph and this readable reference:
- Fan Chung and Mary Radcliffe (2011). On the Spectra of General Random
Graphs, Electronic Journal of Combinatorics 18(1).
To see how the matrix Bernstein inequality can be used to analyze the
performance of spectral clustering for the purpose of community recovery
under a stochastic block model, see this old failed NIPS submission (in
particular, the appendix).
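A minimal sketch of the spectral step under a two-block stochastic block model
(my own illustration with made-up parameters, not the code from that
submission): when the gap between the within-block and between-block edge
probabilities is large, the sign of the second-largest eigenvector of the
adjacency matrix recovers the communities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-block SBM: n nodes, within-block edge prob p, between-block prob q.
n, p, q = 200, 0.6, 0.1
labels = np.repeat([0, 1], n // 2)

# Symmetric 0/1 adjacency matrix with independent Bernoulli edges.
P = np.where(labels[:, None] == labels[None, :], p, q)
U = np.triu(rng.random((n, n)) < P, k=1).astype(float)
A = U + U.T

# Spectral clustering: split nodes by the sign of the second-largest eigenvector.
eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
v2 = eigvecs[:, -2]                    # eigenvector of the second-largest eigenvalue
pred = (v2 > 0).astype(int)

# Accuracy up to a global label flip.
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"recovery accuracy: {acc:.2f}")
```

The matrix Bernstein inequality enters the analysis by controlling
$\|A - \mathbb{E}[A]\|$, which in turn controls the perturbation of the
eigenvectors via Davis-Kahan.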
This is a paper on linear regression that every PhD student in statistics (and
everyone taking this class) should read:
- Andreas Buja, Richard Berk, Lawrence Brown, Edward George, Emil
Pitkin, Mikhail Traskin, Linda Zhao and Kai Zhang (2015). Models as
Approximations: A Conspiracy of Random Regressors and Model
Deviations Against Classical Inference in Regression, pdf
A nice reference on ridge and least squares regression with random covariates
is
- Daniel Hsu, Sham M. Kakade and Tong Zhang (2014). Random Design
Analysis of Ridge Regression, Foundations of Computational
Mathematics, 14(3), 569-600.
A highly recommended book dealing extensively with the normal means problem
is
- Ian Johnstone, Gaussian estimation: Sequence and wavelet models
Draft version, August 9, 2017, pdf
Mon Oct 24
|
For further references on rates for the lasso, restricted eigenvalue conditions,
oracle inequalities, etc, see
- Statistics for High-Dimensional Data: Methods, Theory and
Applications, by P. Bühlmann and S. van de Geer, Springer, 2011. Chapter 6
and Chapter 7.
- Belloni, A., Chernozhukov, V. and Hansen, C. (2010). Inference for High-Dimensional Sparse Econometric Models,
Advances in Economics and Econometrics, ES World Congress 2010, arxiv link
- Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009), Simultaneous
analysis of Lasso and Dantzig selector,
Annals of Statistics, 37(4), 1705–1732.
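For orientation, the flavor of the simplest such result, stated loosely and up
to constants (see the references above for precise versions): with standardized
columns of $X$, noise level $\sigma$, and penalty
$\lambda \gtrsim \sigma\sqrt{\log p / n}$, the lasso estimate $\hat{\beta}$
satisfies, with high probability, the "slow rate"

```latex
\frac{1}{n}\left\| X(\hat{\beta} - \beta^*) \right\|_2^2
  \;\lesssim\; \lambda \, \|\beta^*\|_1,
```

while under a restricted eigenvalue condition with constant $\kappa$ this
improves to the "fast rate" $s \lambda^2 / \kappa^2$, where
$s = \|\beta^*\|_0$ is the sparsity.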
Wed Oct 31
|
Good modern references on PCA:
- Johnstone, I. and Lu, A. Y. (2009) On Consistency and Sparsity for Principal Components Analysis in High Dimensions, JASA, 104(486): 682–693.
- B. Nadler, Finite Sample Approximation Results for principal component analysis: A matrix perturbation approach, Annals of Statistics, 36(6):2791--2817, 2008.
- Amini, A. and Wainwright, M. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal
components, Annals of Statistics, 37(5B), 2877-2921.
- A. Birnbaum, I.M. Johnstone, B. Nadler and D. Paul, Minimax bounds for sparse PCA with noisy high-dimensional data,
the Annals of Statistics, 41(3):1055-1084, 2013.
- Vu, V. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions, Annals of
Statistics, 41(6), 2905-2947.
Mon Nov 12
|
A nice tutorial on spectral clustering:
- A tutorial on spectral clustering, by U. von Luxburg, pdf
Mon Nov 26
|
Good references on ULLNs and classical VC theory:
- Devroye, L., Gyorfi, L. and Lugosi, G. (1996). A Probabilistic Theory of
Pattern Recognition, Springer. Chapters 12 and 13.
- Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery
Problems, Springer Lecture Notes in Mathematics, 2033.
-
Laszlo Gyorfi, Michael Kohler, Adam Krzyzak, Harro Walk (2002). A
Distribution-Free Theory of Nonparametric Regression, Springer. Chapter 9.
Wed Nov 28
|
For relative VC deviations see:
- M. Anthony and J. Shawe-Taylor, "A result of Vapnik with applications,"
Discrete Applied Mathematics, vol. 47, pp. 207-217, 1993.
- V. N. Vapnik and A. Ya. Chervonenkis, "On the uniform convergence of
relative frequencies of events to their probabilities," Theory of
Probability and its Applications, vol. 16, pp. 264-280, 1971.
For Talagrand's inequality, see, e.g.,
- Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery
Problems, Springer Lecture Notes in Mathematics, 2033.
- The Concentration of Measure Phenomenon, by M. Ledoux, 2005, AMS.
|