References
Recommended Books
- Statistics for High-Dimensional Data: Methods, Theory and
Applications, by P. Bühlmann and S. van de Geer, Springer, 2011.
- Statistical Learning with Sparsity: The Lasso and Generalizations, by
T. Hastie, R. Tibshirani and M. Wainwright, Chapman & Hall, 2015.
- Introduction to High-Dimensional Statistics, by C. Giraud, Chapman &
Hall, 2015.
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Rigollet, P. (2017). High-Dimensional Statistics, lecture notes for the MIT
course 18.S997.
- High-Dimensional Probability: An Introduction with Applications in Data
Science, by R. Vershynin, 2018, available here.
- Probability in High Dimension, by R. van Handel, 2016, available here.
Tue Jan 15
|
To read more about what I referred to as the "master theorem on the asymptotics
of parametric models," see these notes by Jon Wellner. In particular,
I highly recommend looking at the notes he made for the sequence of three
classes on theoretical statistics he has been teaching at the University
of Washington. Also, look at the lectures of April 24 and April 26 of the
course 36-752, from Spring 2018, where this "master theorem" is proved with
a minor correction.
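As a point of reference, one common formulation of the conclusion of such a
theorem (a sketch under standard regularity conditions, not the exact statement
proved in those notes) is that the MLE is asymptotically linear and normal:
$$
\sqrt{n}\left(\hat{\theta}_n - \theta_0\right)
= \frac{1}{\sqrt{n}} \sum_{i=1}^n I(\theta_0)^{-1} \dot{\ell}_{\theta_0}(X_i) + o_P(1)
\;\rightsquigarrow\; N\left(0,\; I(\theta_0)^{-1}\right),
$$
where $\dot{\ell}_{\theta_0}$ is the score function and $I(\theta_0)$ the
Fisher information, assumed nonsingular.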
Parameter consistency and central limit theorems for models with increasing
dimension d (but still d < n):
- Rinaldo, A., G'Sell, M. and Wasserman, L. (2019+).
Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free
Inference, arxiv
- Wasserman, L, Kolar, M. and Rinaldo, A. (2014). Berry-Esseen bounds for
estimating undirected graphs, Electronic Journal of Statistics, 8(1),
1188-1224.
- Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a
diverging number of parameters, The Annals of Statistics, 32(3),
928-961.
- Portnoy, S. (1984). Asymptotic Behavior of M-Estimators of p Regression
Parameters when p^2/n is Large. I. Consistency, The Annals of
Statistics, 12(4), 1298-1309.
- Portnoy, S. (1985). Asymptotic Behavior of M-Estimators of p Regression
Parameters when p^2/n is Large; II. Normal Approximation, The Annals of
Statistics, 13(4), 1403-1417.
- Portnoy, S. (1988). Asymptotic Behavior of Likelihood Methods
for Exponential Families when the Number of Parameters Tends to
Infinity, The Annals of Statistics, 16(1), 356-366.
Some central limit theorem results in increasing dimension:
- Chernozhukov, V., Chetverikov, D. and Kato, K. (2016). Central
Limit Theorems and Bootstrap in High Dimensions, arxiv
- Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on
dimension, Journal of Statistical Planning and Inference, 113,
385-402.
- Portnoy, S. (1986). On the central limit theorem in $\mathbb{R}^p$ when
$p \rightarrow \infty$, Probability Theory and Related Fields,
73(4), 571-583.
Thu Jan 17
|
To see more about concentration in high dimensions, see
- Ball, K. (1997). An Elementary Introduction to Modern Convex Geometry, pdf.
- S. Artstein-Avidan, A. Giannopoulos, V. Milman, Asymptotic geometric analysis. Part I. Mathematical Surveys and Monographs, 202. American Mathematical Society, Providence, RI, 2015.
Thu Jan 24
|
Some references on concentration inequalities:
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Concentration Inequalities and Model Selection, by P. Massart, Springer Lecture
Notes in Mathematics, vol. 1896, 2007.
- The Concentration of Measure Phenomenon, by M. Ledoux, 2005, AMS.
- Concentration of Measure for the Analysis of Randomized Algorithms, by D.P.
Dubhashi and A. Panconesi, Cambridge University Press, 2012.
- R. Vershynin, Introduction to the non-asymptotic analysis of random
matrices. In: Compressed Sensing: Theory and Applications, eds. Yonina Eldar and
Gitta Kutyniok, Cambridge University Press, 2012.
- This set of notes by David Pollard.
- Chapter 7 of the book: S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Applied
and Numerical Harmonic Analysis, Birkhäuser/Springer, New York, 2013.
For a comprehensive treatment of sub-gaussian variables and processes (and more)
see:
- Metric Characterization of Random Variables and Random Processes, by V. V.
Buldygin, AMS, 2000.
Finally, here is the traditional bound on the mgf of a centered bounded random
variable (due to Hoeffding), implying that bounded centered variables are
sub-Gaussian. It should be compared to the proof given in class.
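For reference, Hoeffding's lemma (in its standard form, as it is usually
quoted) says that if $X \in [a, b]$ almost surely and $\mathbb{E}[X] = 0$, then
$$
\mathbb{E}\left[e^{\lambda X}\right]
\le \exp\!\left(\frac{\lambda^2 (b-a)^2}{8}\right)
\quad \text{for all } \lambda \in \mathbb{R},
$$
i.e., $X$ is sub-Gaussian with variance proxy $(b-a)^2/4$.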
Tue Jan 29
|
References for Chernoff bounds for Bernoulli (and their multiplicative forms):
- Check out the Wikipedia page.
- A guided tour of Chernoff bounds, by T. Hagerup and C. Rüb, Information
Processing Letters, 33(6), 305-308, 1990.
- Chapter 4 of the book Probability and Computing: Randomized Algorithms and
Probabilistic Analysis, by M. Mitzenmacher and E. Upfal, Cambridge University
Press, 2005.
- The Probabilistic Method, 3rd Edition, by N. Alon and J. H. Spencer, Wiley,
2008, Appendix A.1.
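As a reminder (one standard form, stated here as a sketch rather than verbatim
from any of the references above), for $S_n = \sum_{i=1}^n X_i$ with $X_i$
independent Bernoulli and $\mu = \mathbb{E}[S_n]$, the multiplicative Chernoff
bounds read
$$
\mathbb{P}\left(S_n \ge (1+\delta)\mu\right)
\le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}
\le e^{-\mu \delta^2/3}, \qquad 0 < \delta \le 1,
$$
with an analogous lower-tail bound
$\mathbb{P}(S_n \le (1-\delta)\mu) \le e^{-\mu \delta^2/2}$.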
Improvement of Hoeffding's inequality for Bernoulli sums by Berend and Kontorovich:
- On the concentration of the missing mass, by D. Berend and A.
Kontorovich, Electron. Commun. Probab., 18(3), 1-7, 2013.
- Section 2.2.4 in Raginsky's monograph (see references at the
top).
Example of how the relative or multiplicative version of Chernoff
bounds can lead to substantial improvements:
- Minimax-optimal classification with dyadic decision trees, by
C. Scott and R. Nowak, IEEE Transactions on Information Theory,
52(4), 1335-1353.
Tue Feb 5
|
For an example of the improvement afforded by Bernstein versus Hoeffding, see
Theorem 7.1 of
- Laszlo Gyorfi, Michael Kohler, Adam Krzyzak and Harro Walk (2002). A
Distribution-Free Theory of Nonparametric Regression, Springer,
available here.
By the way, this is an excellent book.
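Schematically (a sketch of the standard statements, with the usual
normalizations), for i.i.d. centered $X_i$ with $|X_i| \le M$ and
$\mathrm{Var}(X_i) = \sigma^2$, Hoeffding gives
$\mathbb{P}(|\bar{X}_n| \ge t) \le 2\exp\left(-\frac{n t^2}{2 M^2}\right)$,
while Bernstein gives
$$
\mathbb{P}\left(|\bar{X}_n| \ge t\right)
\le 2 \exp\left(-\frac{n t^2}{2\left(\sigma^2 + M t/3\right)}\right),
$$
which is much sharper when $\sigma^2 \ll M^2$, as happens, e.g., for Bernoulli
variables with small success probability.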
For yet another example of how Bernstein's inequality is preferable to Hoeffding's,
see Lemma 13 in
- Minimax Rates for Homology Inference, by S. Balakrishnan, A.
Rinaldo, D. Sheehy, A. Singh and L. Wasserman, 2011, AISTATS.
For sharp tail bounds for chi-squared see:
- Lemma 1 in Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic
functional by model selection, Annals of Statistics, 28(5), 1302-1338.
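That lemma (quoted here from memory, so double-check the exact statement in
the paper) says that if $U \sim \chi^2_D$, then for all $x > 0$
$$
\mathbb{P}\left(U - D \ge 2\sqrt{Dx} + 2x\right) \le e^{-x}
\qquad \text{and} \qquad
\mathbb{P}\left(D - U \ge 2\sqrt{Dx}\right) \le e^{-x}.
$$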
For a more detailed treatment of sub-exponential variables and sharp
calculations for the corresponding tail bounds see:
- Section 2.3 and Exercise 2.8 in Concentration Inequalities: A Nonasymptotic
Theory of Independence, by S. Boucheron, G. Lugosi and P. Massart, Oxford
University Press, 2013.
For classic proofs of Hoeffding, Bennett and Bernstein, see, e.g.,
- Chernoff, Hoeffding's and Bennett's Inequalities, a write-up by
Jimmy Jin, James Wilson and Andrew Nobel, pdf.
Tue Feb 12
|
For some refinement of the bounded difference inequality and applications,
see:
- Sason, I. (2011). On Refined Versions of the Azuma-Hoeffding
Inequality with Applications in Information Theory,
arxiv:1111.1977.
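For orientation, the baseline bounded difference (McDiarmid) inequality that
such refinements improve upon states, in its standard form, that if $f$
satisfies $|f(x) - f(x')| \le c_i$ whenever $x, x'$ differ only in the $i$-th
coordinate, and $X_1, \ldots, X_n$ are independent, then
$$
\mathbb{P}\left(f(X_1, \ldots, X_n) - \mathbb{E} f \ge t\right)
\le \exp\left(-\frac{2 t^2}{\sum_{i=1}^n c_i^2}\right).
$$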
For a comprehensive treatment of density estimation under the L1 norm, see
the book
- Devroye, L. and Lugosi, G. (2001). Combinatorial Methods in Density
Estimation, Springer.
Tue Feb 26
|
For matrix estimation in the operator norm depending on the effective dimension,
see
- Florentina Bunea and Luo Xiao (2015). On the sample covariance matrix
estimator of reduced effective rank population matrices, with
applications to fPCA, Bernoulli 21(2), 1200–1230.
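Here the effective rank of a covariance matrix $\Sigma$ is, in the standard
definition used in this literature,
$$
r(\Sigma) = \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|_{\mathrm{op}}},
$$
which is always at most the rank of $\Sigma$ and can be far smaller than the
ambient dimension when the spectrum decays quickly.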
For a treatment of the matrix calculus concepts needed for proving matrix
concentration inequalities (namely operator monotone and convex matrix
functions), see:
- R. Bhatia. Matrix Analysis. Number 169 in Graduate Texts in Mathematics. Springer, Berlin, 1997.
- R. Bhatia. Positive Definite Matrices. Princeton Univ. Press, Princeton, NJ, 2007.
Thu Feb 28
|
To read up about matrix concentration inequalities, I recommend:
- Tropp, J. (2012). User-friendly tail bounds for sums of random matrices, Found. Comput. Math., 12(4), 389-434.
- Tropp, J. (2015). An Introduction to Matrix Concentration Inequalities, Found. Trends Mach. Learning, 8(1-2), 1-230.
- Daniel Hsu, Sham M. Kakade, Tong Zhang (2011).
Dimension-free tail inequalities for sums of random
matrices, Electron. Commun. Probab. 17(14), 1–13.
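The headline result in these references (stated schematically; see Tropp's
monograph for the precise version) is the matrix Bernstein inequality: if
$X_1, \ldots, X_n$ are independent, centered, self-adjoint $d \times d$ random
matrices with $\|X_i\|_{\mathrm{op}} \le L$ and
$v = \left\|\sum_i \mathbb{E}[X_i^2]\right\|_{\mathrm{op}}$, then for all $t \ge 0$
$$
\mathbb{P}\left(\left\|\sum_{i=1}^n X_i\right\|_{\mathrm{op}} \ge t\right)
\le 2d \, \exp\left(-\frac{t^2/2}{v + L t/3}\right).
$$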
Tue Mar 5
|
To see how the matrix Bernstein inequality can be used in the study of random
graphs, see Tropp's monograph and this readable reference:
- Fan Chung and Mary Radcliffe (2011). On the Spectra of General Random
Graphs, Electronic Journal of Combinatorics 18(1).
To see how the matrix Bernstein inequality can be used to analyze the
performance of spectral clustering for the purpose of community recovery
under a stochastic block model, see this old failed NIPS
submission (in
particular, the appendix).
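In case it helps, here is a toy sketch of that pipeline (my own illustration in
Python/numpy, not code from the submission): generate a two-block stochastic
block model, take the leading eigenvectors of the adjacency matrix, and cluster
by sign. Matrix Bernstein controls $\|A - \mathbb{E}[A]\|_{\mathrm{op}}$, which
(via Davis-Kahan) controls how far these eigenvectors are from those of
$\mathbb{E}[A]$.
```python
# Toy spectral clustering under a 2-block stochastic block model (SBM).
import numpy as np

rng = np.random.default_rng(0)

# --- Generate a symmetric adjacency matrix from a 2-block SBM ---
n, p, q = 200, 0.30, 0.05          # n nodes; within/between block edge probs
z = np.repeat([0, 1], n // 2)      # ground-truth community labels
P = np.where(z[:, None] == z[None, :], p, q)  # edge probability matrix
U = rng.random((n, n))
A = np.triu((U < P).astype(float), k=1)
A = A + A.T                        # symmetrize, zero diagonal

# --- Spectral step: leading eigenvectors of A ---
eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
v2 = eigvecs[:, -2]                # second leading eigenvector separates blocks

# --- Cluster by the sign of the second eigenvector ---
labels = (v2 > 0).astype(int)
accuracy = max(np.mean(labels == z), np.mean(labels != z))  # up to label flip
print(f"community recovery accuracy: {accuracy:.2f}")
```
With this separation between $p$ and $q$ the sign of the second eigenvector
recovers the two communities essentially perfectly.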
Tue Mar 19
|
To read about ridge regression and lasso-type estimators, an early, good reference is
- Knight, K. and Fu, W. (2000). Asymptotics for lasso-type
estimators, The Annals of Statistics, 28(5), 1356-1378.
A nice reference on ridge and least squares regression with random covariates
is
- Daniel Hsu, Sham M. Kakade and Tong Zhang (2014). Random Design
Analysis of Ridge Regression, Foundations of Computational
Mathematics, 14(3), 569-600.
For some recent, very nice work on the asymptotics of ridge regression using random matrix theory, see
- Edgar Dobriban and Stefan Wager (2018), High-dimensional asymptotics of prediction: ridge regression and classification, The Annals of Statistics, 46 (1), pp. 247-279.
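For reference, the ridge estimator studied in these works is the usual
penalized least squares solution
$$
\hat{\beta}_{\lambda}
= \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
= \left(X^{\top} X + \lambda I\right)^{-1} X^{\top} y,
\qquad \lambda > 0.
$$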
Thu Mar 21
|
A highly recommended book dealing extensively with the normal means problem
is
- Ian Johnstone, Gaussian estimation: Sequence and wavelet models,
draft version, August 9, 2017, pdf.
About uniqueness of the lasso (and other interesting properties):
- Tibshirani, R. (2013). The lasso problem and uniqueness, EJS, 7,
1456-1490.
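Recall that the lasso is any solution (not necessarily unique when
$\mathrm{rank}(X) < p$, which is the point of the paper above) of
$$
\hat{\beta}_{\lambda} \in \arg\min_{\beta \in \mathbb{R}^p} \;
\frac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1 .
$$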
Tue Mar 26
|
For further references on rates for the lasso, restricted eigenvalue conditions,
oracle inequalities, etc., see
- Statistics for High-Dimensional Data: Methods, Theory and
Applications, by P. Bühlmann and S. van de Geer, Springer, 2011, Chapters 6
and 7.
- Belloni, A., Chernozhukov, V. and Hansen, C. (2010). Inference for High-Dimensional Sparse Econometric Models,
Advances in Economics and Econometrics, ES World Congress 2010, arxiv link
- Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009), Simultaneous
analysis of Lasso and Dantzig selector,
Annals of Statistics, 37(4), 1705–1732.
For the use of cross validation in selecting the lasso parameter see:
- Homrighausen, D. and McDonald, D. (2013). The lasso, persistence, and
cross-validation, Proceedings of the 30th International Conference
on Machine Learning, JMLR W&CP, 28. pdf
- Homrighausen, D. and McDonald, D. (2013b). Risk consistency of
cross-validation with Lasso-type procedures,
arxiv:1308.0810.
- Chatterjee, S. and Jafarov, J. (2015). Prediction error of
cross-validated Lasso,
arxiv:1502.06291
- Chetverikov, D. and Liao Z. (2016). On cross-validated Lasso,
arxiv:1605.02214
And for the one standard error rule, which seems to work well in
practice (but apparently has no theoretical justification), see
these lectures by Ryan Tibshirani:
pdf
and pdf.
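As a concrete illustration, here is a minimal sketch of the one standard error
rule on top of scikit-learn's LassoCV (the rule itself is standard; the helper
below is my own illustration, not Tibshirani's code): among all penalties whose
CV error is within one standard error of the minimum, pick the largest, i.e.,
most regularized, one.
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic sparse regression problem (hypothetical data for illustration).
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

cv = 5
model = LassoCV(cv=cv, random_state=0).fit(X, y)

# mse_path_ has shape (n_alphas, n_folds); average over folds.
mean_mse = model.mse_path_.mean(axis=1)
se_mse = model.mse_path_.std(axis=1, ddof=1) / np.sqrt(cv)

i_min = np.argmin(mean_mse)                  # penalty minimizing CV error
threshold = mean_mse[i_min] + se_mse[i_min]  # one-SE cutoff

# alphas_ is sorted in decreasing order, so the first index meeting the
# cutoff corresponds to the largest admissible penalty.
i_1se = np.min(np.where(mean_mse <= threshold)[0])

print(f"alpha_min = {model.alphas_[i_min]:.4f}, "
      f"alpha_1se = {model.alphas_[i_1se]:.4f}")
```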
Tue Apr 9
|
Good references on perturbation theory are
- Stewart and Sun (1990). Matrix Perturbation Theory, Academic Press. (Start with the CS decomposition and then move on to principal angles and perturbation theory results.)
- Parlett, B.N. (1998). The Symmetric Eigenvalue Problem, Society for Industrial and Applied Mathematics.
The following version of Davis-Kahan is especially useful:
- A useful variant of the Davis-Kahan theorem for statisticians, by Y. Yu, T.
Wang and R. J. Samworth, Biometrika, 102(2), 315-323, 2015, pdf.
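A commonly quoted special case of that result (paraphrased from memory, so
check the paper for the precise constants and conditions): if $\Sigma$ and
$\hat{\Sigma}$ are symmetric with eigenvalues
$\lambda_1 \ge \cdots \ge \lambda_p$ (with the conventions
$\lambda_0 = \infty$, $\lambda_{p+1} = -\infty$) and $v_j$, $\hat{v}_j$ are
the respective $j$-th eigenvectors, then
$$
\sin \angle\left(\hat{v}_j, v_j\right)
\le \frac{2\,\|\hat{\Sigma} - \Sigma\|_{\mathrm{op}}}
{\min\left(\lambda_{j-1} - \lambda_j,\; \lambda_j - \lambda_{j+1}\right)},
$$
with no condition on $\hat{\Sigma}$ beyond symmetry, which is what makes the
variant so convenient for statistical applications.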
Tue Apr 16
|
Good modern references on PCA:
- Johnstone, I. and Lu, A. Y. (2009) On Consistency and Sparsity for Principal Components Analysis in High Dimensions, JASA, 104(486): 682–693.
- B. Nadler (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach, Annals of Statistics, 36(6), 2791-2817.
- Amini, A. and Wainwright, M. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal
components, Annals of Statistics, 37(5B), 2877-2921.
- A. Birnbaum, I.M. Johnstone, B. Nadler and D. Paul (2013). Minimax bounds for sparse PCA with noisy high-dimensional data,
The Annals of Statistics, 41(3), 1055-1084.
- Vu, V. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions, Annals of
Statistics, 41(6), 2905-2947.
Thu Apr 18
|
Good references on uniform laws of large numbers (ULLNs):
- Devroye, L., Gyorfi, L. and Lugosi, G. (1996). A Probabilistic Theory of
Pattern Recognition, Springer.
- Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery
Problems, Springer Lecture Notes in Mathematics, 2033.
- Laszlo Gyorfi, Michael Kohler, Adam Krzyzak and Harro Walk (2002). A
Distribution-Free Theory of Nonparametric Regression, Springer.
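The prototypical ULLN in these references is the Vapnik-Chervonenkis
inequality (stated here schematically; see, e.g., Devroye, Gyorfi and Lugosi
for the precise version): for a class of sets $\mathcal{A}$ with shatter
coefficient $s(\mathcal{A}, n)$,
$$
\mathbb{P}\left(\sup_{A \in \mathcal{A}}
\left| P_n(A) - P(A) \right| > \epsilon\right)
\le 8\, s(\mathcal{A}, n)\, e^{-n \epsilon^2 / 32},
$$
where $P_n$ is the empirical measure of an i.i.d. sample of size $n$.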