SDS 387: Linear Models

Alessandro (Ale) Rinaldo - Fall, 2024

SDS 387 is an intermediate graduate course in theoretical statistics for PhD students, covering two separate but interrelated topics: (i) stochastic convergence and (ii) linear regression modeling. The material and style of the course will skew towards the mathematical and theoretical aspects of common models and methods, in order to provide a foundation for those who wish to pursue research in statistical methods and theory. This is not an applied regression analysis course.

Syllabus: Syllabus

Lectures: Tuesday and Thursday, 9:00am - 10:30am, PMA 5.112

TA: Khai Nguyen, khainb@utexas.edu - Office hours: Thursday, 1:30pm - 2:30pm, GDC 7.418 (Poisson Bowl)

Ale's Office hours: by appointment

Homework submission and solutions: use Canvas

Assignment	Due date
Homework 1	September 17
Homework 2	October 3
Final project proposal	October 12
Homework 3	October 17
Homework 4	November 14

Tuesday, August 27

Lecture 1: Introduction and course logistics. Deterministic convergence and convergence with probability one.

Thursday, August 29

Lecture 2: Lim sup and lim inf of events. Borel-Cantelli Lemmas. Convergence in probability and comparison with convergence with probability one. Law of large numbers. Glivenko-Cantelli Lemma.
References:

See Ferguson's book, chapters 1, 2 and 4.
For a proof of the Glivenko-Cantelli Lemma, see Theorem 19.1 of van der Vaart's book.
A nice webpage summarizing the different modes of stochastic convergence and providing some good examples to illustrate their differences.
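
As a small numerical companion to the Glivenko-Cantelli Lemma, here is a minimal sketch (not part of the lecture notes) that computes sup_x |F_n(x) - F(x)| for i.i.d. Uniform(0,1) samples, for which F(x) = x on [0,1]. The function name, the seed and the sample sizes are illustrative assumptions. Because F_n is a step function, the supremum is attained at the order statistics, so it suffices to check both sides of each jump.

```python
import random


def sup_distance_uniform(sample):
    """Return sup_x |F_n(x) - x| for a sample assumed to come from Uniform(0,1).

    F_n jumps from i/n to (i+1)/n at the (i+1)-th smallest observation,
    so the supremum is the largest gap just before or just after a jump.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


# Illustrative experiment: the distance should shrink as n grows.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
distances = {
    n: sup_distance_uniform([rng.random() for _ in range(n)])
    for n in (100, 10_000)
}
for n, dist in distances.items():
    print(f"n = {n:>6}: sup |F_n - F| = {dist:.4f}")
```

The printed distances should decrease roughly at the rate 1/sqrt(n), which is consistent with (though of course not a proof of) uniform almost sure convergence.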

Tuesday, September 3

Lecture 3: Glivenko-Cantelli Theorem, First Borel-Cantelli Lemma, more on convergence in probability. For the Glivenko-Cantelli Theorem, see Theorem 19.1 in van der Vaart's book.

Thursday, September 5

Lecture 4: Lp convergence, Minkowski, Hölder and Jensen inequalities. Relations between Lp convergence and convergence in probability and with probability one. C.d.f.'s in multivariate settings.

Tuesday, September 10

Lecture 5: Convergence in distribution. Relation with other forms of convergence. Marginal vs joint convergence in distribution. Portmanteau theorem. For the proof of the claim that convergence in probability implies convergence in distribution, see page 330 of Billingsley's book Probability and Measure.

Thursday, September 12

Lecture 6: Portmanteau Theorem, Continuous Mapping Theorem, characteristic functions and Continuity Theorem, Cramér-Wold device. I suggest reading Chapter 3 of Ferguson's book (in particular, Theorem 3(e) has a neat proof).

Tuesday, September 17

Lecture 7: Slutsky's theorem, more on convergence in distribution. Big-oh and little-oh notation.

Thursday, September 19

Lecture 8: More on big-oh and little-oh notation. CLT for i.i.d. variables using characteristic functions. Triangular arrays, Lindeberg-Feller and Lyapunov conditions.

Tuesday, September 24

Lecture 9: Lindeberg-Feller, examples and multivariate extension. Berry-Esseen bounds. A good reference for this lecture and the last is the book Sums of Independent Random Variables, by V.V. Petrov, Springer, 1975. Another classic and good reference is Approximation Theorems of Mathematical Statistics by Serfling, Wiley, 1980.

Thursday, September 26

Lecture 10: Kolmogorov-Smirnov, total variation and Wasserstein distances. Theorem 1.1 about Lindeberg approximations for 3-times continuously differentiable functions.

Tuesday, October 1

Lecture 11: Review of linear algebra. See references in the class notes.

Thursday, October 3

Lecture 12: Spectral properties of matrices. Eigendecomposition and singular value decomposition.

Tuesday, October 8

Lecture 13: Projections. Vector and matrix norms.

Tuesday, October 15

Lecture 14: Projection of a random variable onto a vector space of random variables. Introduction to linear regression modeling. For the next few lectures, I will closely follow the book Learning Theory from First Principles by Francis Bach.

Thursday, October 17

Lecture 15: Inference and prediction in linear regression modeling. Projection parameter, prediction risk decomposition.

Thursday, October 24

Lecture 16: Geometric interpretation of the OLS estimator. Gradient descent convergence guarantee for the OLS.
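
To illustrate the gradient descent topic of Lecture 16, here is a minimal sketch (not taken from the lecture notes) of gradient descent on the least squares objective (1/2n) ||y - X b||^2. The function name, the toy data and the step size are illustrative assumptions. For this particular design, X^T X / n has eigenvalues 1 and 0.5, so the step size 0.5 stays below 1/L with L = 1, the largest eigenvalue.

```python
def ols_gradient_descent(X, y, step, n_iter):
    """Run gradient descent on (1/2n) * ||y - X b||^2, starting from b = 0.

    X is a list of rows (lists of floats); y is a list of floats.
    The gradient at b is (1/n) * X^T (X b - y).
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        residuals = [
            sum(x_j * b_j for x_j, b_j in zip(row, beta)) - y_i
            for row, y_i in zip(X, y)
        ]
        gradient = [
            sum(row[j] * r for row, r in zip(X, residuals)) / n
            for j in range(d)
        ]
        beta = [b_j - step * g_j for b_j, g_j in zip(beta, gradient)]
    return beta


# Toy fixed design: an intercept column and one centered covariate.
covariate = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [[1.0, x] for x in covariate]
y = [-1.1, 0.2, 0.9, 2.1, 2.9]

beta_gd = ols_gradient_descent(X, y, step=0.5, n_iter=200)

# Closed-form OLS for a centered covariate: intercept = mean(y),
# slope = sum(x * y) / sum(x^2).
intercept = sum(y) / len(y)
slope = sum(x * v for x, v in zip(covariate, y)) / sum(x * x for x in covariate)
print(beta_gd, [intercept, slope])
```

With this step size the error in each eigen-direction contracts geometrically, so after 200 iterations the iterate agrees with the closed-form OLS solution up to floating-point accuracy.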

Tuesday, October 29

Lecture 17: Pseudo inverse. Risk decomposition for the estimator of the linear regression parameters for fixed design.

Thursday, October 31

Lecture 18: Gauss-Markov Theorem. Ridge regression.

Tuesday, November 5

Lecture 19: Optimal tuning for ridge regression and minimax lower bound for OLS.

Thursday, November 7

Lecture 20: Minimax lower bound for OLS. Consistency of the OLS.

Tuesday, November 12

Lecture 21: Asymptotic normality of the OLS estimator and statistical inference in the fixed-design, well-specified setting.

Thursday, November 14

Lecture 22: Random design. Risk formula for the OLS. Projection parameters.

Tuesday, November 19

Lecture 23: Random design. Minimax optimality of the OLS. Exact analysis under Gaussian design. Recommended readings:

Jaouad Mourtada. Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices. Annals of Statistics, 50(4):2157–2178, 2022.
Leo Breiman and David Freedman. How many variables should be entered in a regression equation? J. Amer. Statist. Assoc., 78(381):131–136, 1983.

Thursday, November 21

Lecture 24: The double descent phenomenon. Recommended readings:

Hastie, T., Montanari, A., Rosset, S. and Tibshirani, R. J. (2022). Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics, 50(2), 949-986.
Belkin, M., Hsu, D. and Xu, J. (2020). Two models of double descent for weak features. SIAM Journal on Mathematics of Data Science, 2(4).

Assumption-lean regression. Highly recommended reading:

Buja, A., Brown, L., Berk, R., George, E., Pitkin, E., Traskin, M., Zhang, K., Zhao, L. (2019). Models as Approximations I: Consequences Illustrated with Linear Regression, Statistical Science, 34(4), 523-544.

Tuesday, December 3

Lecture 25: Consistency and asymptotic normality of the OLS estimator in the assumption-lean setting.

Thursday, December 5

Lecture 26: Consistency of the plug-in estimator of the sandwich covariance for the OLS estimator.
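
For concreteness, here is a minimal sketch (not taken from the lecture notes) of the plug-in sandwich estimator in the simplest case of a single covariate and no intercept, which is an assumption made only to keep the example short. In that case the estimator of the asymptotic variance of sqrt(n) (beta_hat - beta) reduces to B_hat / A_hat^2, with A_hat = (1/n) sum x_i^2 and B_hat = (1/n) sum x_i^2 e_i^2, where the e_i are OLS residuals. The function name and the toy data are hypothetical.

```python
def sandwich_variance(x, y):
    """Return (beta_hat, avar_hat) for regression of y on a single covariate x.

    avar_hat is the plug-in sandwich estimate A^{-1} B A^{-1} of the
    asymptotic variance of sqrt(n) * (beta_hat - beta); no intercept is fit.
    """
    n = len(x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    residuals = [yi - beta_hat * xi for xi, yi in zip(x, y)]
    a_hat = sum(xi * xi for xi in x) / n  # "bread"
    b_hat = sum((xi * ei) ** 2 for xi, ei in zip(x, residuals)) / n  # "meat"
    return beta_hat, b_hat / a_hat**2


# Toy data, chosen only so that the arithmetic can be checked by hand.
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0]
beta_hat, avar_hat = sandwich_variance(x, y)
std_error = (avar_hat / len(x)) ** 0.5  # standard error of beta_hat
print(beta_hat, avar_hat, std_error)
```

No linearity or homoscedasticity assumption is used in forming the estimate, which is the point of the assumption-lean approach; in d dimensions the scalars A_hat and B_hat become d x d matrices.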

High-dimensional generalizations of the results from the last few lectures can be found in:

Kuchibhotla, A., Rinaldo, A. and Wasserman, L. (2021). Berry-Esseen Bounds for Projection Parameters and Partial Correlations with Increasing Dimension, arXiv:2007.09751.
Chang, W., Kuchibhotla, A. and Rinaldo, A. (2023). Inference for Projection Parameters in Linear Regression: beyond d = o(n^{1/2}), arXiv:2307.00795.

Conditions for consistency and asymptotic normality of the OLS estimator (for a well-specified linear model) were given by

Lai, T. L. and Wei, C. Z. (1982). Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems, Annals of Statistics, 10(1): 154-166.

Khamaru, K., Deshpande, Y. and Wainwright, M. (2021). Near-optimal inference in adaptive linear regression, arXiv:2107.02266.

Here is a simple example of the negative impact of adaptive data collection:

Shin, J., Ramdas, A. and Rinaldo, A. (2021). On the Bias, Risk, and Consistency of Sample Means in Multi-armed Bandits, SIAM Journal on Mathematics of Data Science, 3(4).