This course is aimed at advanced master and doctorate students who want to conduct independent research on theory for modern machine learning (ML). It teaches standard methods in statistical learning theory commonly used to prove theoretical guarantees for ML algorithms. The knowledge is then applied in independent project work to understand and follow-up on recent theoretical ML results.

Objective

By the end of the semester students should be able to

- understand a good fraction of theory papers published in the typical ML venues. For this purpose, students will learn common mathematical techniques from statistical learning in the first part of the course and apply this knowledge in the project work

- critically examine recently published work in terms of relevance and find impactful (novel) research problems. This will be an integral part of the project work and involves experimental as well as theoretical questions

- outline a possible approach to prove a conjectured theorem by e.g. reducing to more solvable subproblems. This will be practiced in in-person exercises, homeworks and potentially in the final project

- effectively communicate and present the problem motivation, new insights and results to a technical audience. This will be primarily learned via the final presentation and report as well as during peer-grading of peer talks.

Content

This course touches upon foundational methods in statistical learning theory aimed at proving theoretical guarantees for machine learning algorithms. It touches on the following topics - concentration bounds - uniform convergence and empirical process theory - regularization for non-parametric statistics (e.g. in RKHS, neural networks) - high-dimensional learning - computational and statistical learnability (information-theoretic, PAC, SQ) - overparameterized models, implicit bias and regularization

The project work focuses on current theoretical ML research that aims to understand modern phenomena in machine learning, including but not limited to - how overparameterized models generalize (statistically) and converge (computationally) - complexity measures and approximation theoretic properties of randomly initialized and trained neural networks - generalization of robust learning (adversarial or distribution-shift robustness) - private and fair learning

Prerequisites / Notice

Students should have a very strong mathematical background (real analysis, probability theory, linear algebra) and solid knowledge of core concepts in machine learning taught in courses such as “Introduction to Machine Learning”, “Regression”/ “Statistical Modelling”. In addition to these prerequisites, this class requires a high degree of mathematical maturity—including abstract thinking and the ability to understand and write proofs.

Students have usually taken a subset of Fundamentals of Mathematical Statistics, Probabilistic AI, Neural Network Theory, Optimization for Data Science, Advanced ML, Statistical Learning Theory, Probability Theory (D-MATH)