Lectures

2025 Lectures:

Lecture 1: Introduction to Sequential Learning. (Intro slides here — mathematical content in course notes.)

2023 Lectures:

Lecture 1: Introduction to Sequential Learning. The Halving Algorithm obtains ln(k) regret when there is a costless action. (Intro slides here — mathematical material in course notes.)
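
A minimal sketch of the halving idea from this lecture, assuming binary predictions and that at least one expert is perfect (the "costless action"); the data format below is an illustrative assumption, not the lecture's notation:

```python
def halving(expert_predictions, outcomes):
    """expert_predictions: T rounds, each a list of k binary predictions.
    outcomes: T true binary labels. Returns the number of mistakes made."""
    alive = set(range(len(expert_predictions[0])))  # experts consistent so far
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        # Predict with the majority vote of the surviving experts.
        votes = sum(preds[i] for i in alive)
        guess = 1 if 2 * votes >= len(alive) else 0
        mistakes += guess != y
        # Discard every surviving expert that erred; at least half are
        # eliminated whenever we err, so we err at most log2(k) times.
        alive = {i for i in alive if preds[i] == y}
    return mistakes
```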

Lecture 2: Recap of the halving algorithm, and analysis of the Multiplicative Weights algorithm for sequential decision making (“The Experts Problem”). Video here.
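
A minimal sketch of multiplicative weights for the experts problem, assuming costs in [0, 1]; the learning rate eta is a generic choice, not necessarily the tuning from lecture:

```python
import numpy as np

def multiplicative_weights(cost_matrix, eta=0.1):
    """cost_matrix: T x k array of expert costs in [0, 1] per round.
    Returns the learner's average expected cost."""
    T, k = cost_matrix.shape
    weights = np.ones(k)
    total = 0.0
    for t in range(T):
        probs = weights / weights.sum()   # play expert i with probability probs[i]
        total += probs @ cost_matrix[t]
        # Multiplicatively downweight each expert by its cost this round.
        weights *= np.exp(-eta * cost_matrix[t])
    return total / T
```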

Lecture 3: In this lecture we describe and begin the analysis of Follow the Perturbed Leader: A learning algorithm that can obtain diminishing regret bounds even in very large action spaces, so long as the costs have an underlying linear structure. Video here.

Lecture 4: We recap and finish the analysis of Follow the Perturbed Leader. Video here.
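
A minimal sketch of Follow the Perturbed Leader over a finite action set with linear costs, in the spirit of Lectures 3 and 4; the uniform perturbation and its scale are one standard choice (in the style of Kalai-Vempala), not necessarily the exact version analyzed in lecture:

```python
import numpy as np

def ftpl(actions, cost_vectors, epsilon=0.1, seed=0):
    """actions: list of d-dimensional action vectors (possibly huge in practice
    and accessed via an optimization oracle; enumerated here for simplicity).
    cost_vectors: T linear cost vectors. Returns the total cost incurred."""
    rng = np.random.default_rng(seed)
    actions = [np.asarray(a, dtype=float) for a in actions]
    cumulative = np.zeros(len(actions[0]))
    total = 0.0
    for c in cost_vectors:
        # Fresh noise each round; follow the leader on perturbed cumulative costs.
        noise = rng.uniform(0.0, 1.0 / epsilon, size=cumulative.shape)
        play = min(actions, key=lambda a: float(a @ (cumulative - noise)))
        total += float(play @ c)
        cumulative += np.asarray(c, dtype=float)
    return total
```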

Lecture 5: Follow the Regularized Leader and Online Gradient Descent. (I forgot to press record, so no video, but all of the details we covered are in Section 1.5 of the notes.)
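
A minimal sketch of online gradient descent with Euclidean projection, in the flavor of the material in Section 1.5; the 1/sqrt(t) step size and the ball constraint set are generic assumptions:

```python
import numpy as np

def online_gradient_descent(gradients, radius=1.0):
    """gradients: list of T gradients, each revealed after a point is played.
    Returns the sequence of points played, constrained to a Euclidean ball."""
    x = np.zeros(len(gradients[0]))
    plays = []
    for t, g in enumerate(gradients, start=1):
        plays.append(x.copy())
        x = x - (radius / np.sqrt(t)) * np.asarray(g)  # descend along the gradient
        norm = np.linalg.norm(x)
        if norm > radius:
            x *= radius / norm                          # project back onto the ball
    return plays
```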

Lecture 6: Online Linear Optimization Implies Online Convex Optimization. Begin Zero-Sum Games. Video here.

Lecture 7: Proving the Minimax Theorem in Zero-Sum Games using Online Convex Optimization. Video here.
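
A minimal sketch of the constructive content of this argument: two exponential-weights learners play a zero-sum game against each other, and their average strategies approach minimax play. The matrix and parameters below are illustrative assumptions:

```python
import numpy as np

def approx_minimax(A, T=5000, eta=0.05):
    """A: n x m payoff matrix; the row player minimizes x^T A y.
    Returns average strategies and the resulting game value estimate."""
    n, m = A.shape
    wx, wy = np.ones(n), np.ones(m)
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x, y = wx / wx.sum(), wy / wy.sum()
        x_avg += x / T
        y_avg += y / T
        wx *= np.exp(-eta * (A @ y))    # minimizer downweights costly rows
        wy *= np.exp(eta * (A.T @ x))   # maximizer upweights profitable columns
        wx /= wx.sum()                  # renormalize for numerical stability
        wy /= wy.sum()
    return x_avg, y_avg, float(x_avg @ A @ y_avg)

# Matching pennies: value 0, both averages approach (1/2, 1/2).
x, y, v = approx_minimax(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```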

Lecture 8: Computing Minimax and Maximin Strategies in Zero-Sum Games using Online Convex Optimization and “Value Oracles”. Video here.

Lecture 9: Deriving a No Regret Learning Algorithm (Exponential Weights) from First Principles, Starting from the Minimax Theorem. Part 1. Video here.

Lecture 10: Deriving a No Regret Learning Algorithm (Exponential Weights) from First Principles, Starting from the Minimax Theorem. Part 2. Video here.

Lecture 11: Correlated Equilibrium and its Relationship to Swap Regret. Video here.

Lecture 12: In this lecture we define a very general online multiobjective optimization problem (The “Adversary Moves First Framework”) and give an algorithm that reduces it to a 1-dimensional regret problem. This is a generalization of “Blackwell Approachability”. Video here.

Lecture 13: We finish the multiobjective optimization derivation. Video here.

Lecture 14: We study “subsequence regret”, which asks for regret guarantees simultaneously on different subsequences of the data. This covers adaptive regret, groupwise regret, internal and swap regret, and more. We reduce it to our multiobjective (“Adversary Moves First” Regret) framework. Video here.

Lecture 15: We derive a constructive/closed-form algorithm that gets no subsequence regret for any collection of subsequences that are independent of the action chosen by the learner. We describe how to instantiate this algorithm to obtain no “adaptive regret” (no regret on any contiguous subsequence) and no groupwise regret (no regret on subsequences of days associated with contexts that are members of various groups). Video here.

Lecture 16: We derive a closed-form algorithm for obtaining general subsequence regret. Rather than solving an abstract minimax equilibrium problem at each round, we see that the structure of the equilibrium just requires that we solve a system of linear equations to find the top eigenvector of a matrix. This derives the “Blum/Mansour” swap regret algorithm from first principles. Video here.
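
A minimal sketch of the fixed-point computation at the heart of this construction: with one no-regret instance per action, each round we play a stationary distribution of the row-stochastic matrix Q whose i-th row is instance i's distribution, i.e. we solve the linear system p = Q^T p. The weights format is an illustrative assumption:

```python
import numpy as np

def stationary_distribution(Q):
    """Q: k x k row-stochastic matrix (row i = distribution of instance i).
    Solves the linear system p = Q^T p with sum(p) = 1 by least squares."""
    k = Q.shape[0]
    M = np.vstack([Q.T - np.eye(k), np.ones((1, k))])
    b = np.zeros(k + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p

def swap_regret_play(instance_weights):
    """instance_weights: k positive weight vectors, one per action; each row
    of Q is the corresponding instance's normalized distribution over actions."""
    Q = np.array([w / np.sum(w) for w in instance_weights])
    return stationary_distribution(Q)
```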

Lecture 17: In this lecture we begin a section of the course in which our goal is to produce predictions that can be simultaneously used by many downstream agents, with different utility functions, to choose how to act, so that they all have strong regret guarantees of various sorts. Video here.

Lecture 18: We take a short detour from our goal of prediction for downstream action to study the problem of predicting an adversarially selected vector-valued state such that our predictions have low statistical bias, not just overall, but on multiple possibly intersecting subsequences. We solve this problem (information-theoretically, if not computationally) by reducing it to multiobjective optimization. Will it be useful for us? Only time will tell! Video here.

Lecture 19: In this lecture we introduce calibration, and show that for any downstream agent, it is an approximate dominant strategy (up to diminishing regret terms), among all policies mapping predictions to actions, to choose the “best response” policy that treats predictions as if they were correct and best responds to them. Among other things, this implies that downstream agents will have no swap regret if they best respond to the forecasts. Video here.
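
A minimal sketch of the downstream policy discussed here: the agent simply treats the forecast as correct and maximizes expected utility under it. The utility-matrix encoding is an illustrative assumption:

```python
import numpy as np

def best_response(utility, forecast):
    """utility: (num_actions x num_states) payoff matrix for the agent.
    forecast: predicted distribution over states. Picks the action that
    maximizes expected utility if the forecast were the true distribution."""
    return int(np.argmax(utility @ forecast))

# Example: an agent who acts only if the event is likelier than 1/2.
act = best_response(np.array([[0.0, 0.0], [-1.0, 1.0]]), np.array([0.3, 0.7]))
```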

Lecture 20: In this lecture we derive an efficient combinatorial online algorithm for (1-dimensional) multicalibration, as an application of our multiobjective optimization framework. Video here.

Lecture 21: In this lecture we show how to efficiently (in time polynomial in d and |E|) make sequential predictions of a d-dimensional vector that are unbiased conditional on every conditioning event in a set E. Video here.

Lecture 22: In this lecture by Natalie Collina, we learn about a pair of brand-new swap regret algorithms (https://arxiv.org/abs/2310.19786 and https://arxiv.org/abs/2310.19647) that operate on a different set of principles than the one we already covered in class: they do not need to solve a fixed-point problem. They get an exponentially improved dependence on the number of actions k, in exchange for an exponentially worse dependence on the time horizon T. Video here.

Lecture 23: In this lecture we define a small collection of events in terms of an agent’s utility function such that best responding to forecasts of the state that are unbiased subject to these events yields diminishing swap regret. Together with the algorithm we gave in Lecture 21, this yields a computationally efficient algorithm for making forecasts that yield no swap regret at the optimal rate for any polynomially sized collection of downstream decision makers. Video here.

Lecture 24: In this lecture we show how, using our unbiased prediction technology, to efficiently obtain subsequence regret guarantees in online combinatorial optimization problems, despite the fact that the action space can be exponentially large. Video here.