August 19, 2024

IE 690000: Cutting-Edge Optimization This Fall
 
A new class, IE 690000: Topics in Modern Continuous Optimization Algorithms, will be offered this Fall, TTh 4:30-5:45 PM. The course abstract appears below.
 
 
IE 690000: Topics in Modern Continuous Optimization Algorithms
 
Optimization is ubiquitous, from fitting GPT models and constructing optimal investment portfolios to training autonomous UAVs, among many other applications. Central to all of these exciting applications are increasingly sophisticated models (e.g., the objective function of a deep neural network is non-convex and non-smooth) and their enormous scale (e.g., GPT has over 100 billion parameters to tune, and its training corpus spans much of the internet). The speed with which we can solve the underlying optimization problems is key to their success. This class addresses this issue by developing a theoretical understanding of convergence rates and presenting recent advances in algorithm design.
 
Naturally, one would like a single general algorithm that solves all problems effectively. Unfortunately, such an algorithm has been shown to be impossible. Instead, we must restrict attention to narrower problem classes and exploit their inherent structure to design efficient algorithms with sound theoretical convergence guarantees. In this class, we focus on the problem classes fundamental to machine learning and control: stochastic optimization, non-convex optimization, non-smooth optimization, and variational inequalities.
 
Specifically, we plan to cover the following topics.
 
• Quick review of optimization complexity theory (3 weeks)
  • The basic terminology: convexity, Lipschitz continuity, smoothness, the black-box model, oracle complexity, and the impossibility result.
  • The Legendre dual function, the Bregman distance function, and the mirror descent algorithm.
  • The accelerated gradient descent method and lower complexity bounds. 
• Stochastic optimization for statistics and machine learning (3 weeks) 
  • The stochastic gradient descent method and the lower complexity bound (a minimal SGD sketch appears after this outline).
  • The stochastic block coordinate descent algorithms.
  • The finite-sum problem and the variance reduction technique.
• Smooth non-convex optimization (3 weeks) 
  • The stationarity criterion. The gradient descent method.
  • The inexact proximal point method and the ADAM method.
  • The AC-SA method for stochastic smooth non-convex optimization and zeroth-order methods.
• Non-smooth non-convex optimization (3 weeks) 
  • The descent principle.
  • The Clarke subdifferential. The weakly convex problem (DC problem).
  • The piecewise smooth problem. 
  • The Goldstein subdifferential. The INGD method, its stochastic variants, and the lower complexity bound.
• Variational inequalities (3 weeks)
  • Min-max optimization, the connection to games, and the OGDA algorithm (see the OGDA sketch after this outline).
  • The operator extrapolation algorithm and its application to policy evaluation in reinforcement learning. 
  • The policy gradient method.
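
To give a concrete feel for the stochastic optimization module, here is a minimal sketch of stochastic gradient descent on a synthetic least-squares problem. It is written in Python with NumPy; the problem size, step size, and iteration count are illustrative choices rather than course material. Each iteration touches a single randomly sampled data point, which is what makes the per-iteration cost independent of the dataset size.

import numpy as np

# Minimal SGD sketch on the least-squares problem min_x (1/2n) ||A x - b||^2.
# All sizes and step sizes below are illustrative choices.
rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)     # noisy linear measurements

def sgd(A, b, steps=5000, eta=0.01):
    """Plain SGD: each iteration uses one uniformly sampled data point."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                   # sample an index uniformly at random
        grad_i = (A[i] @ x - b[i]) * A[i]     # gradient of 0.5 * (a_i^T x - b_i)^2
        x -= eta * grad_i                     # constant-step-size descent step
    return x

x_hat = sgd(A, b)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))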
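
Similarly, for the variational inequality module, the following sketch contrasts plain simultaneous gradient descent-ascent (GDA) with its optimistic variant (OGDA) on the bilinear game min_x max_y x^T A y, whose saddle point is the origin. The random matrix, step size, starting point, and iteration count are again illustrative; the point is that the optimistic correction term turns a diverging iteration into a converging one.

import numpy as np

# Minimal sketch contrasting plain GDA with optimistic GDA (OGDA) on the
# bilinear game min_x max_y x^T A y.  The matrix, step size, starting point,
# and iteration count are illustrative choices.
rng = np.random.default_rng(1)
d = 5
A = rng.normal(size=(d, d))
eta = 0.4 / np.linalg.norm(A, 2)              # step size scaled by the spectral norm of A

def run(optimistic, steps=1000):
    x, y = np.ones(d), np.ones(d)
    gx_prev, gy_prev = A @ y, A.T @ x         # "previous" gradients for the optimistic term
    for _ in range(steps):
        gx, gy = A @ y, A.T @ x               # gradients of x^T A y at the current point
        if optimistic:                        # OGDA: extrapolate with the last gradient
            x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
        else:                                 # plain simultaneous GDA
            x, y = x - eta * gx, y + eta * gy
        gx_prev, gy_prev = gx, gy
    return np.linalg.norm(np.concatenate([x, y]))   # distance to the saddle point at the origin

print("plain GDA, distance to the saddle point:", run(optimistic=False))
print("OGDA, distance to the saddle point:", run(optimistic=True))
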
The class meets TTh 4:30 PM – 5:45 PM.