Optimization for Deep Learning


Credit Hours:



This course discusses the optimization algorithms that have powered the recent rise of machine learning (ML) and deep learning (DL). The "learning" in ML and DL typically reduces to non-convex optimization problems over high-dimensional parameter spaces, with objective functions involving millions of terms. Because the success of ML models depends on solving these problems efficiently, the optimization needs of ML differ from those of other fields. This course introduces students to the theoretical principles behind stochastic, gradient-based algorithms for DL, as well as considerations such as adaptivity, generalization, distributed learning, and the non-convex loss surfaces typical of modern DL problems.
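To make the core object of study concrete, here is a minimal sketch of stochastic gradient descent (SGD) on a toy least-squares problem. The problem, variable names, and hyperparameters are illustrative assumptions, not course material: the point is only that each step uses the gradient of a single randomly sampled term instead of the full sum, which is what makes training tractable at ML scale.

```python
import numpy as np

# Toy problem (illustrative only): least squares, f(w) = (1/2n) * ||Xw - y||^2.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                           # noiseless targets, so w_true is optimal

w = np.zeros(d)                          # initial iterate
lr = 0.01                                # step size (learning rate)
for step in range(20000):
    i = rng.integers(n)                  # sample one data point uniformly
    grad_i = (X[i] @ w - y[i]) * X[i]    # gradient of just the i-th term
    w -= lr * grad_i                     # SGD update

# In this noiseless (interpolation) setting, SGD recovers w_true.
print(np.allclose(w, w_true, atol=1e-2))
```

Note that with noisy targets, a constant step size would only reach a neighborhood of the optimum; step-size schedules and variance reduction, both covered below, address this.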

Topics Covered:

Week(s) Major Topics
4 Introduction and foundation: ML basics; Stochastic gradient descent; Smooth, nonconvex problems; Expected and high probability results; Gradient-free SGD; Beyond first-order results; Stochastic lower-bounds
2.5 Deep learning training techniques and insights: Deep learning architectures; Backpropagation; Automatic differentiation and computation graphs; Initialization and normalization methods; Learning rate tuning methods; Regularization
2.5 Deep learning training algorithms: Adaptive methods; Momentum; Variance reduction; Distributed DL; Decentralized SGD and Consensus
2.5 Special topics: Compression; Privacy-preserving ML; Overparameterized models and Interpolation; Neural Tangent Kernel; Implicit bias of SGD
2 Special topics II: Robustness; Generalization error in DL; Double descent; Min-max optimization and GANs
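As a brief illustration of one algorithm from the training-algorithms unit, here is a sketch of SGD with (heavy-ball) momentum on the same kind of toy least-squares problem; all names and hyperparameters are illustrative assumptions. The momentum buffer accumulates a decaying sum of past stochastic gradients, damping oscillation and speeding progress along consistently downhill directions.

```python
import numpy as np

# Illustrative toy problem: noiseless least squares, so w_true is optimal.
rng = np.random.default_rng(1)
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

w = np.zeros(d)
v = np.zeros(d)                         # momentum buffer
lr, beta = 0.01, 0.9                    # step size and momentum coefficient
for step in range(20000):
    i = rng.integers(n)                 # sample one data point uniformly
    grad_i = (X[i] @ w - y[i]) * X[i]   # stochastic gradient of one term
    v = beta * v + grad_i               # accumulate decaying gradient sum
    w -= lr * v                         # heavy-ball momentum step

print(np.allclose(w, w_true, atol=1e-2))
```

With beta = 0.9 the effective long-run step size is roughly lr / (1 - beta); adaptive methods such as Adam, also covered in this unit, additionally rescale each coordinate by a running estimate of gradient magnitude.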



Prerequisites:

Undergraduate probability, calculus, and linear algebra. Basic knowledge of computer vision, NLP, machine learning, and statistics is helpful but not required.

Applied / Theory:


Web Address:







Textbooks:

  1. Multiple books and articles available freely online will be referenced; details provided in Brightspace.