Reinforcement Learning Theory


Credit Hours:


Learning Objectives:

A student who successfully fulfills the course requirements will have demonstrated an ability to:

  • Explain different problem formulations for reinforcement learning
  • Apply various algorithmic solutions to a wide range of sequential decision-making problems
  • Analyze the performance capabilities and limitations of different algorithms for sequential decision-making
  • Conduct a research project in collaboration with one or more partners and write a scientific report of the research findings


This course introduces the foundations and recent advances of reinforcement learning, an area of machine learning closely tied to optimal control that studies sequential decision-making under uncertainty. It aims to build a deep understanding of the theoretical and algorithmic foundations of reinforcement learning while discussing practical considerations and various extensions.

Topics Covered:

Week Lecture Topics
1 Introduction, motivation, overview of relevant background
2 Dynamic programming and policy evaluation
3 Policy iteration and value iteration
4 Monte Carlo and temporal difference methods
5 Computational complexity and statistical limits
6 Linear quadratic regulators (LQR) and optimal control
7 Optimal control for nonlinear systems (Iterative LQR)
8 Prediction, estimation, and Kalman filtering
9 Model-based and model-free reinforcement learning
10 Approximate policy iteration and deep Q-learning
11 Conservative policy iteration and trust region methods
12 Stochastic gradient descent and policy gradient
13 Exploration in reinforcement learning and multi-armed bandits
14 Partially observable Markov decision processes and risk-averse reinforcement learning
15 Inverse reinforcement learning, meta-learning, transfer learning, and multi-agent reinforcement learning



Undergraduate-level understanding of linear algebra, probability, and calculus

Web Address:





  1. Bandit Algorithms, Lattimore, Tor; Szepesvari, Csaba, Cambridge University Press, 2020
  2. Dynamic Programming and Optimal Control, Bertsekas, Dimitri P., Athena Scientific, 2011
  3. Foundations of Deep Reinforcement Learning, Graesser, Laura; Keng, Wah Loon, Addison-Wesley Professional, 2019
  4. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Puterman, Martin L., John Wiley & Sons, 2014
  5. Neuro-Dynamic Programming, Bertsekas, Dimitri P.; Tsitsiklis, John N., Athena Scientific, 1996
  6. Reinforcement Learning: An Introduction, Sutton, Richard S.; Barto, Andrew G., MIT Press, 2018
  7. Reinforcement Learning: Theory and Algorithms, Agarwal, Alekh; Jiang, Nan; Kakade, Sham M.; Sun, Wen, 2019 

ProEd Minimum Requirements: