ICON Seminar in Optimization: Dr. Mohammad Ghavamzadeh (Google Research)

Event Date:

April 15, 2022

Speaker:

Dr. Mohammad Ghavamzadeh

Speaker Affiliation:

Google Research

Time:

2:00pm-3:00pm (Eastern Time, America/Indiana/Indianapolis)

Location:

https://purdue-edu.zoom.us/j/98949233496?pwd=Vm53YTMvSVE1OS9LYXVTb2EyQWJhUT09

Title: Mirror Descent Policy Optimization

ICON Seminar Series on Learning Meets Control

Abstract:

Mirror descent, a well-known first-order method in constrained convex optimization, has recently been shown to be an important tool for analyzing trust-region algorithms in reinforcement learning (RL). Inspired by these theoretical results, we propose an RL algorithm called mirror descent policy optimization (MDPO). MDPO iteratively updates the policy by approximately solving a trust-region problem whose objective function consists of two terms: a linearization of the standard RL objective and a proximity term that keeps two consecutive policies close to each other. We derive on-policy and off-policy variants of MDPO. We highlight the connections between on-policy MDPO and two popular trust-region RL algorithms, TRPO and PPO, and show that MDPO can be an excellent alternative to them. We then show how the popular SAC algorithm can be derived by slight modifications of off-policy MDPO. Overall, MDPO is derived from optimization principles, offers a unified approach to viewing a number of popular RL algorithms, and performs better than or on par with TRPO and PPO, and on par with SAC, in a number of continuous control tasks.
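To make the update concrete, here is a minimal sketch of an exact, tabular version of the iteration described above. It assumes a small MDP with known dynamics, exact Q-values obtained by policy evaluation, and a KL divergence as the proximity term. Under those assumptions the per-state trust-region subproblem has a standard closed-form solution: the new policy is proportional to the old policy times exp(t_k * Q(s, a)), where t_k is the step size. The toy MDP, the function names evaluate_q and mdpo_step, and the step size are hypothetical illustrations, not the speaker's code; the MDPO variants in the talk solve this subproblem only approximately.

```python
import math

# Hypothetical toy MDP for illustration only (not from the talk).
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the expected reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],  # only "stay in state 1" is rewarded
]
GAMMA = 0.9


def evaluate_q(policy, iters=500):
    """Iterative policy evaluation: returns an estimate of Q^pi[s][a]."""
    v = [0.0] * len(P)
    q = []
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * sum(p * v[s2] for p, s2 in P[s][a])
              for a in range(len(P[s]))]
             for s in range(len(P))]
        v = [sum(pi_a * q_a for pi_a, q_a in zip(policy[s], q[s]))
             for s in range(len(P))]
    return q


def mdpo_step(policy, q, step_size):
    """One mirror descent step with a KL proximity term, solved in closed form:
    pi_new(a|s) is proportional to pi_old(a|s) * exp(step_size * Q(s, a))."""
    new_policy = []
    for pi_s, q_s in zip(policy, q):
        # Subtracting a per-state constant leaves the normalized result unchanged
        # and keeps exp() from overflowing.
        shift = max(q_s)
        weights = [p * math.exp(step_size * (qa - shift)) for p, qa in zip(pi_s, q_s)]
        total = sum(weights)
        new_policy.append([w / total for w in weights])
    return new_policy


policy = [[0.5, 0.5], [0.5, 0.5]]  # start from the uniform policy
for _ in range(50):
    policy = mdpo_step(policy, evaluate_q(policy), step_size=1.0)

print([[round(p, 3) for p in row] for row in policy])
```

After a few dozen iterations the policy should concentrate on moving from state 0 to state 1 and then staying in state 1, the only rewarded behavior. A smaller step size puts more weight on the proximity term, so consecutive policies stay closer and each update is more conservative.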

Bio:

Mohammad Ghavamzadeh received his Ph.D. from UMass Amherst in 2005. He was a postdoctoral fellow at the University of Alberta from 2005 to 2008 and a permanent researcher at INRIA from 2008 to 2013. He received the "INRIA award for scientific excellence" in 2011 and obtained his Habilitation in 2014. Since 2013, he has held senior researcher positions at Adobe and FAIR, and he is now a senior staff research scientist at Google. He has published over 100 refereed papers in major machine learning, AI, and control journals and conferences, and has co-chaired more than 10 workshops and tutorials at NeurIPS, ICML, and AAAI. His research focuses mainly on reinforcement learning, bandit algorithms, and recommendation systems.