Thompson Sampling for Markov Decision Processes and Related Problems

Event Date: September 23, 2021
Time: 10:00 am
Location: via Zoom
School or Program: Electrical and Computer Engineering
Ashutosh Nayyar
Associate Professor
University of Southern California

Join us online!

Abstract

We consider the problem of learning to control a discrete-time stochastic system with unknown dynamics. We model the system as a Markov decision process (MDP) with finite state and action spaces and propose a Thompson Sampling-based reinforcement learning algorithm with dynamic episodes. At the beginning of each episode, the algorithm generates a sample from the posterior distribution over the unknown model parameters. It then follows the optimal stationary policy for the sampled model for the rest of the episode. The duration of each episode is determined dynamically by two stopping criteria. The first criterion controls the growth rate of the episode length. The second criterion is triggered when the number of visits to any state-action pair has doubled since the start of the episode. We establish a bound on the expected regret of our algorithm for weakly communicating MDPs. We then consider two multi-agent versions of our problem: one where the agents are cooperative but asymmetrically informed, and the other where the agents are adversarial but symmetrically informed.
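The episode structure described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation, and several details are assumptions made only for illustration: rewards are taken as known, the posterior over transition probabilities is an independent Dirichlet per state-action pair with a uniform prior, planning uses relative value iteration, and the first stopping criterion is read as "an episode may be at most one step longer than the previous one". The names `plan_average_reward` and `tsde` are hypothetical.

```python
import numpy as np

def plan_average_reward(P, R, iters=500, tol=1e-8):
    """Relative value iteration for an average-reward MDP (assumed planner).
    P has shape (S, A, S), R has shape (S, A). Returns a greedy policy."""
    h = np.zeros(P.shape[0])
    for _ in range(iters):
        h_new = (R + P @ h).max(axis=1)   # Bellman backup, shape (S,)
        h_new = h_new - h_new[0]          # normalise against a reference state
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return (R + P @ h).argmax(axis=1)

def tsde(true_P, R, horizon, rng):
    """Thompson Sampling with dynamic episodes (illustrative sketch)."""
    S, A, _ = true_P.shape
    alpha = np.ones((S, A, S))            # Dirichlet posterior counts (assumed prior)
    visits = np.zeros((S, A), dtype=int)  # visit counts per state-action pair
    state, t, prev_len = 0, 0, 0
    episode_starts, total_reward = [], 0.0
    while t < horizon:
        episode_starts.append(t)
        start_t, start_visits = t, visits.copy()
        # Sample one model from the posterior at the start of the episode.
        sampled_P = np.array([[rng.dirichlet(alpha[s, a]) for a in range(A)]
                              for s in range(S)])
        # Follow the optimal stationary policy of the sampled model.
        policy = plan_average_reward(sampled_P, R)
        while t < horizon:
            action = policy[state]
            next_state = rng.choice(S, p=true_P[state, action])
            total_reward += R[state, action]
            alpha[state, action, next_state] += 1
            visits[state, action] += 1
            state, t = next_state, t + 1
            # Criterion 1 (assumed form): episode exceeds the previous length.
            too_long = (t - start_t) > prev_len
            # Criterion 2: some pair's visit count has more than doubled.
            doubled = np.any(visits > 2 * start_visits)
            if too_long or doubled:
                break
        prev_len = t - start_t
    return episode_starts, total_reward

# Small random MDP with 3 states and 2 actions, used only as a demo.
rng = np.random.default_rng(0)
true_P = rng.dirichlet(np.ones(3), size=(3, 2))   # shape (3, 2, 3)
R = rng.uniform(size=(3, 2))
starts, total = tsde(true_P, R, horizon=200, rng=rng)
```

In this sketch the episode lengths can grow by at most one step per episode, so the policy is resampled often early on and less often as the visit counts accumulate.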

Bio

Ashutosh Nayyar is an Associate Professor in the Electrical & Computer Engineering department at the University of Southern California. He received his Ph.D. in Electrical Engineering and Computer Science from the University of Michigan, Ann Arbor. He worked as a post-doctoral researcher at the University of Illinois at Urbana-Champaign and at the University of California, Berkeley. His research interests are in decentralized stochastic control, decentralized decision-making in sensing and communication systems, reinforcement learning, game theory, mechanism design, and electric energy systems. His recognitions include an IEEE CSS George S. Axelby Outstanding Paper Award and an NSF CAREER Award.

Host
Sumeet Kumar Gupta, guptask@purdue.edu
