Aggarwal and team receive best paper award at 2021 NeurIPS

Dr. Washim Uddin Mondal, Postdoctoral Researcher at the Lyles School of Civil Engineering, along with co-authors Satish Ukkusuri, Reilly Professor of Civil Engineering, Vaneet Aggarwal, Associate Professor of Industrial Engineering, and Mridul Agarwal, PhD student in ECE, received the best paper award in the Cooperative AI workshop at the 2021 Conference on Neural Information Processing Systems (NeurIPS) for their paper, "On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)."

Dr. Mondal gave the spotlight presentation at the workshop, held December 13-14.

The conference was founded in 1987 and is now a multi-track, interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Alongside the conference are a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.


ABSTRACT

On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

Uddin Mondal, Washim; Agarwal, Mridul; Aggarwal, Vaneet; Ukkusuri, Satish V.

Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of Npop heterogeneous agents that can be segregated into K classes such that the k-th class contains Nk homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of (1) joint state and action distributions across all classes, (2) individual distributions of each class, and (3) marginal distributions of the entire population. We show that, in these cases, the K-class MARL problem can be approximated by MFC with errors given as e1 = O([√|X| + √|U|]/Npop · ∑k √Nk), e2 = O([√|X| + √|U|] · ∑k 1/√Nk) and e3 = O([√|X| + √|U|] · [A/Npop · ∑k∈[K] √Nk + B/√Npop]), respectively, where A, B are some constants and |X|, |U| are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within O(ej) error with a sample complexity of O(ej^-3), j ∈ {1, 2, 3}, respectively.
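To give a feel for how these bounds behave, the short sketch below evaluates the three error expressions from the abstract for made-up example values. The function name mfc_errors, the constants A and B, and the class sizes and state/action-space sizes are all hypothetical illustrations chosen here, not values from the paper; the sketch only shows the scaling, namely that the bounds shrink roughly like 1/√Nk as the population grows.

import math

# Hypothetical sketch: evaluate the three MFC approximation-error bounds from
# the abstract for example inputs. A, B, class_sizes, state_size and action_size
# are illustrative assumptions, not values taken from the paper.
def mfc_errors(class_sizes, state_size, action_size, A=1.0, B=1.0):
    n_pop = sum(class_sizes)                                  # total agents, Npop
    term = math.sqrt(state_size) + math.sqrt(action_size)     # sqrt|X| + sqrt|U|
    e1 = term / n_pop * sum(math.sqrt(n) for n in class_sizes)
    e2 = term * sum(1.0 / math.sqrt(n) for n in class_sizes)
    e3 = term * (A / n_pop * sum(math.sqrt(n) for n in class_sizes)
                 + B / math.sqrt(n_pop))
    return e1, e2, e3

# Doubling every class size shrinks each bound by roughly a factor of sqrt(2).
for scale in (1, 2, 4):
    sizes = [scale * 100, scale * 400]                        # K = 2 classes
    print(sizes, mfc_errors(sizes, state_size=10, action_size=5))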

arXiv:2109.04024

Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computer Science and Game Theory; Computer Science - Multiagent Systems