Task 007: Distributed Learning and Inference

Event Date: March 25, 2021
Time: 11:00 am (ET) / 8:00 am (PT)
Zikang Xiong, Purdue University
Safe Reinforcement Learning with Shield
ABSTRACT: Although deep reinforcement learning has achieved promising performance in various domains and tasks, building safe and reliable deep neural network policies still faces numerous challenges. First, the lack of interpretability of neural networks hinders researchers and engineers from making trustworthy assertions about a learned policy's properties. Second, small adversarial perturbations can cause catastrophic failures in deep neural networks. In this line of work, we aim to provide a reliable reinforcement learning pipeline covering training exploration, post-deployment monitoring, and robustness to adversarial attacks. All of these works are based on an online monitoring and repair mechanism called a shield. The shield harnesses the verifiability of simple linear policies and the expressivity of deep neural networks to build reliable and performant deep reinforcement learning policies. Our experiments on a large number of benchmarks show that shielded policies satisfy safety specifications on various reinforcement learning tasks without compromising performance, provide safety guarantees during training exploration, and are more robust to adversarial attacks.
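To make the monitoring-and-repair idea concrete, here is a minimal Python sketch of one common shielding pattern: an expressive neural policy proposes an action, an online monitor checks it against a safety condition, and a simple verified linear backup policy takes over when the check fails. All names, the one-step safety check, and the toy dynamics below are illustrative assumptions for this announcement, not the speaker's actual implementation.

import numpy as np

class Shield:
    def __init__(self, neural_policy, linear_policy, is_safe):
        self.neural_policy = neural_policy   # expressive learned policy
        self.linear_policy = linear_policy   # simple, verifiable backup policy
        self.is_safe = is_safe               # monitor: (state, action) -> bool

    def act(self, state):
        action = self.neural_policy(state)
        # Online monitoring and repair: if the neural action could violate
        # the safety specification, fall back to the verified linear policy.
        if not self.is_safe(state, action):
            action = self.linear_policy(state)
        return action

# Toy usage: keep a 1-D state within |x| <= 1 under dynamics x' = x + a.
neural = lambda x: np.clip(2.0 * x, -1.0, 1.0)  # stand-in for a trained network
linear = lambda x: -0.5 * x                     # linear feedback K*x, assumed verified
safe = lambda x, a: abs(x + a) <= 1.0           # hypothetical one-step safety check

shield = Shield(neural, linear, safe)
print(shield.act(0.8))  # neural proposes 1.0 (unsafe); shield repairs to -0.4

The design choice the abstract highlights is exactly this division of labor: the linear policy is simple enough to verify offline, so the shield can offer guarantees, while the neural policy supplies performance whenever the monitor deems its actions safe.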
 
Bio: Zikang Xiong received his Bachelor of Engineering degree in Software Engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2018. He joined the Computer Science Department at Purdue in 2018 to pursue his Ph.D. under the supervision of Prof. Suresh Jagannathan. His research focuses on attacks, defenses, and assurance for deep learning, especially deep reinforcement learning.