
February 15, 2022

IE SPRING SEMINAR
The role of lookahead in reinforcement learning with function approximation

Event Date: February 15, 2022
Time: 4:30 pm – 5:30 pm EST
Location: bit.ly/PurdueIE_Srikant
School or Program: Industrial Engineering
R. Srikant, Fredrick G. & Elizabeth H. Nearing Endowed Professor, Electrical and Computer Engineering; Co-Director, C3.ai Digital Transformation Institute; University of Illinois at Urbana-Champaign

ABSTRACT

When the state and action spaces are large, solving MDPs can be computationally prohibitive even when the probability transition matrix is known. In practice, a number of techniques are therefore used to approximately solve the dynamic programming problem, including lookahead, approximate policy evaluation using an m-step return, and function approximation. A recent paper (Efroni et al., 2019) studied the impact of lookahead on the convergence rate of approximate dynamic programming. In this talk, we will show that these convergence results change dramatically when function approximation is used in conjunction with lookahead and approximate policy evaluation using an m-step return. Specifically, we show that when linear function approximation is used to represent the value function, a certain minimum amount of lookahead and multi-step return is needed for the algorithm to even converge. When this condition is met, we characterize the finite-time performance of policies obtained using such approximate policy iteration. Our results are presented for two different procedures for computing the function approximation: linear least-squares regression and gradient descent. Joint work with Anna Winnicki, Michael Livesay, and Joseph Lubars.
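
To make the setup concrete, the following is a minimal Python sketch of the kind of algorithm the abstract describes: approximate policy iteration that combines H-step lookahead for policy improvement, policy evaluation via an m-step return, and a linear least-squares fit of the value function (one of the two fitting procedures mentioned above). Everything specific here is an illustrative assumption, not the speakers' implementation: the small random MDP, the random feature matrix, the helper functions, and the particular choices of H and m.

# Minimal sketch of approximate policy iteration with H-step lookahead,
# m-step-return evaluation, and linear least-squares value fitting.
# The MDP, features, and parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

S, A, gamma = 20, 3, 0.9                      # small tabular MDP (assumed)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] = next-state distribution
R = rng.random((S, A))                        # rewards (assumed)
Phi = rng.random((S, 4))                      # linear features, d = 4 (assumed)

def lookahead_policy(V, H):
    """Greedy policy w.r.t. an H-step lookahead on approximate values V."""
    W = V.copy()
    for _ in range(H - 1):                    # H - 1 Bellman optimality backups
        W = np.max(R + gamma * P @ W, axis=1)
    Q = R + gamma * P @ W                     # final backup yields root Q-values
    return np.argmax(Q, axis=1)

def m_step_return_targets(V, pi, m):
    """Evaluation targets: m applications of policy pi's Bellman operator to V."""
    W = V.copy()
    for _ in range(m):
        W = R[np.arange(S), pi] + gamma * P[np.arange(S), pi] @ W
    return W

theta = np.zeros(Phi.shape[1])
H, m = 5, 10   # the talk's point: H and m must be large enough for convergence
for k in range(50):
    V = Phi @ theta
    pi = lookahead_policy(V, H)               # improvement via lookahead
    targets = m_step_return_targets(V, pi, m) # evaluation via m-step return
    # Least-squares fit of the targets onto the linear feature space.
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print("final approximate values (first 5 states):", (Phi @ theta)[:5])

Note the role of the two hyperparameters: with H = m = 1 and an expressive-enough feature matrix this reduces to ordinary approximate value iteration, which (per the abstract) can fail to converge under linear function approximation; increasing H and m is what the talk argues restores convergence.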

BIOGRAPHY

R. Srikant is the Co-Director of the C3.ai Digital Transformation Institute and the Fredrick G. and Elizabeth H. Nearing Endowed Professor in the Department of Electrical and Computer Engineering and the Coordinated Science Lab at the University of Illinois at Urbana-Champaign. His research interests include machine learning and communication networks. He is a winner of the ACM SIGMETRICS Achievement Award, the IEEE Koji Kobayashi Computers and Communications Award, and the IEEE INFOCOM Achievement Award. He has won several best paper awards, including the Applied Probability Society's Best Publication Award, the IEEE INFOCOM Best Paper Award, and the WiOpt Best Paper Award.