Control and Dynamical Systems Lecture Series: Shalabh Bhatnagar, "Traffic Signal Control"
Wednesday, October 5, 2011
1146 A.V. Williams Building
301 405 6576
Actor-Critic Algorithms with Function Approximation for Constrained Markov Decision Processes: An Application to Traffic Signal Control
Professor, Electrical Engineering
Indian Institute of Science, Bangalore
Actor-critic algorithms are an important subclass of reinforcement learning methods. These are characterized by a parameterization of two entities: the actor and the critic. While the critic addresses the problem of prediction, the actor is concerned with control.
Temporal difference (TD) learning is widely recognized as an efficient method for the problem of prediction. By employing TD-based critics and actors that use policy or natural policy gradients, we present various actor-critic algorithms that bootstrap in both the actor and the critic.
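As a rough illustration of what a TD-based critic with function approximation does, the following minimal sketch runs TD(0) with a linear value estimate on a toy two-state chain. It is not one of the speaker's algorithms: the discounted setting, the function name td0_linear, the step size, and the toy chain are all assumptions chosen to keep the example short.

```python
import numpy as np

def td0_linear(phi, cost, phi_next, alpha=0.1, gamma=0.9, sweeps=200):
    """Discounted TD(0) with a linear critic v(s) ~ w . phi(s).

    Hypothetical helper for illustration; each row of phi/phi_next holds the
    features of a state and of its successor, and cost holds the one-step cost.
    """
    w = np.zeros(phi.shape[1])
    for _ in range(sweeps):
        for p, c, p_next in zip(phi, cost, phi_next):
            # TD error: bootstraps on the current estimate of the next state
            delta = c + gamma * (w @ p_next) - (w @ p)
            w += alpha * delta * p
    return w

# Toy chain (assumed): state 0 moves to state 1 at cost 1,
# state 1 stays in state 1 at cost 0. One-hot features make this tabular.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
cost = np.array([1.0, 0.0])
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])

w = td0_linear(phi, cost, phi_next)
print(w)  # estimated values of states 0 and 1
```

With one-hot features the learned weights are the state values themselves, so the estimate for state 0 approaches 1 and the estimate for state 1 stays at 0.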
Traditional reinforcement learning methods have been geared towards solving stochastic control problems that are usually modelled as Markov decision processes. We present an adaptation of one of the above algorithms with function approximation towards solving a Markov decision process with inequality constraints.
We consider the long-run average cost criterion for both the cost and the constraint functions and employ the Lagrange multiplier approach for finding a constrained optimal policy. We sketch the convergence of our algorithms. Next, we present an application of one of our algorithms on the problem of traffic signal control.
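For readers unfamiliar with the setup, the constrained problem described above can be written in the standard form below. The symbols (policy \pi, single-stage cost c, constraint costs g_k, thresholds \alpha_k, multipliers \lambda_k) are assumptions introduced here for illustration and are not necessarily the speaker's notation.

```latex
% Long-run average cost and constraint functions under policy \pi
J(\pi)   = \limsup_{n \to \infty} \frac{1}{n}
           \mathbb{E}\left[ \sum_{t=0}^{n-1} c(s_t, a_t) \right], \qquad
G_k(\pi) = \limsup_{n \to \infty} \frac{1}{n}
           \mathbb{E}\left[ \sum_{t=0}^{n-1} g_k(s_t, a_t) \right]

% Constrained problem
\min_{\pi} \; J(\pi) \quad \text{subject to} \quad G_k(\pi) \le \alpha_k,
\qquad k = 1, \dots, N

% Lagrangian relaxation with multipliers \lambda_k \ge 0
L(\pi, \lambda) = J(\pi) + \sum_{k=1}^{N} \lambda_k \left( G_k(\pi) - \alpha_k \right)
```

In this form, a constrained optimal policy corresponds to a saddle point of L: the policy is adjusted to decrease L while the multipliers are adjusted to increase it, subject to \lambda_k \ge 0.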
The problem here is to efficiently switch signals at traffic junctions by (a) adaptively finding the right order in which to switch and (b) finding the amount of time that a signal should be green, in order to maximize traffic flows and minimize delays. We observe that our algorithm significantly outperforms fixed-timing algorithms as well as Q-learning, which is a popular reinforcement learning algorithm.
Shalabh Bhatnagar received his Ph.D. in Electrical Engineering from the Indian Institute of Science, Bangalore, in 1998. From 1997 to 2000, he was a postdoc at the Institute for Systems Research, University of Maryland. He was also a postdoc at the Free University, Amsterdam, during 2000-2001 and a Visiting Faculty Member at IIT Delhi from July to December 2001. He has been at the Indian Institute of Science, Bangalore, since December 2001, where he is currently a Professor. His current research interests include reinforcement learning and stochastic optimization, as well as applications in communication networks and vehicular traffic control.