Control and Dynamical Systems Lecture Series: Shalabh Bhatnagar, "Traffic Signal Control"

Wednesday, October 5, 2011
1:30 p.m.
1146 A.V. Williams Building
Pamela White
301 405 6576

Actor-Critic Algorithms with Function Approximation for Constrained Markov Decision Processes: An Application to Traffic Signal Control

Shalabh Bhatnagar
Professor, Electrical Engineering
Indian Institute of Science, Bangalore

Actor-critic algorithms are an important subclass of reinforcement learning methods. These are characterized by a parameterization of two entities: the actor and the critic. While the critic addresses the problem of prediction, the actor is concerned with control.

Temporal difference (TD) learning is widely recognized as an efficient method for the problem of prediction. By employing TD-based critics and actors that use policy or natural policy gradients, we present various actor-critic algorithms that bootstrap in both the actor and the critic.

Traditional reinforcement learning methods have been geared towards solving stochastic control problems that are usually modelled as Markov decision processes. We present an adaptation of one of the above algorithms with function approximation towards solving a Markov decision process with inequality constraints.

We consider the long-run average cost criterion for both the cost and the constraint functions and employ the Lagrange multiplier approach for finding a constrained optimal policy. We sketch the convergence of our algorithms. Next, we present an application of one of our algorithms on the problem of traffic signal control.
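
As a rough illustration of the Lagrange multiplier approach described above, the following minimal sketch runs a tabular actor-critic under the long-run average cost criterion on a made-up two-state problem. It is not the speaker's algorithm: the toy costs, transition probabilities, threshold ALPHA, step-size exponents, softmax policy, and all function and variable names are assumptions chosen for illustration, and one-hot state features stand in for a general function approximator.

```python
import math
import random

# Hypothetical toy problem: 2 states, 2 actions (all numbers are made up).
N_STATES, N_ACTIONS = 2, 2
COST = [[1.0, 4.0], [2.0, 0.5]]        # c(s, a): cost to be minimized
CONSTRAINT = [[3.0, 0.0], [0.0, 2.0]]  # g(s, a): constrained quantity
ALPHA = 1.0                            # want long-run average of g <= ALPHA
P_NEXT1 = [[0.2, 0.8], [0.6, 0.3]]     # P(next state = 1 | s, a)


def softmax_policy(theta, s):
    """Action probabilities in state s under softmax preferences theta[s]."""
    m = max(theta[s])
    exps = [math.exp(t - m) for t in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def run(steps=20000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    v = [0.0] * N_STATES  # critic weights (one-hot features: one per state)
    rho = 0.0             # running estimate of the average Lagrangian cost
    lam = 0.0             # Lagrange multiplier, kept nonnegative
    s = 0
    for n in range(1, steps + 1):
        # Assumed step sizes: critic fastest, actor slower, multiplier slowest.
        a_critic = 1.0 / n ** 0.6
        a_actor = 1.0 / n ** 0.8
        a_lam = 1.0 / n

        probs = softmax_policy(theta, s)
        a = 0 if rng.random() < probs[0] else 1
        s_next = 1 if rng.random() < P_NEXT1[s][a] else 0

        # Single-stage Lagrangian cost: c + lam * (g - ALPHA).
        lag_cost = COST[s][a] + lam * (CONSTRAINT[s][a] - ALPHA)

        # Critic: average-cost TD error, then update rho and v.
        delta = lag_cost - rho + v[s_next] - v[s]
        rho += a_critic * (lag_cost - rho)
        v[s] += a_critic * delta

        # Actor: policy-gradient step that descends the Lagrangian cost.
        for b in range(N_ACTIONS):
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] -= a_actor * delta * grad_log

        # Multiplier: ascend on the constraint violation, project onto [0, inf).
        lam = max(0.0, lam + a_lam * (CONSTRAINT[s][a] - ALPHA))
        s = s_next
    return theta, v, rho, lam
```

Calling run() returns the learned actor parameters, critic weights, average-cost estimate, and multiplier. The multiplier grows while the sampled constraint cost exceeds ALPHA and shrinks otherwise, which is what pushes the policy towards satisfying the constraint.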

The problem here is to switch signals at traffic junctions efficiently by (a) adaptively finding the right order in which to switch and (b) finding the amount of time that a signal should be green, in order to maximize traffic flows and minimize delays. We observe that our algorithm significantly outperforms fixed-timing algorithms as well as Q-learning, a popular reinforcement learning algorithm.

Shalabh Bhatnagar received his Ph.D. in Electrical Engineering from the Indian Institute of Science, Bangalore, in 1998. From 1997 to 2000, he was a postdoc at the Institute for Systems Research, University of Maryland. He was also a postdoc at the Free University, Amsterdam, during 2000-2001 and a Visiting Faculty Member at IIT Delhi from July to December 2001. He has been at the Indian Institute of Science, Bangalore, since December 2001, where he is currently a Professor. His current research interests include reinforcement learning and stochastic optimization, as well as applications in communication networks and vehicular traffic control.

Audience: Public  Campus  Clark School  Graduate  Undergraduate  Faculty  Staff  Post-Docs  Alumni  Corporate 
