Control and Dynamical Systems Lecture Series: Shalabh Bhatnagar, "Traffic Signal Control"

Wednesday, October 5, 2011
1:30 p.m.
1146 A.V. Williams Building
Pamela White
301 405 6576
pwhite@umd.edu
http://www.isr.umd.edu/Labs/ISL/CDS_Lecs/cds_upcoming.html

Control and Dynamical Systems Lecture Series

Actor-Critic Algorithms with Function Approximation for Constrained Markov Decision Processes: An Application to Traffic Signal Control

Shalabh Bhatnagar
Professor, Electrical Engineering
Indian Institute of Science, Bangalore

Abstract
Actor-critic algorithms are an important subclass of reinforcement learning methods. They are characterized by a parameterization of two entities: the actor and the critic. While the critic addresses the problem of prediction, the actor is concerned with control.

Temporal difference (TD) learning is widely recognized as an efficient method for the problem of prediction. By employing TD-based critics together with actors that use policy or natural policy gradients, we present several actor-critic algorithms that bootstrap in both the actor and the critic.
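The split between a TD-based critic (prediction) and a policy-gradient actor (control) can be illustrated with a small sketch. It is a hypothetical illustration, not one of the algorithms from the talk: it assumes a finite MDP given as next-state distributions `P[s][a]` and one-stage costs `C[s][a]`, uses tabular (one-hot) features as the simplest case of linear function approximation, and uses a plain rather than natural policy gradient. The critic performs an average-cost TD(0) update with a larger step size; the actor moves softmax preferences along the TD error times the score function with a smaller one. All names and step sizes here are assumptions.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_average_cost(P, C, steps=20_000, actor_rate=1e-3,
                              critic_rate=1e-2, seed=0):
    """Hypothetical sketch: average-cost TD(0) critic + softmax policy-gradient actor.

    P[s][a] is a list of next-state probabilities, C[s][a] a one-stage cost.
    Tabular features stand in for general linear function approximation.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(C), len(C[0])
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
    v = [0.0] * n_states      # critic: differential value estimates
    avg_cost = 0.0            # running estimate of the long-run average cost
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=pi)[0]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        cost = C[s][a]
        avg_cost += critic_rate * (cost - avg_cost)
        # Average-cost TD error: the critic's "surprise" at this transition.
        delta = cost - avg_cost + v[s_next] - v[s]
        v[s] += critic_rate * delta                   # critic (faster step size)
        for b in range(n_actions):
            # Score function of the softmax policy: d log pi(a|s) / d theta[s][b].
            grad_log = (1.0 if b == a else 0.0) - pi[b]
            # Actor descends cost (slower step size).
            theta[s][b] -= actor_rate * delta * grad_log
        s = s_next
    return theta, v, avg_cost


if __name__ == "__main__":
    # Toy MDP: two states, uniform transitions, action 0 is cheaper.
    P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
    C = [[1.0, 2.0], [1.0, 2.0]]
    theta, v, avg_cost = actor_critic_average_cost(P, C)
    print("estimated average cost:", round(avg_cost, 3))
    print("policy in state 0:", [round(p, 3) for p in softmax(theta[0])])
```

In this sketch the TD error serves as an estimate of how much costlier the sampled action was than expected, so the actor lowers the preference for actions with a positive error. Making `critic_rate` larger than `actor_rate` mimics the two-timescale separation typically assumed when analysing the convergence of actor-critic schemes.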

Traditional reinforcement learning methods have been geared towards solving stochastic control problems that are usually modelled as Markov decision processes. We present an adaptation of one of the above algorithms, with function approximation, for solving a Markov decision process with inequality constraints.

We consider the long-run average cost criterion for both the cost and the constraint functions, and employ the Lagrange multiplier approach to find a constrained optimal policy. We sketch the convergence of our algorithms. Next, we present an application of one of our algorithms to the problem of traffic signal control.
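One way to picture the Lagrange multiplier approach under the long-run average cost criterion is the following sketch, offered as a hypothetical illustration rather than the algorithm presented in the talk. It assumes a single constraint with per-stage cost `G[s][a]` and bound `alpha`, i.e. the goal is to minimise the average of `C` subject to the average of `G` being at most `alpha`. A TD(0) critic and softmax policy-gradient actor are run on the per-stage Lagrangian cost c + λg (the constant term -λα does not change the policy gradient), while λ is updated with the smallest step size by projected ascent, λ ← max(0, λ + step · (estimated average of g - α)). Tabular features, the three step sizes, and all names are assumptions.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def lagrangian_actor_critic(P, C, G, alpha, steps=40_000, actor_rate=1e-2,
                            critic_rate=5e-2, lambda_rate=1e-3, seed=0):
    """Hypothetical sketch: minimise avg C subject to avg G <= alpha.

    Assumed step-size ordering: critic fastest, actor slower, multiplier slowest.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(C), len(C[0])
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
    v = [0.0] * n_states   # differential values of the Lagrangian cost
    avg_lagr = 0.0         # running average of the Lagrangian cost
    avg_g = 0.0            # running average of the constraint cost
    lam = 0.0              # Lagrange multiplier, projected onto [0, inf)
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=pi)[0]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        lagr_cost = C[s][a] + lam * G[s][a]   # per-stage Lagrangian cost
        avg_lagr += critic_rate * (lagr_cost - avg_lagr)
        avg_g += critic_rate * (G[s][a] - avg_g)
        delta = lagr_cost - avg_lagr + v[s_next] - v[s]   # average-cost TD error
        v[s] += critic_rate * delta
        for b in range(n_actions):
            grad_log = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] -= actor_rate * delta * grad_log
        # Projected ascent: lambda grows while the constraint is violated.
        lam = max(0.0, lam + lambda_rate * (avg_g - alpha))
        s = s_next
    return theta, lam, avg_g


if __name__ == "__main__":
    # Toy instance: action 0 is cheapest but violates the constraint bound.
    P = [[[0.5, 0.5] for _ in range(3)] for _ in range(2)]
    C = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    G = [[1.0, 0.2, 0.0], [1.0, 0.2, 0.0]]
    theta, lam, avg_g = lagrangian_actor_critic(P, C, G, alpha=0.2)
    print("lambda:", round(lam, 3), "average constraint cost:", round(avg_g, 3))
    print("policy in state 0:", [round(p, 3) for p in softmax(theta[0])])
```

In the toy instance, the cheapest action violates the constraint, so one would expect λ to rise until the feasible action becomes preferred under the Lagrangian cost; the printed values are only an illustration of the mechanism, not results from the talk.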

The problem here is to switch signals at traffic junctions efficiently by (a) adaptively finding the right order in which to switch and (b) finding the amount of time that a signal should be green, in order to maximize traffic flows and minimize delays. We observe that our algorithm significantly outperforms fixed-timing algorithms as well as Q-learning, which is a popular reinforcement learning algorithm.

Biography
Shalabh Bhatnagar received his Ph.D. in Electrical Engineering from the Indian Institute of Science, Bangalore, in 1998. From 1997 to 2000, he was a Postdoc at the Institute for Systems Research, Maryland. He was also a Postdoc at the Free University, Amsterdam, during 2000-2001 and a Visiting Faculty Member at IIT Delhi from July to December 2001. He has been at the Indian Institute of Science, Bangalore, since December 2001, where he is currently a Professor. His current research interests include reinforcement learning and stochastic optimization, as well as applications in communication networks and vehicular traffic control.

Audience: Public, Campus, Clark School, Graduate, Undergraduate, Faculty, Staff, Post-Docs, Alumni, Corporate
