CCSP Seminar: Alec Koppel (ARL) "Policy Search for Reinforcement Learning in Continuous Spaces"
Communication, Control and Signal Processing Seminar
Policy Search for Reinforcement Learning in Continuous Spaces: Improved Limits and Reduced Variance
U.S. Army Research Laboratory
Reinforcement Learning (RL) is a form of stochastic adaptive control in which one seeks to estimate the parameters of a controller without access to a dynamics model. RL has gained popularity in recent years, beginning with the widely publicized success of AlphaGo in defeating the world champion in Go in 2016. However, the recent empirical successes of RL have been called into question because of their irreproducibility and high variance across training runs. Motivated by this gap, we'll spotlight recent efforts to solidify the theoretical understanding of the convergence rates and limiting properties of policy gradient methods in continuous Markov Decision Problems from a non-convex optimization perspective. Moreover, we design modified step-size rules that yield convergence to approximate local extrema, motivating reward reshaping via non-convex optimization. We'll then discuss a modification of the Policy Gradient Theorem that yields policy search directions with provably lower variance, and the algorithms built upon it. These results provide a conceptual framework for the future design of stable RL tools with lower variance.
Alec Koppel began as a Research Scientist at the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate in September 2017. He completed his Master's degree in Statistics and Doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn), in August 2017. He is also a participant in the Science, Mathematics, and Research for Transformation (SMART) Scholarship Program sponsored by the American Society for Engineering Education. Before coming to Penn, he completed his Master's degree in Systems Science and Mathematics and Bachelor's degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. His research interests are in the areas of signal processing, optimization, and learning theory. His current work focuses on optimization and learning methods for streaming data applications, with an emphasis on problems arising in autonomous systems. He co-authored a paper selected as a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers.