Ph.D. Research Proposal Exam: Erfaun Noorani

Friday, December 3, 2021
8:00 a.m.
AVW 1146
Maria Hoo
301 405 3681
mch@umd.edu

Ph.D. Research Proposal Exam

 

Name: Erfaun Noorani

 

Committee:

Professor John S. Baras (Chair)

Professor Eyad H. Abed

Professor Michael C. Fu

Date/time: Friday, December 3, 2021, 8:00 a.m.

 

Location: AVW 1146

 

Title: Robust Reinforcement Learning via Risk-sensitivity


Abstract:

The recent impressive performance of Reinforcement Learning (RL) algorithms on video games, beginning with the DQN algorithm, together with promising applications of RL systems in domains such as protein folding, robotics, traffic control, resource management, finance, interactive education, and health care, has brought RL to the forefront of current research. Current Reinforcement Learning nevertheless has several weaknesses, the most well-known being limited generalizability and brittleness (i.e., non-robustness), which have hindered the adoption of RL systems in critical applications, especially high-stakes and safety-critical real-world applications.

 

The robustness properties of Risk-sensitive RL algorithms, coupled with their improved generalizability, strongly indicate that risk-sensitizing RL algorithms can pave the way to so-called “real-world” RL. We propose to develop Single-agent Reinforcement Learning (RL), Multi-Agent Reinforcement Learning (MARL), and Human Multi-Agent Reinforcement Learning (H-MARL) systems that are generic, provide performance guarantees, and can generalize, reason, and improve in complex and unknown task environments.

 

In our preliminary work, we established the connection between Risk-sensitive RL, (Distributionally) Robust RL, and Regularized RL objectives (such as entropy- and KL-regularized RL), and hence with a host of well-known RL algorithms, such as Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), and Maximum A Posteriori Policy Optimization (MPO). These equivalences (I) allow us to understand several well-known RL algorithms from a risk-minimization perspective, and hence offer a taxonomy of RL algorithms based on that perspective, and (II) analytically establish the robustness and generalizability properties of Risk-sensitive Reinforcement Learning, which in turn provides a theoretical justification for the robust performance of a host of well-known RL algorithms. These results further motivate risk-sensitizing current risk-neutral RL algorithms.

We also derived a Policy Gradient Theorem for the Risk-sensitive Control “exponential of integral” criterion and proposed a risk-sensitive Monte Carlo policy gradient algorithm as a risk-sensitive generalization of the policy gradient algorithm REINFORCE. Our simulations, together with our theoretical analysis, show that Risk-sensitive RL with an appropriately chosen risk parameter not only yields a risk-sensitive policy but also reduces variance during learning and accelerates it, which in turn results in a policy with a higher expected return; that is to say, risk-sensitivity leads to sample efficiency and improved performance. We also explored the use of such risk-sensitive policy gradient algorithms in independent multi-agent environments. Our simulation results show that the agents’ risk attitudes influence coordination and collaboration by shaping the agents’ learning dynamics and, if appropriately chosen, can lead to efficient learning of Hicks-optimal policies. This suggests that risk-sensitive agents can better coordinate and collaborate, resulting in better performance in multi-agent task environments.
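For concreteness, a brief sketch of the standard exponential-of-integral objective, its variational (robust/regularized) form, and the resulting policy gradient, in illustrative notation (the exact formulations in the preliminary work may differ in details such as discounting):

\[
J_\beta(\theta) \;=\; \frac{1}{\beta}\,\log \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ e^{\beta R(\tau)} \right],
\qquad R(\tau) = \sum_{t=0}^{T} r_t,
\]
where $\beta$ is the risk parameter. By the Donsker--Varadhan variational formula,
\[
\frac{1}{\beta}\,\log \mathbb{E}_{P}\!\left[ e^{\beta R} \right]
\;=\;
\begin{cases}
\displaystyle \sup_{Q} \Big\{ \mathbb{E}_{Q}[R] - \tfrac{1}{\beta}\,\mathrm{KL}(Q \,\|\, P) \Big\}, & \beta > 0,\\[1.5ex]
\displaystyle \inf_{Q} \Big\{ \mathbb{E}_{Q}[R] + \tfrac{1}{|\beta|}\,\mathrm{KL}(Q \,\|\, P) \Big\}, & \beta < 0,
\end{cases}
\]
which makes the connection to KL-regularized and distributionally robust objectives explicit, while the small-$\beta$ expansion $J_\beta(\theta) \approx \mathbb{E}[R] + \tfrac{\beta}{2}\,\mathrm{Var}[R]$ shows how the risk parameter trades expected return against variance. Differentiating the objective gives an exponentiated-return-weighted score,
\[
\nabla_\theta J_\beta(\theta)
\;=\;
\frac{\mathbb{E}_{\tau \sim \pi_\theta}\!\big[ e^{\beta R(\tau)} \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \big]}
     {\beta\, \mathbb{E}_{\tau \sim \pi_\theta}\!\big[ e^{\beta R(\tau)} \big]},
\]
whose Monte Carlo estimate recovers the standard REINFORCE gradient in the risk-neutral limit $\beta \to 0$.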

 

We propose to further extend these risk-sensitive approaches to RL algorithms with Temporal Logic constraints and to develop risk-sensitive algorithms for (Human) Multi-agent environments with Temporal Logic constraints. Such developments are a step toward enabling the adoption of RL systems in safety-critical, high-impact real-world applications.

 


 

Audience: Graduate, Faculty
