Dissertation Talk with Brijen Thananjeyan: Safe Reinforcement Learning Using Learned Safe Sets; 5:00 PM, Tuesday, April 26

April 26, 2022

Title: Safe Reinforcement Learning Using Learned Safe Sets
Speaker: Brijen Thananjeyan
Advisors: Ken Goldberg, Joseph E. Gonzalez

Date: Tuesday, April 26, 2022
Time: 5:00 – 6:00 pm PDT

This is a hybrid event, held both in person and virtually.
Location (in person): 8034 Berkeley Way West
Location (Zoom): https://berkeley.zoom.us/j/94000313493
Meeting ID: 940 0031 3493
Abstract: Reinforcement learning is an increasingly popular framework that enables robots to learn online, from prior experience, to perform tasks in environments where dynamics or shaped reward functions are challenging to model. However, because this requires robots to sample trajectories under significant dynamical uncertainty, the robot may perform unsafe maneuvers during online exploration, leading to irrecoverable states and possible damage to the robot and/or its environment. Safe reinforcement learning is a field with a rich history that studies how to reduce the number and magnitude of unsafe behaviors during learning. It is challenging because exploration must be limited enough to ensure safety, yet remain sufficient to maximize the task reward function. Algorithms frequently draw inspiration from control theory, constrained optimization, and online learning to enable robots to adaptively balance task-driven exploration and safety based on prior experience.
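
A standard way to make this tension precise (a generic constrained-optimization formulation, shown here only for illustration and not claimed as the specific formulation used in the talk) is to maximize expected task reward subject to a bound on the probability of ever reaching an unsafe state:

\[
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\Pr_{\tau \sim \pi}\!\left(\exists\, t \le T:\ s_t \in \mathcal{S}_{\text{unsafe}}\right) \le \delta
\]

Here \(\mathcal{S}_{\text{unsafe}}\) denotes the set of constraint-violating states and \(\delta\) a tolerated violation probability; both symbols are illustrative.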

In this talk, I will present a set of safe reinforcement learning algorithms that maintain subsets of the state space in which safety is probable under the current policy. The algorithms leverage these safe sets in different ways to promote safety during online exploration. The first part of the talk covers a class of algorithms in which the robot maintains a conservative safe set of states from which it has already completed the task. As long as the robot approximately maintains the ability to return to this set, it can explore outside it and iteratively expand it. The talk briefly presents strong theoretical guarantees for this class of algorithms under known but stochastic, nonlinear dynamics. The second part presents another class of algorithms that maintains a much larger safe set based on the probability of the robot behaving unsafely. The robot uses the boundary of this set to decide whether to focus on task-driven exploration or to execute safety recovery maneuvers; a rough sketch of this switching logic appears below. The final part covers an algorithm that uses policy uncertainty to implicitly model safety and to request human interventions for corrective feedback. The talk concludes with a commentary on lessons learned and future research.
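
As a rough, hypothetical sketch of the switching idea described in the second part of the talk (all function and variable names below are placeholders, not the dissertation's actual interfaces), a learned safety critic can estimate the probability of a future constraint violation and trigger a recovery policy near the boundary of the learned safe set:

```python
def rollout_with_recovery(env, task_policy, recovery_policy, safety_critic,
                          risk_threshold=0.2, horizon=200):
    """Run one episode, switching to a recovery policy when the learned
    safety critic predicts a high probability of future constraint violation.

    The environment, policies, and critic are hypothetical placeholders;
    this is a sketch of the safe-set / recovery-switching idea, not the
    exact algorithm presented in the talk.
    """
    obs = env.reset()
    trajectory = []
    for _ in range(horizon):
        # Action proposed by the task-driven exploration policy.
        task_action = task_policy(obs)

        # Estimated probability of eventually violating a constraint if the
        # robot executes the task action from the current state.
        risk = safety_critic(obs, task_action)

        if risk > risk_threshold:
            # Near the boundary of the learned safe set: execute a recovery
            # maneuver instead of the task action.
            action = recovery_policy(obs)
        else:
            action = task_action

        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
        if done:
            break
    return trajectory
```

The design choice illustrated here is that task-driven exploration and safety recovery are handled by separate policies, with the learned safe-set boundary (represented by the risk threshold) arbitrating between them.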