Benchmarks for reinforcement learning in mixed-autonomy traffic

David Schonenberg Deep Learning, Reinforcement Learning

We release new benchmarks in the use of deep reinforcement learning (RL) to create controllers for mixed-autonomy traffic, where connected and autonomous vehicles (CAVs) interact with human drivers and infrastructure. Benchmarks, such as MuJoCo or the Arcade Learning Environment, have spurred new research by enabling researchers to effectively compare their results so that they can focus on algorithmic improvements and control techniques rather than system design. To promote similar advances in traffic control via RL, we propose four benchmarks, based on three new traffic scenarios, illustrating distinct reinforcement learning problems with applications to mixed-autonomy traffic. We provide an introduction to each control problem, an overview of their MDP structures, and preliminary performance results from commonly used RL algorithms. For the purpose …

Authors: Eugene Vinitsky, Aboudy Kriedieh, Luc Le Flem, Nishant Kheterpal, Kathy Jang, Cathy Wu, Richard Liaw, Eric Liang, Alexandre Bayen
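To make the control problem concrete, here is a minimal sketch of what one of these mixed-autonomy MDPs might look like when wrapped in a standard reset/step interface. The ring-road dynamics, car-following rule, reward, and dimensions below are illustrative assumptions for exposition, not the benchmarks' actual specifications.

```python
import numpy as np

class MixedAutonomyRingEnv:
    """Toy stand-in for one benchmark MDP: the agent commands CAV
    accelerations, human drivers relax toward the CAVs' mean speed, and the
    reward is the average speed of all vehicles."""

    def __init__(self, num_cavs=2, num_humans=8, dt=0.1, max_speed=10.0):
        self.num_cavs = num_cavs
        self.num_humans = num_humans
        self.dt = dt
        self.max_speed = max_speed

    def reset(self):
        self.speeds = np.zeros(self.num_cavs + self.num_humans)
        return self.speeds.copy()

    def step(self, accelerations):
        # CAVs apply the commanded accelerations.
        self.speeds[:self.num_cavs] += accelerations * self.dt
        # Human-driven vehicles drift toward the CAVs' mean speed (toy car-following rule).
        target = self.speeds[:self.num_cavs].mean()
        self.speeds[self.num_cavs:] += 0.5 * (target - self.speeds[self.num_cavs:]) * self.dt
        self.speeds = np.clip(self.speeds, 0.0, self.max_speed)
        reward = self.speeds.mean()  # system-level objective: average network speed
        return self.speeds.copy(), reward, False, {}

# Random-policy rollout: the loop any of the benchmarked RL algorithms would wrap.
env = MixedAutonomyRingEnv()
obs = env.reset()
for _ in range(100):
    action = np.random.uniform(-1.0, 1.0, size=env.num_cavs)
    obs, reward, done, info = env.step(action)
```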

SQL Query Optimization Meets Deep Reinforcement Learning

Zongheng Yang blog, Database Systems, Deep Learning, Reinforcement Learning, Systems

We show that deep reinforcement learning is successful at optimizing SQL joins, a problem studied for decades in the database community. Further, on large joins, we show that this technique executes up to 10x faster than classical dynamic programs and 10,000x faster than exhaustive enumeration. This blog post introduces the problem and summarizes our key technique; details can be found in our latest preprint, Learning to Optimize Join Queries With Deep Reinforcement Learning. SQL query optimization has been studied in the database community for almost 40 years, dating all the way back to System R’s classical dynamic programming approach. Central to query optimization is the problem of join ordering. Despite the problem’s rich history, there is still a continuous stream …
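For context, the classical baseline is a bottom-up dynamic program over subsets of relations. The sketch below illustrates that idea with a deliberately crude cost model (the cost and cardinality of a join are taken as the product of the input cardinalities), which is an assumption for illustration only; the exponential growth of the subset enumeration on large joins is what leaves room for a learned optimizer to be much faster.

```python
from itertools import combinations

def best_join_order(cardinalities):
    """cardinalities: dict mapping relation name -> estimated row count.
    Returns (cost, plan) for the cheapest bushy join plan over all relations."""
    relations = frozenset(cardinalities)
    # best[S] = (cost of joining subset S, resulting cardinality, plan tree)
    best = {frozenset([r]): (0.0, cardinalities[r], r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            # Try every split of the subset into a left and right input.
            for left_size in range(1, size):
                for left in map(frozenset, combinations(subset, left_size)):
                    right = subset - left
                    lcost, lcard, lplan = best[left]
                    rcost, rcard, rplan = best[right]
                    cost = lcost + rcost + lcard * rcard  # toy cost model
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, lcard * rcard, (lplan, rplan))
    total_cost, _, plan = best[relations]
    return total_cost, plan

cost, plan = best_join_order({"A": 100, "B": 1000, "C": 10})
print(cost, plan)
```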

Parametrized Hierarchical Procedures for Neural Programming

Roy Fox Deep Learning, Human in the Loop, Theoretical ML

Parametrized Hierarchical Procedures (PHP) represent a program as a hierarchy of procedures that call each other, each implemented by a neural network. We develop an algorithm for training PHPs from a set of supervisor demonstrations, only a few of which are annotated with the internal call structure.

Authors: Roy Fox, Eui Chul (Richard) Shin, Sanjay Krishnan, Kenneth Goldberg, Dawn Song, Ion Stoica
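A minimal structural sketch of the PHP idea, assuming a toy two-level hierarchy: each procedure owns a small controller (a random linear map standing in here for the neural network) that decides whether to emit an elementary action, call a child procedure, or return to its caller. The operation names and hierarchy below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

class Procedure:
    """One node in the hierarchy: a named procedure with its own controller."""

    def __init__(self, name, actions, children=()):
        self.name = name
        self.actions = list(actions)      # elementary operations this procedure can emit
        self.children = list(children)    # sub-procedures this procedure can call
        n_out = len(self.actions) + len(self.children) + 1  # +1 for "return"
        self.weights = np.random.randn(4, n_out)            # placeholder controller

    def run(self, obs, trace, depth=0, max_depth=3):
        if depth > max_depth:
            return
        choice = int(np.argmax(obs @ self.weights))          # controller decision
        if choice < len(self.actions):                       # emit an elementary action
            trace.append((self.name, self.actions[choice]))
        elif choice < len(self.actions) + len(self.children):  # call a sub-procedure
            self.children[choice - len(self.actions)].run(obs, trace, depth + 1)
        # otherwise: terminate and return control to the caller

# Toy two-level hierarchy.
move = Procedure("move", actions=["left", "right"])
write = Procedure("write", actions=["emit0", "emit1"])
root = Procedure("copy", actions=[], children=[move, write])
trace = []
root.run(np.random.randn(4), trace)
print(trace)
```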

MLPerf: SPEC for ML

David Patterson Deep Learning, News, Open Source, Optimization, Reinforcement Learning, Systems, Uncategorized

The RISE Lab at UC Berkeley today joins Baidu, Google, Harvard University, and Stanford University to announce a new benchmark suite for machine learning called MLPerf at the O’Reilly AI conference in New York City (see https://mlperf.org/). The MLPerf effort aims to build a common set of benchmarks that enables the machine learning (ML) field to measure system performance, eventually for both training and inference, on platforms ranging from mobile devices to cloud services. We believe that a widely accepted benchmark suite will benefit the entire community, including researchers, developers, builders of machine learning frameworks, cloud service providers, hardware manufacturers, application providers, and end users. Historical Inspiration. We are motivated in part by the System Performance Evaluation Cooperative (SPEC) benchmark for general-purpose computing that drove rapid, …

Distributed Policy Optimizers for Scalable and Reproducible Deep RL

Eric Liang blog, Deep Learning, Distributed Systems, Open Source, Ray, Reinforcement Learning

In this blog post we introduce Ray RLlib, an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed policy optimizers that make it easy to use a variety of training strategies with existing reinforcement learning algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA) to be implemented once and reused many times across different RL algorithms and libraries. We discuss the design and performance of policy optimizers in more detail in the RLlib paper. What’s next for RLlib? In the near term we plan to continue building out RLlib’s set of policy optimizers and algorithms. Our aim is for RLlib to serve …
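The separation RLlib draws between an algorithm and its execution strategy can be sketched in a few lines, assuming a generic evaluator interface. The class and method names below are illustrative assumptions, not RLlib's actual API: the point is that the optimizer owns the distributed pattern (sample, update, broadcast), while the algorithm only supplies the policy and gradient logic.

```python
import numpy as np

class Evaluator:
    """Stand-in for a rollout worker holding a copy of a simple linear policy."""
    def __init__(self, dim=4):
        self.weights = np.zeros(dim)
    def sample(self):
        # Pretend experience batch: random "observations" and "advantages".
        return np.random.randn(32, self.weights.size), np.random.randn(32)
    def compute_gradients(self, batch):
        obs, adv = batch
        return (obs * adv[:, None]).mean(axis=0)   # toy policy-gradient estimate
    def apply_gradients(self, grads, lr=0.01):
        self.weights += lr * grads
    def get_weights(self):
        return self.weights.copy()
    def set_weights(self, w):
        self.weights = w.copy()

class SyncLocalOptimizer:
    """Owns the execution pattern: sample from workers, update on the learner,
    broadcast weights. Swapping this class (say, for an async or Ape-X-style
    variant) would leave the algorithm's policy code untouched."""
    def __init__(self, local_evaluator, remote_evaluators):
        self.local, self.remotes = local_evaluator, remote_evaluators
    def step(self):
        for batch in [ev.sample() for ev in self.remotes]:
            self.local.apply_gradients(self.local.compute_gradients(batch))
        for ev in self.remotes:
            ev.set_weights(self.local.get_weights())

optimizer = SyncLocalOptimizer(Evaluator(), [Evaluator() for _ in range(4)])
for _ in range(10):
    optimizer.step()
```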

Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat

Alexey Tumanov blog, Deep Learning, Reinforcement Learning, Systems

On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event was aimed at exploring research opportunities at the intersection of the BDD and RISE labs. The topical focus of the mini-retreat was emerging AI applications, such as Reinforcement Learning (RL), and computer systems to support such applications. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. The event offered a great opportunity for researchers from both labs to exchange ideas about their ongoing research activity and discover points of collaboration. Philipp Moritz started the first student talk session with an update on Ray — a distributed execution framework for emerging …

Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

Aaditya Ramdas Deep Learning

We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GAN), in which a model attempts to generate realistic samples, and a discriminator attempts to tell these apart from data samples. In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples. Second, the MMD can be used to evaluate the performance of a generative model, by testing the model’s samples against a reference data set. In the …
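For readers who want the core quantity in code, here is a minimal sketch of an unbiased estimate of the squared MMD with a Gaussian kernel. The fixed bandwidth is an illustrative assumption; the paper's contribution is choosing the kernel parameters to maximize the power of the resulting test.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2)).
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Drop the diagonal terms of the within-sample kernels for the unbiased estimate.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    term_xy = 2.0 * k_xy.mean()
    return term_xx + term_yy - term_xy

# Two samples from the same distribution should give an estimate near zero;
# a shifted sample should not.
x1 = np.random.randn(200, 2)
x2 = np.random.randn(200, 2)
y = np.random.randn(200, 2) + 1.0
print(mmd2_unbiased(x1, x2), mmd2_unbiased(x1, y))
```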