The following paper from the RISELab has won the “Notable Paper Award” at the 2019 Artificial Intelligence and Statistics (AISTATS) conference: A Swiss Army Infinitesimal Jackknife; Giordano, R., Stephenson, W., Liu, R., Jordan, M. I. & Broderick, T. (2019); In K. Chaudhuri and M. Sugiyama (Eds.), Proceedings of the Twenty-Second Conference on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan.

## An Open Source Tool for Scaling Multi-Agent Reinforcement Learning

We just rolled out general support for multi-agent reinforcement learning in Ray RLlib 0.6.0. This blog post is a brief tutorial on multi-agent RL and how we designed for it in RLlib. Our goal is to enable multi-agent RL across a range of use cases, from leveraging existing single-agent algorithms to training with custom algorithms at large scale.

## Benchmarks for reinforcement learning in mixed-autonomy traffic

We release new benchmarks in the use of deep reinforcement learning (RL) to create controllers for mixed-autonomy traffic, where connected and autonomous vehicles (CAVs) interact with human drivers and infrastructure. Benchmarks, such as Mujoco or the Arcade Learning Environment, have spurred new research by enabling researchers to effectively compare their results so that they can focus on algorithmic improvements and control techniques rather than system design. To promote similar advances in traffic control via RL, we propose four benchmarks, based on three new traffic scenarios, illustrating distinct reinforcement learning problems with applications to mixed-autonomy traffic. We provide an introduction to each control problem, an overview of their MDP structures, and preliminary performance results from commonly used RL algorithms. For the purpose …

**Authors:** Eugene Vinitsky, Aboudy Kriedieh, Luc Le Flem, Nishant Kheterpal, Kathy Jang, Cathy Wu, Richard Liaw, Eric Liang, Alexandre Bayen

## SQL Query Optimization Meets Deep Reinforcement Learning

We show that deep reinforcement learning is successful at optimizing SQL joins, a problem studied for decades in the database community. Further, on large joins, we show that this technique executes up to 10x faster than classical dynamic programs and 10,000x faster than exhaustive enumeration. This blog post introduces the problem and summarizes our key technique; details can be found in our latest preprint, Learning to Optimize Join Queries With Deep Reinforcement Learning. SQL query optimization has been studied in the database community for almost 40 years, dating all the way back from System R’s classical dynamic programming approach. Central to query optimization is the problem of join ordering. Despite the problem’s rich history, there is still a continuous stream …

## Parametrized Hierarchical Procedures for Neural Programming

Parametrized Hierarchical Procedures (PHP) represent a program as a hierarchy of procedures that call each other, each implemented by a neural network. We develop an algorithm for training PHPs from a set of supervisor demonstrations, only few of which are annotated with the internal call structure.

**Authors:** Roy Fox, Eui Chul (Richard) Shin, Sanjay Krishnan, Kenneth Goldberg, Dawn song, Ion Stoica

## MLPerf: SPEC for ML

The RISE Lab at UC Berkeley today joins Baidu, Google, Harvard University, and Stanford University to announce a new benchmark suite for machine learning called MLPerf at the O’Reilly AI conference in New York City (see https://mlperf.org/). The MLPerf effort aims to build a common set of benchmarks that enables the machine learning (ML) field to measure system performance eventually for both training and inference from mobile devices to cloud services. We believe that a widely accepted benchmark suite will benefit the entire community, including researchers, developers, builders of machine learning frameworks, cloud service providers, hardware manufacturers, application providers, and end users. Historical Inspiration. We are motivated in part by the System Performance Evaluation Cooperative (SPEC) benchmark for general-purpose computing that drove rapid, …

## Distributed Policy Optimizers for Scalable and Reproducible Deep RL

In this blog post we introduce Ray RLlib, an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed policy optimizers that make it easy to use a variety of training strategies with existing reinforcement learning algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA), to be implemented once and reused many times across different RL algorithms and libraries. We discuss in more detail the design and performance of policy optimizers in the RLlib paper. What’s next for RLlib In the near term we plan to continue building out RLlib’s set of policy optimizers and algorithms. Our aim is for RLlib to serve …

## DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations

DDCO learns form demonstrations continuous control skills parametrized by deep networks, their termination condition, and high-level policies to use them.

**Authors:** Sanjay Krishnan, Roy Fox, Ion Stoica, Kenneth Goldberg

## Finite-Size Corrections and Likelihood Ratio Fluctuations in the Spiked Wigner Model

In this paper we study principal components analysis in the regime of high dimensionality and high noise. Our model of the problem is a rank-one deformation of a Wigner matrix where the signal-to-noise ratio (SNR) is of constant order, and we are interested in the fundamental limits of detection of the spike. Our main goal is to gain a fine understanding of the asymptotics for the log-likelihood ratio process, also known as the free energy, as a function of the SNR. Our main results are twofold. We first prove that the free energy has a finite-size correction to its limit—the replica-symmetric formula—which we explicitly compute. This provides a formula for the Kullback-Leibler divergence between the planted and null models. Second, …

**Authors:** Ahmed El Alaoui, Florent Krzakala, Michael Jordan

## Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat

On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event was aimed at exploring research opportunities at the intersection of the BDD and RISE labs. The topical focus of the mini-retreat was emerging AI applications, such as Reinforcement Learning (RL), and computer systems to support such applications. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. The event offered a great opportunity for researchers from both labs to exchange ideas about their ongoing research activity and discover points of collaboration. Philipp Moritz started the first student talk session with an update on Ray — a distributed execution framework for emerging …

## Random Projection Design for Scalable Implicit Smoothing of Randomly Observed Stochastic Processes

Standard methods for multi-variate time series analysis are hampered by sampling at random timestamps, long range dependencies , and the scale of the data. In this paper we present a novel estimator for cross-covariance of randomly observed time series which identifies the dynamics of an unobserved stochastic process. We analyze the statistical properties of our estimator without the assumption that observation timestamps are independent from the process of interest and show that our solution does not suffer from the corresponding issues affecting standard estimators for cross-covariance. We implement and evaluate our statistically sound and scalable approach in the distributed setting using Apache Spark and demonstrate its ability to identify interactions between processes on simulations and financial data with tens of millions of samples. Pdf: Aistats_camera_ready

**Authors:** , Joseph Gonzalez, Evan Sparks, Alexandre M. Bayen

## Decoding from Pooled data: Phase Transitions of Message Passing

We consider the problem of decoding a discrete signal of categorical variables from the observation of several histograms of pooled subsets of it. We present an Approximate Message Passing (AMP) algorithm for recovering the signal in the random dense setting where each observed histogram involves a random subset of entries of size proportional to n. We characterize the performance of the algorithm in the asymptotic regime where the number of observations m tends to infinity proportionally to n, by deriving the corresponding State Evolution (SE) equations and studying their dynamics. We initiate the analysis of the multi-dimensional SE dynamics by proving their convergence to a fixed point, along with some further properties of the iterates. The analysis reveals sharp phase …

**Authors:** Aaditya Ramdas, Ahmed El Alaoui, Michael Jordan, Florent Krzakala, Lenka Zdeborova

## Decoding from Pooled data: Sharp Information-Theoretic Bounds

Consider a population consisting of n individuals, each of whom has one of d types (e.g. their blood type, in which case d=4). We are allowed to query this database by specifying a subset of the population, and in response we observe a noiseless histogram (a d-dimensional vector of counts) of types of the pooled individuals. This measurement model arises in practical situations such as pooling of genetic data and may also be motivated by privacy considerations. We are interested in the number of queries one needs to unambiguously determine the type of each individual. In this paper, we study this information-theoretic question under the random, dense setting where in each query, a random subset of individuals of size proportional …

**Authors:** Ahmed El Alaoui, Aaditya Ramdas, Michael Jordan, Florent Krzakala, Lenka Zdeborova

## Universality of Mallows’ and degeneracy of Kendall’s kernels for rankings

Kernel methods provide an attractive framework for aggregating and learning from ranking data, and so understanding the fundamental properties of kernels over permutations is a question of broad interest. We provide a detailed analysis of the Fourier spectra of the standard Kendall and Mallows kernels, and a new class of polynomial-type kernels. We prove that the Kendall kernel has exactly two irreducible representations at which the Fourier transform is non-zero, and moreover, the associated matrices are rank one. This implies that the Kendall kernel is nearly degenerate, with limited expressive and discriminative power. In sharp contrast, we prove that the Fourier transform of the Mallows kernel is a strictly positive definite matrix at all irreducible representations. This property guarantees that …

**Authors:** Horia Mania, Aaditya Ramdas, Martin J. Wainwright, Michael Jordan, Benjamin Recht