Blog

Online Foundations of Data Science Course Launches on edX!

UC Berkeley’s pathbreaking entry-level course on the Foundations of Data Science (Data 8) is launching on edX on April 2. This makes the fastest-growing class in UC Berkeley history available to everyone. Foundations of Data Science teaches computational and inferential thinking from the ground up. It covers testing hypotheses, applying statistical inference, visualizing distributions, and drawing conclusions, all while coding in Python and using real-world data sets. The cou...

March 28, 2018 | Author: Boban Zarkovich | Post Views: 1437

Distributed Policy Optimizers for Scalable and Reproducible Deep RL

In this blog post we introduce Ray RLlib, an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed policy optimizers that make it easy to use a variety of training strategies with existing reinforcement learning algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA) to be implemented once and reused many times across different RL algorithms...

March 20, 2018 | Author: Eric Liang | Post Views: 572

Anna: A Crazy Fast, Super-Scalable, Flexibly Consistent KVS 🗺

This article is cross-posted from the DataBeta blog. There’s fast and there’s fast. This post is about Anna, a key/value database design from our team at Berkeley that’s got phenomenal speed and buttery smooth scaling, with an unprecedented range of consistency guarantees. Details are in our upcoming ICDE18 paper on Anna. Conventional wisdom (or at least Jeff Dean wisdom) says that you have to redesign your system every time you scale by 10x. As researchers, we asked the counter-cultural quest...

March 12, 2018 | Author: Joe Hellerstein | Post Views: 4106

RISECamp Behind the Scenes

RISECamp was held at UC Berkeley on September 7th and 8th. This post looks behind the scenes at the technical infrastructure used to provide a cloud-hosted cluster for each attendee with ready-to-use Jupyter notebooks requiring only a web browser to access. Background and Requirements: RISECamp is the latest in a series of workshops held by RISELab (and its predecessor, AMPLab) showcasing the latest research from the lab. The sessions consist of talks on the latest research systems produce...

November 13, 2017 | Author: Jey Kottalam | Post Views: 100

Fast Python Serialization with Ray and Apache Arrow

This post was originally posted here. Robert Nishihara and Philipp Moritz are graduate students in the RISELab at UC Berkeley. This post elaborates on the integration between Ray and Apache Arrow. The main problem this addresses is data serialization. From Wikipedia, serialization is “… the process of translating data structures or object state into a format that can be stored … or transmitted … and reconstructed later (possibly in a different computer environment).” Why is any translation n...

October 16, 2017 | Author: Robert Nishihara | Post Views: 194

Ray: 0.2 Release

This was originally posted on the Ray blog. We are pleased to announce the Ray 0.2 release. This release includes the following: substantial performance improvements to the Plasma object store; an initial Jupyter notebook-based web UI; the start of a scalable reinforcement learning library; and fault tolerance for actors. Plasma: Since the last release, the Plasma object store has moved out of the Ray codebase and is now being developed as part of Apache Arrow (see the relevant documentation), so th...

October 10, 2017 | Author: Robert Nishihara | Post Views: 104

Low-Latency Model Serving with Clipper

The mission of the RISELab is to develop technologies that enable applications to make low-latency decisions on live data with strong security. One of the first steps towards achieving this goal is to study techniques to evaluate machine learning models and quickly render predictions. This missing piece of machine learning infrastructure, the prediction serving system, is critical to delivering real-time and intelligent applications and services. As we studied the prediction-serving problem, ...

July 31, 2017 | Author: Daniel Crankshaw | Post Views: 317

Opaque: Secure Apache Spark SQL

As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to any attacker who can observe memory contents. This is a challenging problem because security usually implies a tradeoff between performance and functionality. Cryptographic approaches like fully homomorphic encryption provide full functionality to a ...

July 28, 2017 | Author: Wenting Zheng | Post Views: 149

Announcing Ground v0.1

We’re excited to be releasing v0.1 of the Ground project! Ground is a data context service. It is a central repository for all the information surrounding the use of data in an organization. Ground concerns itself with what data an organization has, where that data is, who (both human beings and software systems) is touching that data, and how that data is being modified and described. Above all, Ground aims to be an open-source, vendor-neutral system that provides users an unopinionated meta...

July 18, 2017 | Author: Vikram Sreekanti | Post Views: 111

Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat

On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event was aimed at exploring research opportunities at the intersection of the BDD and RISE labs. The topical focus of the mini-retreat was emerging AI applications, such as Reinforcement Learning (RL), and computer systems to support such applications. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. Th...

July 10, 2017 | Author: Alexey Tumanov | Post Views: 121

RISELab Announces 3 Open Source Releases

Part of the Berkeley tradition, and of the RISELab mission, is to release open source software as part of our research agenda. Six months after launching the lab, we’re excited to announce initial v0.1 releases of three RISELab open-source systems: Clipper, Ground and Ray. Clipper is an open-source prediction-serving system. Clipper simplifies deploying models from a wide range of machine learning frameworks by exposing a common REST interface and automatically ensuring low-latency and high...

May 30, 2017 | Author: Joe Hellerstein | Post Views: 136

Making cities safer: data collection for Vision Zero

A critical part of enabling cities to implement their Vision Zero policies – the goal of the current National Transportation Data Challenge – is to be able to generate open, multi-modal travel experience data. While existing datasets use police and hospital reports to provide a comprehensive picture of fatalities and life-altering injuries, by their nature, they are sparse and resist use for prediction and prioritization. Further, changes to infrastructure to support Vision Zero policies ...

April 26, 2017 | Author: K. Shankari | Post Views: 102

Declarative Heterogeneity Handling for Datacenter and ML Resources

Challenge: Heterogeneity in datacenter resources has become a fact of life. We identify and categorize a number of different types of heterogeneity. When talking about heterogeneity, we generally refer to static or dynamic attributes associated with individual resources. Previously the levels of heterogeneity were fairly benign and limited to a few different types of processor architectures. Now, however, it has become a common trend to deploy hardware accelerators (e.g., Tesla K40/K80, G...

March 24, 2017 | Author: Alexey Tumanov | Post Views: 95

RISELab at Spark Summit

This year, Spark Summit East was held in Boston from February 7 to 9. With over 1,500 attendees, this was the largest Spark Summit ever outside the Bay Area. Apache Spark, developed in large part at AMPLab (the precursor of RISELab), is now the de facto standard for big data processing. As at previous Spark Summits, UC Berkeley had a very strong presence. Ion Stoica gave a keynote on RISELab, describing the lab’s research focus on addressing a long-standing grand challenge in computing: enable...

March 17, 2017 | Author: Ion Stoica | Post Views: 105

Serverless Scientific Computing

For many scientific and engineering users, cloud infrastructure remains challenging to use. While many of their use cases are embarrassingly parallel, the challenges involved in provisioning and using stateful cloud services keep them trapped on their laptops or large shared workstations. Before getting started, a new cloud user confronts a bewildering number of choices. First, what instance type do they need? How do they make the compute/memory tradeoff? How large do they want their cluster...

March 8, 2017 | Author: Eric Jonas | Post Views: 166

Metadata Megafail: Messing up Your Data Strategy in 3 Easy Steps

A key aspect of the RISELab agenda is to aggressively harness data—lots of it, both historical and live. Of course bits in computers don’t provide value on their own. We need a broader context for data: where it came from, what it represents, and how it gets used. Traditionally, people called this metadata: the data about our data. Requirements for metadata have changed drastically in recent years in response to technology trends. There’s an emerging groundswell to address these new req...

February 27, 2017 | Author: Joe Hellerstein | Post Views: 504

RISELab Kicks Off

Berkeley’s computer science division has an ongoing tradition of 5-year collaborative research labs. In the fall of 2016 we closed out the most recent of the series: the AMPLab. We think it was a pretty big deal, and many agreed. One great thing about Berkeley is the endless supply of energy and ideas that flows through the place — always bringing changes, building on what came before. In that spirit, we’re fired up to announce the Berkeley RISELab, where we will focus intensely for five ye...

January 21, 2017 | Author: Melissa Mecca | Post Views: 202