Blog

A Short History of Prediction-Serving Systems

Machine learning is an enabling technology that transforms data into solutions by extracting patterns that generalize to new data. Much of machine learning can be reduced to learning a model — a function that maps an input (e.g. a photo) to a prediction (e.g. objects in the photo). Once trained, these models can be used to make predictions on new inputs (e.g., new photos) and as part of more complex decisions (e.g., whether to promote a photo). While there are thousands of papers published ea...

June 15, 2018 | Author: Daniel Crankshaw | Post Views: 324

The Right to not be Tracked II: in which I turn off the location permission for Google, but it tracks me anyway

I recently published a post about the blurry boundaries between standard system services and Google Maps on Android. I argued that these boundaries made it hard to talk about consent and competition around location services. However, the branching factor for the data sharing made the argument complex and hard to follow. Even as I was writing that post, in the train on the way into Berkeley, I started getting notifications from the Google app about the weather at my location. The Google app (a...

June 3, 2018 | Author: K. Shankari | Post Views: 461

The Right to not be Tracked: a Spotlight on Google Maps and Android Location Tracking

There has been a lot of interest in data collected about users by Facebook recently. Journalists have been shocked when they downloaded the data that Facebook has on them. Most of this concern has been focused around data collected through explicit user interaction such as web browsing, or clicking on “Like” and “Share” buttons. Background data collection, which occurs without any explicit user intervention, is arguably creepier, because it collects data whether or not you interact wi...

May 20, 2018 | Author: K. Shankari | Post Views: 777

Michael I. Jordan: Artificial Intelligence — The Revolution Hasn’t Happened Yet

(This article was originally published on Medium.com.) Artificial Intelligence (AI) is the mantra of the current era. The phrase is intoned by technologists, academicians, journalists and venture capitalists alike. As with many phrases that cross over from technical academic fields into general circulation, there is significant misunderstanding accompanying the use of the phrase. But this is not the classical case of the public not understanding the scientists; here the scientists...

May 4, 2018 | Author: Boban Zarkovich | Post Views: 1146

Open source platform + undergraduate energy = sustainability research

This Earth Day, join a study on motivating sustainable transportation behavior. I blogged about the e-mission project earlier, in the context of the National Transportation Data Challenge (https://rise.cs.berkeley.edu/blog/making-cities-safer-data-collection-vision-zero/). To recap, e-mission focuses on building an extensible platform that can instrument the end-to-end multi-modal travel experience at the personal scale and collate it for analysis at the societal scale. In particular, it...

April 24, 2018 | Author: K. Shankari | Post Views: 500

Online Foundations of Data Science Course Launches on edX!

UC Berkeley’s pathbreaking entry-level course on the Foundations of Data Science (Data 8) is launching on edX on April 2. This makes the fastest-growing class in UC Berkeley history available to everyone. Foundations of Data Science teaches computational and inferential thinking from the ground up. It covers testing hypotheses, applying statistical inference, visualizing distributions, and drawing conclusions, all while coding in Python and using real-world data sets. The cou...

March 28, 2018 | Author: Boban Zarkovich | Post Views: 1834

Distributed Policy Optimizers for Scalable and Reproducible Deep RL

In this blog post we introduce Ray RLlib, an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed policy optimizers that make it easy to use a variety of training strategies with existing reinforcement learning algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA) to be implemented once and reused many times across different RL algorithms...

March 20, 2018 | Author: Eric Liang | Post Views: 1342

Anna: A Crazy Fast, Super-Scalable, Flexibly Consistent KVS 🗺

This article is cross-posted from the DataBeta blog. There’s fast and there’s fast. This post is about Anna, a key/value database design from our team at Berkeley that’s got phenomenal speed and buttery smooth scaling, with an unprecedented range of consistency guarantees. Details are in our upcoming ICDE18 paper on Anna. Conventional wisdom (or at least Jeff Dean wisdom) says that you have to redesign your system every time you scale by 10x. As researchers, we asked the counter-cultural quest...

March 12, 2018 | Author: Joe Hellerstein | Post Views: 6023

RISECamp Behind the Scenes

RISECamp was held at UC Berkeley on September 7th and 8th. This post looks behind the scenes at the technical infrastructure used to provide a cloud-hosted cluster for each attendee with ready-to-use Jupyter notebooks requiring only a web browser to access. Background and Requirements: RISECamp is the latest in a series of workshops held by RISELab (and its predecessor, AMPLab) showcasing the latest research from the lab. The sessions consist of talks on the latest research systems produce...

November 13, 2017 | Author: Jey Kottalam | Post Views: 226

Fast Python Serialization with Ray and Apache Arrow

This post was originally posted here. Robert Nishihara and Philipp Moritz are graduate students in the RISELab at UC Berkeley. This post elaborates on the integration between Ray and Apache Arrow. The main problem this addresses is data serialization. From Wikipedia, serialization is … the process of translating data structures or object state into a format that can be stored … or transmitted … and reconstructed later (possibly in a different computer environment). Why is any translation n...

October 16, 2017 | Author: Robert Nishihara | Post Views: 439

Ray: 0.2 Release

This was originally posted on the Ray blog. We are pleased to announce the Ray 0.2 release. This release includes the following: substantial performance improvements to the Plasma object store; an initial Jupyter notebook based web UI; the start of a scalable reinforcement learning library; and fault tolerance for actors. Plasma: Since the last release, the Plasma object store has moved out of the Ray codebase and is now being developed as part of Apache Arrow (see the relevant documentation), so th...

October 10, 2017 | Author: Robert Nishihara | Post Views: 244

Low-Latency Model Serving with Clipper

The mission of the RISELab is to develop technologies that enable applications to make low-latency decisions on live data with strong security. One of the first steps towards achieving this goal is to study techniques to evaluate machine learning models and quickly render predictions. This missing piece of machine learning infrastructure, the prediction serving system, is critical to delivering real-time and intelligent applications and services. As we studied the prediction-serving problem, ...

July 31, 2017 | Author: Daniel Crankshaw | Post Views: 861

Opaque: Secure Apache Spark SQL

As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to any attacker who can observe memory contents. This is a challenging problem because security usually implies a tradeoff between performance and functionality. Cryptographic approaches like fully homomorphic encryption provide full functionality to a ...

July 28, 2017 | Author: Wenting Zheng | Post Views: 365

Announcing Ground v0.1

We’re excited to be releasing v0.1 of the Ground project! Ground is a data context service. It is a central repository for all the information surrounding the use of data in an organization. Ground concerns itself with what data an organization has, where that data is, who (both human beings and software systems) is touching that data, and how that data is being modified and described. Above all, Ground aims to be an open-source, vendor-neutral system that provides users an unopinionated meta...

July 18, 2017 | Author: Vikram Sreekanti | Post Views: 253

Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat

On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event was aimed at exploring research opportunities at the intersection of the BDD and RISE labs. The topical focus of the mini-retreat was emerging AI applications, such as Reinforcement Learning (RL), and computer systems to support such applications. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. Th...

July 10, 2017 | Author: Alexey Tumanov | Post Views: 280

RISELab Announces 3 Open Source Releases

Part of the Berkeley tradition, and of the RISELab mission, is to release open source software as part of our research agenda. Six months after launching the lab, we’re excited to announce initial v0.1 releases of three RISELab open-source systems: Clipper, Ground and Ray. Clipper is an open-source prediction-serving system. Clipper simplifies deploying models from a wide range of machine learning frameworks by exposing a common REST interface and automatically ensuring low-latency and high...

May 30, 2017 | Author: Joe Hellerstein | Post Views: 322

Making cities safer: data collection for Vision Zero

A critical part of enabling cities to implement their Vision Zero policies – the goal of the current National Transportation Data Challenge – is to be able to generate open, multi-modal travel experience data. While existing datasets use police and hospital reports to provide a comprehensive picture of fatalities and life altering injuries, by their nature, they are sparse and resist use for prediction and prioritization. Further, changes to infrastructure to support Vision Zero policies ...

April 26, 2017 | Author: K. Shankari | Post Views: 305

Declarative Heterogeneity Handling for Datacenter and ML Resources

Challenge: Heterogeneity in datacenter resources has become a fact of life. We identify and categorize a number of different types of heterogeneity. When talking about heterogeneity, we generally refer to static or dynamic attributes associated with individual resources. Previously the levels of heterogeneity were fairly benign and limited to a few different types of processor architectures. Now, however, it has become a common trend to deploy hardware accelerators (e.g., Tesla K40/K80, G...

March 24, 2017 | Author: Alexey Tumanov | Post Views: 231

RISELab at Spark Summit

This year, Spark Summit East was held in Boston from February 7 to 9. With over 1,500 attendees, this was the largest Spark Summit ever outside the Bay Area. Apache Spark, developed in large part at AMPLab (the precursor of RISELab), is now the de facto standard of big data processing. Like the previous Spark summits, UC Berkeley had a very strong presence. Ion Stoica gave a keynote on RISELab, describing the lab’s research focus on addressing a long-standing grand challenge in computing: enable...

March 17, 2017 | Author: Ion Stoica | Post Views: 249

Serverless Scientific Computing

For many scientific and engineering users, cloud infrastructure remains challenging to use. While many of their use cases are embarrassingly parallel, the challenges involved in provisioning and using stateful cloud services keep them trapped on their laptops or large shared workstations. Before getting started, a new cloud user confronts a bewildering number of choices. First, what instance type do they need? How do they make the compute/memory tradeoff? How large do they want their cluster...

March 8, 2017 | Author: Eric Jonas | Post Views: 389

Metadata Megafail: Messing up Your Data Strategy in 3 Easy Steps

A key aspect of the RISELab agenda is to aggressively harness data—lots of it, both historical and live. Of course bits in computers don’t provide value on their own. We need a broader context for data: where it came from, what it represents, and how it gets used. Traditionally, people called this metadata: the data about our data. Requirements for metadata have changed drastically in recent years in response to technology trends. There’s an emerging groundswell to address these new req...

February 27, 2017 | Author: Joe Hellerstein | Post Views: 1027

RISELab Kicks Off

Berkeley’s computer science division has an ongoing tradition of 5-year collaborative research labs. In the fall of 2016 we closed out the most recent of the series: the AMPLab. We think it was a pretty big deal, and many agreed. One great thing about Berkeley is the endless supply of energy and ideas that flows through the place — always bringing changes, building on what came before. In that spirit, we’re fired up to announce the Berkeley RISELab, where we will focus intensely for five ye...

January 21, 2017 | Author: melissa mecca | Post Views: 596