sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data

Guanhua Wang Deep Learning, Distributed Systems, Open Source, Systems

Convolutional Neural Networks (ConvNets) enable computers to excel at vision learning tasks such as image classification and object detection. Recently, real-time inference on live data has become increasingly important. From a system perspective, it requires fast inference on each single incoming data item (e.g., one image). The two mainstream distributed model serving paradigms – data parallelism and model parallelism – are not necessarily desirable here, because a single input item cannot be further split via data parallelism, and model parallelism introduces huge communication overhead. To achieve live data inference with low latency, we propose sensAI, a novel and generic approach that decouples a CNN model into disconnected subnets, each responsible for predicting certain class(es). We call this new more…

Authors: Guanhua Wang, Zhuang Liu, Brandon Hsieh, Siyuan Zhuang, Joseph Gonzalez, Trevor Darrell, Ion Stoica
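The combination step of class parallelism can be sketched in a few lines (a toy illustration, not the sensAI implementation; the subnets and confidence values below are made up): each subnet reports one confidence score for the class(es) it owns, and a combiner takes an argmax, so workers exchange just one float per input.

```python
# Sketch of class-parallel inference: each "subnet" scores only the class
# it owns; the combiner picks the class whose subnet is most confident.

def make_subnet(cls_scores):
    # Stand-in for a pruned binary subnet: returns its class's confidence.
    def subnet(image):
        return cls_scores[image]
    return subnet

def class_parallel_predict(subnets, image):
    # Each worker sends back one floating-point confidence.
    scores = {cls: net(image) for cls, net in subnets.items()}
    return max(scores, key=scores.get)

# Toy example: three classes, each with a hypothetical confidence table.
subnets = {
    "cat":  make_subnet({"img0": 0.9, "img1": 0.2}),
    "dog":  make_subnet({"img0": 0.1, "img1": 0.7}),
    "bird": make_subnet({"img0": 0.3, "img1": 0.1}),
}
print(class_parallel_predict(subnets, "img0"))  # -> cat
```

Because the subnets are disconnected, each one can run on its own device with no communication until the final argmax.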

Robust Class Parallelism – Error Resilient Parallel Inference with Low Communication Cost

Guanhua Wang Deep Learning, Distributed Systems, Systems

Model parallelism is a standard paradigm to decouple a deep neural network (DNN) into sub-nets when the model is large. Recent advances in class parallelism significantly reduce the communication overhead of model parallelism to a single floating-point number per iteration. However, traditional fault-tolerance schemes, when applied to class parallelism, require storing the entire model on the hard disk. Thus, these schemes are not suitable for soft and frequent system noise such as stragglers (temporarily slow worker machines). In this paper, we propose an erasure-coding-based redundant computing technique called robust class parallelism to improve the error resilience of model parallelism. We show that by introducing slight overhead in the computation at each machine, we can obtain robustness to soft system noise more…

Authors: Yaoqing Yang, Jichan Chung, Guanhua Wang, Vipul Gupta, Adarsh Karnati, Kenan Jiang, Ion Stoica, Joseph Gonzalez, Kannan Ramchandran
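The recovery idea behind erasure-coded redundancy can be illustrated in miniature (our sketch, not the paper's construction, with made-up scores): one extra worker computes a parity over the per-class scores, so a single straggler's output can be reconstructed instead of waited for.

```python
# One redundant "parity" worker computes the sum of all class scores;
# any single straggler's score is then recoverable as parity minus the rest.

def recover(scores, parity):
    # scores: dict class -> score, with the straggler's entry set to None.
    missing = [c for c, s in scores.items() if s is None]
    assert len(missing) <= 1, "one parity worker tolerates one straggler"
    if missing:
        scores[missing[0]] = parity - sum(s for s in scores.values()
                                          if s is not None)
    return scores

workers = {"cat": 0.9, "dog": 0.1, "bird": 0.3}
parity = sum(workers.values())     # computed by the redundant worker

late = dict(workers, dog=None)     # the "dog" worker is a straggler
full = recover(late, parity)
print(max(full, key=full.get))     # -> cat, without waiting for "dog"
```

The extra computation is one addition per worker plus one redundant machine, which is the "slight overhead" the excerpt refers to.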

WAVE: A Decentralized Authorization Framework with Transitive Delegation

Hyung-Sin Kim Security, Systems

Most deployed authorization systems rely on a central trusted service whose compromise can lead to the breach of millions of user accounts and permissions. We present WAVE, an authorization framework offering decentralized trust: no central services can modify or see permissions and any participant can delegate a portion of their permissions autonomously. To achieve this goal, WAVE adopts an expressive authorization model, enforces it cryptographically, protects permissions via a novel encryption protocol while enabling discovery of permissions, and stores them in an untrusted scalable storage solution. WAVE provides competitive performance to traditional authorization systems relying on central trust. It is an open-source artifact and has been used for two years for controlling 800 IoT devices.

Authors: Michael Andersen, Sam Kumar, Moustafa AbdelBaky, Gabe Fierro, Jack Kolb, Hyung-Sin Kim, David Culler, Raluca Ada Popa
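The transitive-delegation model can be shown with a toy, non-cryptographic sketch (ours, not WAVE's protocol; names and permissions are invented): each attestation grants a subset of the issuer's permissions, and a proof chain is valid only if every hop connects and only narrows (or keeps) the granted set.

```python
# Toy model of transitive delegation: the permissions a chain proves are
# the intersection of the permissions granted at every hop.

def chain_grants(chain):
    # chain: list of (issuer, subject, perms) attestations, root first.
    granted = None
    prev_subject = None
    for issuer, subject, perms in chain:
        if prev_subject is not None and issuer != prev_subject:
            raise ValueError("broken chain: %s did not receive from %s"
                             % (issuer, prev_subject))
        granted = set(perms) if granted is None else granted & set(perms)
        prev_subject = subject
    return granted

chain = [
    ("authority", "alice", {"read", "write"}),  # authority -> alice
    ("alice", "bob", {"read"}),                 # alice delegates read to bob
]
print(chain_grants(chain))  # -> {'read'}
```

In WAVE itself the attestations are encrypted and verified cryptographically, so no central service ever sees or can modify this chain.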

Modern Parallel and Distributed Python: A Quick Tutorial on Ray

Robert Nishihara blog, Distributed Systems, Open Source, Systems, Uncategorized

Ray is an open source project for parallel and distributed Python. This article was originally posted here. Parallel and distributed computing are a staple of modern applications. We need to leverage multiple cores or multiple machines to speed up applications or to run them at a large scale. The infrastructure for crawling the web and responding to search queries are not single-threaded programs running on someone’s laptop but rather collections of services that communicate and interact with one another. This post will describe how to use Ray to easily build applications that can scale from your laptop to a large cluster. Why Ray? Many tutorials explain how to use Python’s multiprocessing module. Unfortunately the multiprocessing module is severely limited in…
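As a point of comparison for the pattern Ray generalizes, a serial loop can be parallelized with the standard library alone (threads here, so CPU-bound work is still limited by the GIL; Ray's `@ray.remote` / `ray.get` pattern plays the same role but spans processes and machines):

```python
# A minimal stdlib baseline: fan a function out over a pool of workers.
from concurrent.futures import ThreadPoolExecutor

def f(x):
    # Stand-in for an expensive task.
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order in its results.
    results = list(pool.map(f, range(8)))

print(results)  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

The limitations of this approach on multiple machines, stateful workers, and large data are exactly what the post goes on to address with Ray.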

What Is the Role of Machine Learning in Databases?

Zongheng Yang blog, Database Systems, Deep Learning, Systems

(This article was authored by Sanjay Krishnan, Zongheng Yang, Joe Hellerstein, and Ion Stoica.) What is the role of machine learning in the design and implementation of a modern database system? This question has sparked considerable recent introspection in the data management community, and the epicenter of this debate is the core database problem of query optimization, where the database system finds the best physical execution path for an SQL query. The au courant research direction, inspired by trends in Computer Vision, Natural Language Processing, and Robotics, is to apply deep learning; let the database learn the value of each execution strategy by executing different query plans repeatedly (an homage to Google’s robot “arm farm”) rather than through a pre-programmed analytical…

A History of Postgres

Joe Hellerstein blog, Database Systems, Open Source, Projects, Systems, Uncategorized

(Crossposted.) The ACM began commissioning a series of reminiscence books on Turing Award winners. Thanks to hard work by editor Michael Brodie, the first one is Mike Stonebraker’s book, which just came out. I was asked to write the chapter on Postgres. I was one of the large and distinguished crew of grad students on the Postgres project, so this was fun. ACM in its wisdom decided that these books would be published in a relatively traditional fashion—i.e., you have to pay for them. The publisher, Morgan-Claypool, has this tip for students and ACM members: Please note that the Bitly link goes to a landing page where Students, ACM Members, and Institutions who have access to the ACM…

Confluo: Millisecond-level Queries on Large-scale Live Data

Anurag Khandelwal blog, Confluo, Open Source, Projects, Real-Time, Systems

Confluo is a system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high-throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad-hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams, and an end-to-end optimized system design. We are excited to release Confluo as an open-source C++ project, comprising: (1) Confluo’s data structure library, which supports high-throughput ingestion of logs along with a wide range of online (live aggregates, conditional trigger executions, etc.) and offline (ad-hoc filters, aggregates, etc.) queries; and (2) a Confluo server implementation, which encapsulates the data structures and exposes their operations via an RPC interface, along with client libraries in C++, Java and Python.…

MARVEL: Enabling Mobile Augmented Reality with Low Energy and Low Latency

Hyung-Sin Kim Intelligent, Real-Time, Systems

This paper presents MARVEL, a mobile augmented reality (MAR) system which provides a notation display service with imperceptible latency (<100 ms) and low energy consumption on regular mobile devices. In contrast to conventional MAR systems, which recognize objects using image-based computations performed in the cloud, MARVEL mainly utilizes a mobile device’s local inertial sensors for recognizing and tracking multiple objects, while computing local optical flow and offloading images only when necessary. We propose a system architecture which uses local inertial tracking, local optical flow, and visual tracking in the cloud synergistically. On top of that, we investigate how to minimize the overhead for image computation and offloading. We have implemented and deployed a holistic prototype system in a commercial building more…

Authors: Kaifei Chen, Tong Li, Hyung-Sin Kim, David Culler, Randy Katz

System Architecture Directions for Post-SoC/32-bit Networked Sensors

Hyung-Sin Kim Networks, Systems

The emergence of low-power 32-bit Systems-on-Chip (SoCs), which integrate a 32-bit MCU, radio, and flash, presents an opportunity to re-examine design points and trade-offs at all levels of the system architecture of networked sensors. To this end, we develop a post-SoC/32-bit design point called Hamilton, showing that using integrated components enables a ∼$7 core and shifts hardware modularity to design time. We study the interaction between hardware and embedded OSes, identifying that (1) post-SoC motes provide lower idle current (5.9 µA) than traditional 16-bit motes, (2) 32-bit MCUs are a major energy consumer (e.g., tick increases idle current >50 times), comparable to radios, and (3) thread-based concurrency is viable, requiring only 8.3 µs of context switch time. We design a more…

Authors: Hyung-Sin Kim, Michael Andersen, Kaifei Chen, Sam Kumar, William J. Zhao, Kevin Ma, David Culler

SQL Query Optimization Meets Deep Reinforcement Learning

Zongheng Yang blog, Database Systems, Deep Learning, Reinforcement Learning, Systems

We show that deep reinforcement learning is successful at optimizing SQL joins, a problem studied for decades in the database community. Further, on large joins, we show that this technique executes up to 10x faster than classical dynamic programs and 10,000x faster than exhaustive enumeration. This blog post introduces the problem and summarizes our key technique; details can be found in our latest preprint, Learning to Optimize Join Queries With Deep Reinforcement Learning. SQL query optimization has been studied in the database community for almost 40 years, dating all the way back to System R’s classical dynamic programming approach. Central to query optimization is the problem of join ordering. Despite the problem’s rich history, there is still a continuous stream…
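The classical baseline mentioned above can be sketched compactly: a Selinger-style dynamic program over relation subsets that picks the cheapest left-deep join order. The cardinalities and selectivities below are made up for illustration.

```python
# Selinger-style left-deep join-ordering DP (illustrative sketch).
from itertools import combinations

# Toy statistics: base-table cardinalities and join selectivities.
card = {"A": 100, "B": 1000, "C": 10}
sel = {frozenset("AB"): 0.001, frozenset("BC"): 0.1}  # A-C: no predicate

def joined_size(size, rels, new):
    # Estimated rows after joining the intermediate over `rels` with `new`.
    s = size * card[new]
    for r in rels:
        s *= sel.get(frozenset((r, new)), 1.0)  # 1.0 = cross product
    return s

# best[S] = (total cost, result size, left-deep order) for relation set S,
# where cost is the usual sum of intermediate-result sizes.
best = {frozenset([r]): (0.0, float(card[r]), [r]) for r in card}
for k in range(2, len(card) + 1):
    for subset in map(frozenset, combinations(sorted(card), k)):
        plans = []
        for new in sorted(subset):
            rest = subset - {new}
            cost, size, order = best[rest]
            out = joined_size(size, rest, new)
            plans.append((cost + out, out, order + [new]))
        best[subset] = min(plans)

cost, size, order = best[frozenset(card)]
print(order, round(cost, 2))  # -> ['A', 'B', 'C'] 200.0
```

The table `best` grows exponentially in the number of relations, which is exactly why large joins make this DP expensive and a learned policy attractive.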

Going Fast and Cheap: How We Made Anna Autoscale

Vikram Sreekanti blog, Database Systems, Distributed Systems, Open Source, Systems, Uncategorized

Background: In an earlier blog post, we described a system called Anna, which used a shared-nothing, thread-per-core architecture to achieve lightning-fast speeds by avoiding all coordination mechanisms. Anna also used lattice composition to enable a rich variety of coordination-free consistency levels. The first version of Anna blew existing in-memory KVSes out of the water: Anna is up to 700x faster than Masstree, an earlier state-of-the-art research KVS, and up to 800x faster than Intel’s “lock-free” TBB hash table. You can find the previous blog post here and the full paper here. We refer to that version of Anna as “Anna v0.” In this post, we describe how we extended the fastest KVS in the cloud to be extremely cost-efficient and…

MLPerf: SPEC for ML

David Patterson Deep Learning, News, Open Source, Optimization, Reinforcement Learning, Systems, Uncategorized

The RISE Lab at UC Berkeley today joins Baidu, Google, Harvard University, and Stanford University to announce a new benchmark suite for machine learning called MLPerf at the O’Reilly AI conference in New York City. The MLPerf effort aims to build a common set of benchmarks that enables the machine learning (ML) field to measure system performance, eventually for both training and inference and from mobile devices to cloud services. We believe that a widely accepted benchmark suite will benefit the entire community, including researchers, developers, builders of machine learning frameworks, cloud service providers, hardware manufacturers, application providers, and end users. Historical Inspiration. We are motivated in part by the System Performance Evaluation Cooperative (SPEC) benchmark for general-purpose computing that drove rapid,…

SafeBricks: Shielding Network Functions in the Cloud

Rishabh Poddar Networks, Security, Systems

With the advent of network function virtualization (NFV), outsourcing network processing to the cloud is growing in popularity amongst enterprises and organizations. Such outsourcing, however, poses a threat to the security of the client’s traffic because the cloud is notoriously susceptible to attacks. We present SafeBricks, a system that shields generic network functions (NFs) from an untrusted cloud. SafeBricks ensures that only encrypted traffic is exposed to the cloud provider, and preserves the integrity of both traffic and the NFs. At the same time, it enables clients to reduce their trust in NF implementations by enforcing least privilege across NFs deployed in a chain. SafeBricks does not require changes to TLS, and safeguards the interests of NF vendors as well more…

Authors: Rishabh Poddar, Chang Lan, Raluca Ada Popa, Sylvia Ratnasamy

Anna: A Crazy Fast, Super-Scalable, Flexibly Consistent KVS 🗺

Joe Hellerstein blog, Database Systems, Distributed Systems, Real-Time, Systems, Uncategorized

This article is cross-posted from the DataBeta blog. There’s fast and there’s fast. This post is about Anna, a key/value database design from our team at Berkeley that’s got phenomenal speed and buttery smooth scaling, with an unprecedented range of consistency guarantees. Details are in our upcoming ICDE18 paper on Anna. Conventional wisdom (or at least Jeff Dean wisdom) says that you have to redesign your system every time you scale by 10x. As researchers, we asked the counter-cultural question: what would it take to build a key-value store that would excel across many orders of magnitude of scale, from a single multicore box to the global cloud? Turns out this kind of curiosity can lead to a system with pretty interesting practical…
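The lattice idea at the heart of Anna can be shown in miniature (our sketch, not Anna's C++ implementation): replicas merge state with a commutative, associative, idempotent join, so updates converge regardless of the order in which they arrive.

```python
# A tiny join-semilattice: integers under max(). Any interleaving of
# merges across replicas yields the same final state, with no coordination.

class MaxIntLattice:
    def __init__(self, value=0):
        self.value = value

    def merge(self, other):
        # Commutative, associative, idempotent join.
        self.value = max(self.value, other.value)
        return self

a, b, c = MaxIntLattice(3), MaxIntLattice(7), MaxIntLattice(5)

left = MaxIntLattice(0).merge(a).merge(b).merge(c)
right = MaxIntLattice(0).merge(c).merge(b).merge(a)  # reversed order
print(left.value == right.value, left.value)  # -> True 7
```

Composing such lattices (counters, maps of lattices, and so on) is how Anna expresses its range of coordination-free consistency levels.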

Design and Analysis of a Query Processor for Brick

Gabe Fierro Projects, Systems

Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and model-predictive control, require fast queries — conventionally less than 100 ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet this performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting more…

Authors: Gabe Fierro, David E. Culler
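A Brick application query of the kind benchmarked above looks roughly like the following (an illustrative sketch; the exact class and relationship names should be checked against the Brick schema):

```sparql
# Find each room's zone temperature sensor in the zones fed by an AHU.
SELECT ?sensor ?room
WHERE {
    ?ahu    rdf:type     brick:AHU .
    ?ahu    bf:feeds+    ?zone .          # transitive property path
    ?zone   rdf:type     brick:HVAC_Zone .
    ?room   bf:isPartOf  ?zone .
    ?sensor rdf:type     brick:Zone_Temperature_Sensor .
    ?sensor bf:isPointOf ?room .
}
```

Property paths like `bf:feeds+` are what make these queries expressive for applications, and also part of what makes them hard for general-purpose SPARQL engines to answer in under 100 ms.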

Opaque: Secure Apache Spark SQL

Wenting Zheng blog, Security, Systems

As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to any attacker who can observe memory contents. This is a challenging problem because security usually implies a tradeoff between performance and functionality. Cryptographic approaches like fully homomorphic encryption provide full functionality to a system, but are extremely slow. Systems like CryptDB utilize lighter cryptographic primitives to provide a practical database, but are limited in functionality. Recent developments in trusted hardware enclaves (such as Intel SGX) provide a much needed alternative. These hardware enclaves provide hardware-enforced shielded execution that allows…

Announcing Ground v0.1

Vikram Sreekanti blog, Ground, News, Open Source, Projects, Systems

We’re excited to be releasing v0.1 of the Ground project! Ground is a data context service. It is a central repository for all the information surrounding the use of data in an organization. Ground concerns itself with what data an organization has, where that data is, who (both human beings and software systems) is touching that data, and how that data is being modified and described. Above all, Ground aims to be an open-source, vendor neutral system that provides users an unopinionated metamodel and set of APIs that allow them to think about and interact with data context generated in their organization. Ground has many use cases, but we’re focused on two specific ones at present: Data Inventory: large organizations…

Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat

Alexey Tumanov blog, Deep Learning, Reinforcement Learning, Systems

On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event was aimed at exploring research opportunities at the intersection of the BDD and RISE labs. The topical focus of the mini-retreat was emerging AI applications, such as Reinforcement Learning (RL), and computer systems to support such applications. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. The event offered a great opportunity for researchers from both labs to exchange ideas about their ongoing research activity and discover points of collaboration. Philipp Moritz started the first student talk session with an update on Ray — a distributed execution framework for emerging…

RISELab Announces 3 Open Source Releases

Joe Hellerstein blog, Clipper, Ground, Open Source, Projects, Ray, Systems

Part of the Berkeley tradition—and the RISELab mission—is to release open source software as part of our research agenda. Six months after launching the lab, we’re excited to announce initial v0.1 releases of three RISELab open-source systems: Clipper, Ground and Ray. Clipper is an open-source prediction-serving system. Clipper simplifies deploying models from a wide range of machine learning frameworks by exposing a common REST interface and automatically ensuring low-latency and high-throughput predictions. In the 0.1 release, we focused on reliable support for serving models trained in Spark and Scikit-Learn. In the next release we will be introducing support for TensorFlow and Caffe2 as well as online personalization and multi-armed bandits. We are providing active support for early users and will be following GitHub issues…

Declarative Heterogeneity Handling for Datacenter and ML Resources

Alexey Tumanov blog, Systems

Challenge: Heterogeneity in datacenter resources has become a fact of life. We identify and categorize a number of different types of heterogeneity. When talking about heterogeneity, we generally refer to static or dynamic attributes associated with individual resources. Previously, the levels of heterogeneity were fairly benign and limited to a few different types of processor architectures. Now, however, it has become a common trend to deploy hardware accelerators (e.g., Tesla K40/K80, Google TPU, Intel Xeon Phi) and even FPGAs (e.g., the Microsoft Catapult project). Nodes themselves are connected with heterogeneous interconnects, oftentimes with more than one interconnect option available (e.g., 40Gbps Ethernet backbone, InfiniBand, FPGA torus topology). The workloads we consolidate on top of this diverse hardware differ vastly in their success metrics (completion…

Grail Quest: A New Proposal for Hardware-assisted Garbage Collection

Martin Maas Systems

Many big data systems are written in garbage-collected languages and GC has a substantial impact on throughput, responsiveness and predictability of these systems. However, despite decades of research, there is still no “Holy Grail” of GC: a collector with no measurable impact, even on real-time applications. Such a collector needs to achieve freedom from pauses, high GC throughput and good memory utilization, without slowing down application threads or using substantial amounts of compute resources. In this paper, we propose a step towards this elusive goal by reviving the old idea of moving GC into hardware. We discuss the trends that make it the perfect time to revisit this approach and present the design of a hardware-assisted GC that aims to more…

Authors: Martin Maas, Krste Asanovic, John Kubiatowicz

Serverless Scientific Computing

Eric Jonas blog, Projects, Systems

For many scientific and engineering users, cloud infrastructure remains challenging to use. While many of their use cases are embarrassingly parallel, the challenges involved in provisioning and using stateful cloud services keep them trapped on their laptops or large shared workstations. Before getting started, a new cloud user confronts a bewildering number of choices. First, what instance type do they need? How do they make the compute/memory tradeoff? How large do they want their cluster to be? Can they take advantage of dynamic market-based instances (spot instances) that can disappear at any time? What if they have 1000 small jobs, each of which takes a few minutes — what’s the most cost-effective way of allocating servers? What host operating…

SparkR: Scaling R Programs with Spark

Ali Ghodsi Systems

R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited as the R runtime is single threaded and can only process data sets that fit in a single machine’s memory. We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark’s distributed computation engine to enable large scale data analysis from the R shell. We describe the main design goals of SparkR, discuss how the high-level DataFrame API enables scalable computation and present some of the key details of our implementation.

Authors: Shivaram Venkataraman, Ali Ghodsi, Ion Stoica

MiniCrypt: Reconciling Encryption and Compression for Big Data Stores.

Raluca Ada Popa Security, Systems

More and more applications and web services generate larger and larger amounts of confidential data, such as user and financial data. On one hand, these systems must use encryption to ensure confidentiality, while on the other hand, they want to use compression to reduce costs and increase performance. Unfortunately, encryption and compression are in tension, leading many existing systems to support one but not the other. We propose MiniCrypt, the first big data key-value store that reconciles encryption and compression, without compromising performance. At the core of MiniCrypt is an observation on data compressibility trends in key-value stores, which enables grouping key-value pairs in small key packs, together with a set of new distributed systems techniques for retrieving, updating, merging more…

Authors: Wenting Zheng, Raluca Ada Popa, Ion Stoica, Rachit Agarwal, Frank Li
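The compressibility observation at the core of MiniCrypt is easy to reproduce in miniature: compressing a pack of similar records together shrinks them far more than compressing each record alone (a sketch with zlib on made-up records; a real store would then encrypt each compressed pack, which is why packing must happen before encryption).

```python
# Compare compressing 100 similar key-value records individually vs. as
# one "key pack". Compression after encryption is useless (ciphertext is
# incompressible), so MiniCrypt compresses packs first, then encrypts.
import json
import zlib

records = [{"user": i, "country": "US", "status": "active"}
           for i in range(100)]

per_record = sum(len(zlib.compress(json.dumps(r).encode()))
                 for r in records)
packed = len(zlib.compress(json.dumps(records).encode()))

print(packed < per_record)  # -> True: one pack beats 100 tiny blobs
```

The cost of packing is that reads and updates must fetch and rewrite a whole pack, which is what the excerpt's "distributed systems techniques for retrieving, updating, merging" address.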

Opaque: An Oblivious and Encrypted Distributed Analytics Platform.

Raluca Ada Popa Crypto, Security, Systems

As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to an attacker who has compromised the operating system or hypervisor. Trusted hardware such as Intel SGX has recently become available in latest-generation processors. Such hardware enables arbitrary computation on encrypted data while shielding it from a malicious OS or hypervisor. However, it still suffers from a significant side channel: access pattern leakage. We present Opaque, a package for Apache Spark SQL that enables very strong security for SQL queries: data encryption, computation verification, and access pattern leakage protection (a.k.a. more…

Authors: Wenting Zheng, Raluca Ada Popa, Ion Stoica, Joseph Gonzalez, Ankur Dave, Jethro Beekman

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets

Alexey Tumanov Systems

Many shared computing clusters allow users to utilize excess idle resources at lower cost or priority, with the proviso that some or all may be taken away at any time. But, exploiting such dynamic resource availability and the often fluctuating markets for them requires agile elasticity and effective acquisition strategies. Proteus aggressively exploits such transient revocable resources to do machine learning (ML) cheaper and/or faster. Its parameter server framework, AgileML, efficiently adapts to bulk additions and revocations of transient machines, through a novel 3-stage active-backup approach, with minimal use of more costly non-transient resources. Its BidBrain component adaptively allocates resources from multiple EC2 spot markets to minimize average cost per work as transient resource availability and cost change over time. more…

Authors: Aaron Harlap, Alexey Tumanov, Andrew Chung, Gregory R. Ganger, Phil Gibbons

Morpheus: Towards Automated SLOs for Enterprise Clusters

Alexey Tumanov Systems

Modern resource management frameworks for large-scale analytics leave unresolved the problematic tension between high cluster utilization and jobs’ performance predictability—respectively coveted by operators and users. We address this in Morpheus, a new system that: 1) codifies implicit user expectations as explicit Service Level Objectives (SLOs), inferred from historical data, 2) enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced performance variability, and 3) mitigates inherent performance variance (e.g., due to failures) by means of dynamic reprovisioning of jobs. We validate these ideas against production traces from a 50k node cluster, and show that Morpheus can lower the number of deadline violations by 5x to 13x, while retaining cluster-utilization, and lowering cluster footprint by 14% to 28%. We demonstrate more…

Authors: C. Curino, I. Menache, S. Narayanamurthy, Alexey Tumanov, J. Yaniv, R. Mavlyutov, I. Goiri, S. Krishnan, J. Kulkarni, S. Rao
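Step 1 above, inferring an explicit SLO from history, can be sketched simply (our toy estimator, not Morpheus's actual inference): adopt a high percentile of a recurring job's historical completion times as its deadline going forward.

```python
# Toy SLO inference: codify an implicit user expectation as the 90th
# percentile of past completion times, so rare outliers don't set the bar.
import math

def inferred_deadline(history_minutes, percentile=0.9):
    ordered = sorted(history_minutes)
    idx = min(len(ordered) - 1,
              math.ceil(percentile * len(ordered)) - 1)
    return ordered[idx]

# Made-up history for one recurring job, in minutes (one outlier run).
history = [42, 40, 45, 41, 44, 43, 39, 46, 40, 90]
print(inferred_deadline(history))  # -> 46
```

Choosing the percentile trades off deadline violations against over-reservation, which is the utilization/predictability tension the excerpt describes.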

Clipper: A Low-Latency Online Prediction Serving System

Daniel Crankshaw Intelligent, Real-Time, Systems

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment. In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, more…

Authors: Dan Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph Gonzalez, Ion Stoica
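Clipper's adaptive model selection can be illustrated with a greatly simplified, made-up sketch: an epsilon-greedy selector that routes queries to the model with the best running feedback accuracy, without touching the underlying frameworks.

```python
# Epsilon-greedy selection among deployed models (illustrative only).
import random

class ModelSelector:
    def __init__(self, models, epsilon=0.1, seed=0):
        self.models = list(models)
        self.stats = {m: [0, 0] for m in self.models}  # [correct, total]
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def _accuracy(self, m):
        correct, total = self.stats[m]
        return correct / total if total else 0.5       # optimistic prior

    def choose(self):
        # Mostly exploit the best model so far, explore occasionally.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)
        return max(self.models, key=self._accuracy)

    def feedback(self, model, correct):
        self.stats[model][0] += int(correct)
        self.stats[model][1] += 1

    def best(self):
        return max(self.models, key=self._accuracy)

# Toy simulation with made-up accuracies: "b" answers correctly 90% of
# the time, "a" only 60%; feedback steers traffic toward "b".
truth = {"a": 0.6, "b": 0.9}
sel = ModelSelector(["a", "b"])
oracle = random.Random(42)
for _ in range(500):
    m = sel.choose()
    sel.feedback(m, oracle.random() < truth[m])

print(sel.best())  # -> b
```

Because the selector only needs prediction feedback, it can sit in front of models trained in any framework, which is the interposition point Clipper exploits.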