Portfolio Archive

NumS

NumS is a numerical cloud computing library that translates Python and NumPy to distributed systems code at runtime. NumS scales NumPy operations horizontally and provides inter-operation (task) parallelism for those operations. It remains faithful to the NumPy API and integrates tightly with the Python programming language by supporting loop parallelism and branching. NumS's system-level operations are written against the Ray API; it supports S3 and basic distributed filesystem operations for storage and uses NumPy as a backend for CPU-based array operations. Please visit the open source project on GitHub for more details.

TCPlp

Low-power and lossy networks (LLNs) enable diverse applications integrating many resource-constrained embedded devices, often requiring interconnectivity with existing TCP/IP networks as part of the Internet of Things. But TCP has received little attention in LLNs due to concerns about its overhead and performance, leading to LLN-specific protocols that require specialized gateways for interoperability. We present a systematic study of a well-designed TCP stack in IEEE 802.15.4-based LLNs, based on the TCP protocol logic in FreeBSD. Through careful implementation and extensive experiments, we show that modern low-power sensor platforms are capable of running full-scale TCP and that TCP, counter to common belief, performs well despite the lossy nature of LLNs. By carefully studying the interaction between the transport and link layers,…

Cloudburst

Cloudburst is a stateful Functions-as-a-Service platform built on top of Anna. Cloudburst has all the benefits of a standard serverless FaaS system like AWS Lambda: no operational overheads, seamless autoscaling, and disaggregated infrastructure. However, inspired by our recent work on the shortcomings of existing serverless systems, Cloudburst enables new applications to take advantage of serverless infrastructure through smart state management. We accomplish this with a new design principle: logical disaggregation with physical colocation. While compute and storage are logically separate services, Cloudburst introduces state caches that live on the same physical machines as compute workers to reduce data transfer costs. Combined with smart scheduling techniques and state-of-the-art consistency protocols, this principle allows Cloudburst to scale…

ERDOS and Pylot

ERDOS is a platform for developing self-driving cars and robotics applications. The system is built using techniques from streaming dataflow systems, which is reflected in the API. Applications are modeled as directed graphs, in which data flows through streams and is processed by operators. Pylot is a modular autonomous vehicle (AV) platform for developing and testing AV components (e.g., perception, prediction, planning) on the CARLA simulator and real-world cars. Pylot is built on top of ERDOS and enables novel AV research by providing a suite of reference components for evaluation with realistic end-to-end AV pipelines. Using ERDOS and Pylot, we are investigating several AV-related research problems such as navigating latency-accuracy trade-offs, deterministic execution, and the management of time…

AdaHessian and PyHessian

AdaHessian and PyHessian are PyTorch-based libraries for second-order optimization and analysis of neural network training. The library supports the training of convolutional neural networks (image_classification) and transformer-based models (transformer). See the code at https://github.com/amirgholami/adahessian and https://github.com/amirgholami/PyHessian

PyHessian

PyHessian is a PyTorch library for Hessian-based analysis of neural network models. The library enables computing the following metrics: the top Hessian eigenvalues, the trace of the Hessian matrix, and the full Hessian eigenvalue spectral density (ESD). Code is available at https://github.com/amirgholami/PyHessian

MC²

Collaborative analytics and machine learning have the potential to extract a tremendous amount of value from joint datasets, but data owners are often unwilling to share data due to regulatory or privacy concerns. The MC² (Multiparty Collaboration and Competition) platform is a collection of components that enable several data owners to perform analytics and/or jointly train ML models on their collective data without revealing their individual data to each other. The platform is open source.

NBSafety

NBSafety is a drop-in replacement for Jupyter’s Python 3 kernel that automatically highlights potential bugs due to out-of-order cell executions. https://nbsafety.org

Wavelet

System design for speeding up distributed ML

LUX

Lux is an open-source Python library for accelerating and simplifying the process of data exploration. Lux recommends interesting visualizations to guide users towards potential next steps in their analysis. Visualizations are displayed as a widget in situ within a Jupyter notebook, providing a seamless transition between code and interaction. Lux automates away the tedious work typically present in visual data analysis, allowing the analyst to explore their data at the speed of thought. For more details, see https://github.com/lux-org/lux.

SQProp

SQProp is an algorithm to compute forward and backward propagation with mixed-precision numbers. Reducing numerical precision is critical to achieving fast and economical development and deployment of deep neural networks. While most existing reduced-precision algorithms focus only on forward propagation, SQProp provides a unified framework for both forward and backward propagation, making it suitable for accelerating both the training and inference of deep neural networks. SQProp is based on stochastic quantization with advanced variance reduction techniques. Instead of a worst-case analysis of error accumulation, SQProp comes with a statistical framework that analyzes the bias and variance of the gradient. This enables the development of unbiased, variance-reduced gradient estimators. The theory and implementation of SQProp are still under active development.…
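To illustrate what "unbiased" means for a stochastically quantized value, here is a minimal sketch of generic stochastic rounding. It is not SQProp's algorithm and includes no variance reduction; the function name `stochastic_round` and the grid spacing `step` are hypothetical names chosen for this example.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step` (hypothetical helper, not SQProp code).

    x is rounded up with probability equal to its fractional position
    between the two neighbouring grid points, so the expected value of
    the result equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    prob_up = scaled - lower
    return (lower + (1 if rng.random() < prob_up else 0)) * step


rng = random.Random(0)
x, step = 0.3, 0.25

# Each sample lands on a neighbouring grid point (0.25 or 0.5),
# but the average over many samples stays close to x.
samples = [stochastic_round(x, step, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(f"x = {x}, mean of quantized samples = {mean:.4f}")
```

Deterministic round-to-nearest would always return 0.25 here, a systematic error of 0.05; stochastic rounding trades that bias for variance, which is the quantity a variance-reduced estimator then tries to control.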

AutoPandas

AutoPandas is an input-output example based program synthesis engine for the Pandas Python library.

Metal

File sharing systems like Dropbox offer insufficient privacy since a compromised server can see the file content in the clear. Though encryption can hide such content from the servers, metadata leakage remains significant. It is promising to develop a file sharing system that hides such metadata—including user identities and file access patterns. Metal is the first file sharing system that hides such metadata from malicious users and that has a latency of only a few seconds. The core of Metal is a new two-server multi-user oblivious RAM (ORAM) scheme, which is secure against malicious users, together with metadata-hiding access control and capability sharing.

Cerebro

In the collaborative machine learning setting, multiple organizations cooperate to train or predict over their joint datasets. Unfortunately, collaborative learning cannot happen over sensitive data because such data cannot be shared in plaintext due to privacy constraints, such as policy regulations and business competition. We present Cerebro, a platform that leverages cryptography to enable multiple parties to compute learning tasks without revealing any party’s input data to another party. Cerebro provides a cryptographic compiler that is able to automatically compile and optimize a program written in a high-level language into a secure protocol. Moreover, by taking an end-to-end approach to the system design, Cerebro allows multiple parties with complex economic relationships to safely collaborate on machine learning computation.

Anna

Anna is a low-latency, autoscaling key-value store. The core design goal for Anna is to avoid expensive locking and lock-free atomic instructions, which have recently been shown to be extremely inefficient. Anna instead employs a wait-free, shared-nothing architecture, where each thread in the system is given a private memory buffer and is allowed to process requests unencumbered by coordination. To resolve potentially conflicting updates, Anna encapsulates all user data in lattice data structures, which have associative, commutative, and idempotent merge functions. As a result, for workloads that can tolerate slightly stale data, Anna provides best-in-class performance. A more detailed description of the system design and the coordination-free consistency mechanisms, as well as an evaluation and comparison against other state-of-the-art systems…
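The merge properties mentioned above can be sketched with two toy lattices. This is an illustrative example only, not Anna's implementation or API; the class names `MaxIntLattice` and `SetLattice` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxIntLattice:
    """Toy lattice over integers whose merge is max()."""
    value: int

    def merge(self, other: "MaxIntLattice") -> "MaxIntLattice":
        return MaxIntLattice(max(self.value, other.value))


@dataclass(frozen=True)
class SetLattice:
    """Toy lattice over sets whose merge is set union."""
    items: frozenset

    def merge(self, other: "SetLattice") -> "SetLattice":
        return SetLattice(self.items | other.items)


a = SetLattice(frozenset({1}))
b = SetLattice(frozenset({2}))
c = SetLattice(frozenset({1, 3}))

# Associative: grouping of merges does not matter.
assert a.merge(b).merge(c) == a.merge(b.merge(c))
# Commutative: order of merges does not matter.
assert a.merge(b) == b.merge(a)
# Idempotent: merging the same update twice changes nothing.
assert a.merge(a) == a
```

Because merges with these three properties give the same result regardless of grouping, order, or duplication, replicas that apply the same set of updates converge to the same value without coordinating with each other.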

Keystone Enclave

Keystone is an open framework for architecting trusted execution environments (TEEs).

sensAI

Distribute ML with ZERO communication

Modin

Modin is an early-stage DataFrame library that wraps pandas and transparently distributes the data and computation, accelerating your pandas workflows with a one-line code change. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous pandas notebooks while experiencing a considerable speedup from Modin, even on a single machine. Only a modification of the import statement is needed. Once you've changed your import statement, you're ready to use Modin just like you would pandas, since the API is identical to pandas.
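The import change might look like the following minimal sketch. The fallback to plain pandas is an assumption added here so the snippet runs even when Modin is not installed; the DataFrame contents are made up for illustration.

```python
# Assumption: fall back to plain pandas if Modin is not installed,
# so this sketch runs either way.
try:
    import modin.pandas as pd  # the one-line change: was `import pandas as pd`
except ImportError:
    import pandas as pd

# From here on, the code is ordinary pandas code.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("group")["value"].sum()
print(totals)
```

Everything after the import is unchanged pandas code, which is what lets existing notebooks switch over without further edits.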

Tune

Ray Tune is a scalable hyperparameter optimization framework for reinforcement learning and deep learning. Go from running one experiment on a single machine to running on a large cluster with efficient search algorithms without changing your code. Check out the project at http://ray.readthedocs.io/en/latest/tune.html

RLlib

Alchemist

Alchemist is an interface between Apache Spark applications and MPI-based libraries for accelerating large-scale linear algebra and machine learning computations. Performing communication-intensive linear algebra computations in Spark can incur large overheads. Alchemist bypasses these overheads by sending the data from the Spark application to existing or custom MPI-based libraries, and then transmitting the results back to the application. This leads to significantly lower overhead and to computations that are efficient and scalable. For the current version of the code, see https://github.com/alexgittens/alchemist. A new version will be released soon!

ASAP: Fast, Approximate Graph Pattern Mining at Scale

A fast, approximate, distributed graph pattern mining system.

Cirrus

Cirrus is a specialized framework for running iterative large-scale machine learning algorithms on serverless infrastructure. It is lightweight and highly optimized to run on small stateless lambdas to achieve low cost, high scalability and fault-tolerance.

flor

Build, run, and reproduce experiments with Jarvis.

XBOS

XBOS (eXtensible Building Operating System) is an open-source, large-scale distributed operating system for smart buildings.

Blink

Blink is a fast and generic collective communication library for distributed machine learning.

Pywren

Pushing microservices to teraflops

WAVE

WAVE provides global-scale authorization for IoT without an authority

SafeBricks

A system for outsourcing general-purpose middleboxes to the cloud while providing strong security guarantees.

Chorus

Chorus is an analysis and rewriting tool for relational queries.

Database Acceleration

Specializing architectures to support database operations. Krste Asanovic (krste@eecs.berkeley.edu)

Firebox

Designing the next-generation warehouse-scale computer

FireSim

FireSim (https://fires.im) is an open-source cycle-accurate, FPGA-accelerated scale-out computer system simulation platform. FireSim is capable of simulating from one to thousands of multi-core compute nodes, derived directly from silicon-proven and open target-RTL, with an optional cycle-accurate network simulation tying them together.

Quilt

Recent industry trends indicate a shift toward programmatic management of distributed infrastructure. While the benefits of infrastructure APIs are widely understood, decidedly less attention has been paid to the design of such APIs. The de facto standard approach, a RESTful interface paired with a YAML representation, leads to unnecessary complexity for both container orchestrator implementations and distributed application developers. We argue that a better API for programmatic infrastructure is a general-purpose programming language. Such a language allows specification of distributed applications with strong primitives for abstraction, composition, and sharing, all while allowing deployment engines to remain ignorant of high-level constructs. We present Quilt, an open source project that demonstrates these…

SCSG

We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG). As a member of the SVRG family of algorithms, SCSG makes use of gradient estimates at two scales. Unlike most existing algorithms in this family, both the computation cost and the communication cost of SCSG do not necessarily scale linearly with the sample size n; indeed, these costs are independent of n when the target accuracy is low. An experimental evaluation of SCSG on the MNIST dataset shows that it can yield accurate results on this dataset on a single commodity machine with a memory footprint of only 2.6MB and only eight disk accesses. Mike Jordan (jordan@cs.berkeley.edu)

Arx

A practical and functionally rich DBMS that encrypts data with only strong, semantically secure encryption schemes.

Confluo

Confluo is a system for real-time monitoring and analysis of data streams.

Succinct

Succinct is a data store that enables efficient queries directly on a compressed representation of the input data. Succinct uses a compression technique that allows random access into the input, thus enabling efficient storage and retrieval of data. In addition, Succinct natively supports a wide range of queries, including count and search of arbitrary strings as well as range and wildcard queries. What differentiates Succinct from previous techniques is that Succinct supports these queries without storing indexes: all the required information is embedded within the compressed representation. Evaluation on real-world datasets shows that Succinct requires an order of magnitude less memory than systems with similar functionality. Succinct thus pushes more data in memory, and provides low query latency for a larger range…

Ernest + Hemingway

Distributed optimization algorithms are widely used in many industrial machine learning applications. However, choosing the appropriate algorithm and cluster size is often difficult for users, as the performance and convergence rate of optimization algorithms vary with the size of the cluster. We make the case for an ML-optimizer that can select the appropriate algorithm and cluster size to use for a given problem. To do this, we propose building two models: one that captures the system-level characteristics of how computation and communication change as we increase cluster sizes, and another that captures how convergence rates change with cluster sizes. We present preliminary results from our prototype implementation called Hemingway and discuss some of the challenges involved in developing such a…

Drizzle

Drizzle is a hybrid streaming system that unifies the record-at-a-time streaming and micro-batch models.

Causal Inference

Inferring causality between randomly observed stochastic processes is inherently a difficult task, and few statistical guarantees are available for practitioners in this setting. As such data sets are now very common (stock market, mobile sensing, medical data), we want to provide a statistically sound and scalable approach to infer relationships between continuous stochastic processes observed discretely and randomly. We show how spectral domain random projections can mitigate statistical and scalability-related issues, provided a careful design of the frequency domain basis is used. After providing theorems that give strong statistical guarantees, we show through numerical experiments on surrogate and actual data that our method is reliable and scalable. Francois Belletti (francois.belletti@berkeley.edu), Joey Gonzalez (jegonzal@cs.berkeley.edu)

Meta-RL

Rather than learning new control policies for each new task, it is possible, when tasks share some structure, to compose a "meta-policy" from previously learned policies. We explore how Deep Neural Networks can represent meta-policies that switch among a set of previously learned policies, specifically in settings where the dynamics of a new scenario are composed of a mixture of previously learned dynamics and where the state observation is possibly corrupted by sensing noise. Richard Liaw (rliaw@berkeley.edu), Joey Gonzalez (jegonzal@cs.berkeley.edu)

Tegra: Efficient Ad-Hoc Analytics on Time-Evolving Graphs

A time-evolving graph processing system built on a general-purpose dataflow framework.

Ray

Ray is a high-performance distributed execution framework targeted at large-scale machine learning and reinforcement learning applications.

IndexedRDD

Immutability dramatically simplifies fault tolerance, straggler mitigation, and data consistency and is an essential part of widely-used distributed batch analytics systems including MapReduce, Dryad, and Spark. However, these systems are increasingly being used for new applications like stream processing and incremental analytics, which often demand fine-grained updates and are seemingly at odds with the essential assumption of immutability. We introduce the persistent adaptive radix tree (PART), a map data structure that supports efficient fine-grained updates without compromising immutability. In addition, PART (1) allows applications to trade off latency for throughput using batching, (2) supports efficient scans using an optimized memory layout and periodic compaction, and (3) achieves efficient fault recovery using incremental checkpoints. PART achieves update performance comparable to a…

Opaque

A distributed data analytics platform supporting a wide range of queries while providing strong security.

Temgine

Time series analysis presents unique challenges in the field of machine learning. Dealing with randomly observed time series is still difficult with standard methods as timestamps are irregularly spaced. Long Range Dependent time series are notoriously hard to analyze as they can lead to the discovery of correlation where there is none. The scale of data sets now offers unique opportunities in terms of training deeper, more complex models, but requires a lot of modifications to standard approaches which were designed for a single machine to run fast. Temgine aims at offering a query optimization based solution that leverages the duality between time and frequency domain in time series in order to provide estimators for randomly observed data, efficiently erase…

DeepCode

Deep learning models have been successfully applied to various areas, including computer vision and natural language processing. Little work has been done in the context of program synthesis, and code completion in particular. In this work, we provide benchmark results for code completion on the Abstract Syntax Tree (AST) of JavaScript code collected from GitHub. We show that basic LSTM variants can achieve 79% top-1 accuracy and 85.6% top-5 accuracy, which is comparable to a probabilistic model that uses substantial domain knowledge. We further evaluate speed in the serving phase: each query takes 33 ms on a 16-core CPU and 16 ms with one K80 GPU. Xin Wang (xinw@berkeley.edu), Joey Gonzalez (jegonzal@cs.berkeley.edu)