Application-level scheduling with custom resources
Ray intends to be a universal framework for a wide range of machine learning applications: distributed training, machine learning inference, data processing, latency-sensitive applications, and throughput-oriented applications. Each of these applications has different, and at times conflicting, requirements for resource management. Ray intends to cater to all of them as the newly emerging microkernel for distributed machine learning. To achieve that kind of generality, Ray gives developers explicit control over task and actor placement through custom resources. In this blog post we discuss use cases and provide examples. This article is intended for readers already familiar with…
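As a taste of the mechanism (a minimal sketch; the resource name "special_hardware" is illustrative, not taken from the post), a node can advertise a custom resource and a remote function can require units of it:

    import ray

    # Advertise a custom resource on the local node. On a multi-node
    # cluster, each node would declare its own resources instead, e.g.
    # via `ray start --resources='{"special_hardware": 2}'`.
    ray.init(resources={"special_hardware": 2})

    # This task is only schedulable on nodes offering at least one
    # unit of "special_hardware"; the unit is held while it runs.
    @ray.remote(resources={"special_hardware": 1})
    def process():
        return "ran on a node with special_hardware"

    print(ray.get(process.remote()))

Actor placement works the same way: pass a resources dictionary to the actor class's @ray.remote decorator.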
Publications
Serverless Computing: One Step Forward, Two Steps Back
The Case for GPU Multitenancy: The OoO VLIW JIT Compiler for GPU Inference
Dynamic Space-Time Scheduling for GPU Inference
Ray: A Distributed Framework for Emerging AI Applications
3Sigma: distribution-based cluster scheduling for runtime uncertainty
IDK Cascades: Fast Deep Learning by Learning not to Overthink
Tributary: spot-dancing for elastic services with latency SLOs
Real-Time Machine Learning: The Missing Pieces
Proteus: agile ML elasticity through tiered reliability in dynamic resource markets
Morpheus: Towards Automated SLOs for Enterprise Clusters
Blog Posts
Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat
On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event aimed to explore research opportunities at the intersection of the BDD and RISE labs. Its topical focus was emerging AI applications, such as Reinforcement Learning (RL), and the computer systems needed to support them. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. The event offered a great opportunity for researchers from both labs to exchange ideas about their ongoing research activities and discover points of collaboration. Philipp Moritz started the first student talk session with an update on Ray — a distributed execution framework for emerging…
Declarative Heterogeneity Handling for Datacenter and ML Resources
Challenge
Heterogeneity in datacenter resources has become a fact of life. We identify and categorize a number of different types of heterogeneity. When talking about heterogeneity, we generally refer to static or dynamic attributes associated with individual resources. Previously, levels of heterogeneity were fairly benign, limited to a few different processor architectures. Now, however, it has become a common trend to deploy hardware accelerators (e.g., NVIDIA Tesla K40/K80, Google TPU, Intel Xeon Phi) and even FPGAs (e.g., the Microsoft Catapult project). Nodes themselves are connected by heterogeneous interconnects, oftentimes with more than one interconnect option available (e.g., a 40Gbps Ethernet backbone, InfiniBand, an FPGA torus topology). The workloads we consolidate on top of this diverse hardware differ vastly in their success metrics (completion…
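To make the declarative framing concrete, here is a minimal sketch of matching a task's declared constraints against per-node attributes (the data model and attribute names are hypothetical, not the post's actual implementation):

    # Nodes advertise attributes: static ones (architecture, accelerators,
    # interconnects) are fixed; dynamic ones could be refreshed at runtime.
    nodes = [
        {"name": "n1", "accelerator": "K80", "interconnect": "infiniband"},
        {"name": "n2", "accelerator": "TPU", "interconnect": "40gbe"},
        {"name": "n3", "accelerator": None,  "interconnect": "40gbe"},
    ]

    def eligible(node, constraints):
        # A node qualifies only if every declared constraint matches
        # the corresponding attribute it advertises.
        return all(node.get(key) == value for key, value in constraints.items())

    # A task declares what it needs rather than naming a specific node.
    task_constraints = {"accelerator": "K80", "interconnect": "infiniband"}
    print([n["name"] for n in nodes if eligible(n, task_constraints)])  # ['n1']

The point of the declarative style is that tasks never hard-code node identities; the scheduler is free to place them anywhere the advertised attributes satisfy the constraints.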