Alexey Tumanov

I am a postdoc at the University of California, Berkeley, working with Ion Stoica. I completed my PhD at Carnegie Mellon, advised by Greg Ganger and collaborating closely with Mor Harchol-Balter and Onur Mutlu. My work at CMU was partially funded by the NSERC CGS-D3 Fellowship as well as the Intel Science and Technology Centre for Cloud Computing and the Parallel Data Lab. Prior to Carnegie Mellon, I worked on agile stateful VM replication with para-virtualization at the University of Toronto. My interest in cloud computing brought me to UofT from industry, where I had worked on the development of cluster middleware responsible for distributed datacenter resource management. My most recent research focused on modeling, designing, and developing abstractions, primitives, algorithms, and systems artifacts for a general resource management framework for heterogeneous datacenters. The framework supports static and dynamic heterogeneity, hard and soft placement constraints, time-varying resource capacity guarantees, and combinatorial constraints, in the context of defining the next-generation datacenter operating system stack.
http://www.cs.berkeley.edu/~atumanov

Publications

Real-Time Machine Learning: The Missing Pieces

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets

Morpheus: Towards Automated SLOs for Enterprise Clusters

Blog Posts

Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat

Alexey Tumanov blog, Deep Learning, Reinforcement Learning, Systems

On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event was aimed at exploring research opportunities at the intersection of the BDD and RISE labs. The topical focus of the mini-retreat was emerging AI applications, such as Reinforcement Learning (RL), and computer systems to support such applications. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. The event offered a great opportunity for researchers from both labs to exchange ideas about their ongoing research activity and discover points of collaboration. Philipp Moritz started the first student talk session with an update on Ray — a distributed execution framework for emerging …

Declarative Heterogeneity Handling for Datacenter and ML Resources

Alexey Tumanov blog, Systems

Challenge

Heterogeneity in datacenter resources has become a fact of life. We identify and categorize a number of different types of heterogeneity. When talking about heterogeneity, we generally refer to static or dynamic attributes associated with individual resources. Previously, the levels of heterogeneity were fairly benign and limited to a few different types of processor architectures. Now, however, it has become common to deploy hardware accelerators (e.g., Tesla K40/K80, Google TPU, Intel Xeon Phi) and even FPGAs (e.g., Microsoft Catapult project). Nodes themselves are connected with heterogeneous interconnects, oftentimes with more than one interconnect option available (e.g., 40 Gbps Ethernet backbone, InfiniBand, FPGA torus topology). The workloads we consolidate on top of this diverse hardware differ vastly in their success metrics (completion …