by Stephen Offer and Ellick Chan
As the computational demands of machine learning algorithms grow, so does the need for powerful computer clusters. However, existing infrastructure for implementing parallel machine learning algorithms is still primitive. Good solutions do exist for specific use cases (e.g., parameter servers or hyperparameter search) and for parallel data processing (e.g., Hadoop or Spark), but to parallelize machine learning algorithms, practitioners often end up building their own customized systems, leading to duplicated effort. To help address this issue, the RISELab has created Ray, a high-performance distributed execution framework. Ray supports general-purpose parallel and distributed Python applications and enables large-scale machine learning and reinforcement learning applications. It achieves scalability and fault tolerance by abstracting the…
Blog Posts
Programming in Ray: Tips for first-time users
Ray is a general-purpose framework for programming a cluster. Ray enables developers to easily parallelize their Python applications or build new ones, and to run them at any scale, from a laptop to a large cluster. Ray provides a highly flexible yet minimalist and easy-to-use API. Table 1 shows the core of this API. In this blog post, we describe several tips that can help first-time Ray users avoid some common mistakes that can significantly hurt the performance of their programs.
Table 1 (columns: API, Description, Example)
ray.init(): Initialize Ray context.
@ray.remote: Function or class decorator specifying that the function will be executed as a task or the class as an actor in a different process. Example: @ray.remote @ray.remote def…
Cloud Programming Simplified: A Berkeley View on Serverless Computing
by David Patterson and Ion Stoica
The publication of “Above the Clouds: A Berkeley View of Cloud Computing” on February 10, 2009 cleared up the considerable confusion about the new notion of “Cloud Computing.” The paper defined what Cloud Computing was, where it came from, why some were excited by it, what its technical advantages were, and what obstacles and research opportunities stood between it and even greater popularity. More than 17,000 citations to this paper and an abridged version in CACM (with more than 1,000 in the past year) document that it continues to shape the discussions and the evolution of Cloud Computing. “Cloud Programming Simplified: A Berkeley View on Serverless Computing” with some of the same authors commemorates the…
ActiveClean featured in “the morning paper”
ActiveClean has been featured in today’s “the morning paper”. The ActiveClean project aims to develop tools and algorithms that address one of the key steps in model training pipelines: handling dirty or inconsistent data, including extracting structure, imputing missing values, and correcting erroneous data.
PyWren wins Best Vision Paper Award at SOCC’17
PyWren won the Best Vision Paper Award at SOCC’17. PyWren is a new parallel computation engine that dramatically lowers the barrier for scientists to use the public cloud for massively parallel workloads, by obviating the need for complex cluster management. This is joint work between RISELab and the Berkeley Center for Computational Imaging.
Shivaram Venkataraman has won the 2016-2017 “Demetri Angelakos Memorial” Achievement Award
Shivaram Venkataraman has received the 2016-2017 “Demetri Angelakos Memorial” Achievement Award, which recognizes students who “in addition to conducting research, unselfishly take the time to help colleagues beyond the normal cooperation existing between fellow students”. It is hard to imagine a more deserving recipient than Shivaram. During his PhD, Shivaram has been generous to a fault with his time and with sharing credit. He has helped his peers in every imaginable way; after six years, his colleagues have yet to hear him say “no” when asked for help. He has been a trusted sounding board for other graduate students (and even faculty) when it comes to feedback on their research, talks, and papers. Shivaram has been, without exaggeration, the nexus of knowledge…
RISELab at Spark Summit
This year, Spark Summit East was held in Boston from February 7 to 9. With over 1,500 attendees, this was the largest Spark Summit ever held outside the Bay Area. Apache Spark, developed in large part at AMPLab (the precursor of RISELab), is now the de facto standard for big data processing. As at previous Spark Summits, UC Berkeley had a very strong presence. Ion Stoica gave a keynote on RISELab, describing the lab’s research focus on addressing a long-standing grand challenge in computing: enabling machines to act autonomously and intelligently, to rapidly and repeatedly take appropriate actions based on information in the world around them. The presentation also discussed some early results from two recent projects, Drizzle and Opaque, which had their own presentations…