Blogs

We don’t need Data Engineers, we need better tools for Data Scientists

Devin Petersohn

In most companies, Data Engineers support the Data Scientists in various ways. Often this means translating or productionizing the notebooks and scripts that a Data Scientist has written. A large portion of the Data Engineer’s role could be replaced with better tooling for Data Scientists, freeing Data Engineers to do more impactful (and scalable) work.

Feature Stores: The Data Side of ML Pipelines

Sarah Wooders

We need a principled way of managing state in real-time ML pipelines. Written by Sarah Wooders, Peter Schafhalter, and Joey Gonzalez. The RISE of Feature Stores: As more models are deployed in real-world pipelines, the recurring lesson is that data and data featurization matter above all else. The last generation of big data systems scaled ML to real-world datasets, and…

AI and Memory Wall

Amir Gholami

(This blogpost has been written in collaboration with Zhewei Yao, Sehoon Kim, Michael W. Mahoney, and Kurt Keutzer. The data used for this study is available online.) Figure 1: The amount of compute, measured in Peta FLOPs, needed to train SOTA models, for different CV, NLP, and Speech models, along with the different scaling of Transformer…

Why every data scientist using pandas needs Modin — Bringing SQL to Dataframes

Jorge Torres

Written by Jorge Torres and Devin Petersohn. Recently, while I was speaking with a data scientist friend from the RISELab in Berkeley who primarily operates in the pandas API using Modin, she mentioned that she was trying to solve a problem for a client who required her to write…

Alvin Cheung awarded the 2020 Intel Outstanding Researcher Award

David Schonenberg

Congratulations Alvin! Originally posted on EECS News and Intel.com: EECS Assistant Profs. Alvin Cheung and Jonathan Ragan-Kelley are among 18 winners of Intel’s 2020 Outstanding Research Awards (ORA). These awards recognize exceptional contributions made through Intel university-sponsored research.  Cheung and Ragan-Kelley are developing ARION, a system for compiling programs onto heterogeneous platforms. The team will use verified lifting, which rewrites legacy code into a clean specification, stripping away optimizations that target legacy architectures. This spec, written in a DSL, can then be compiled to new platforms, sometimes with orders of magnitude of speedup in resulting code performance. Intel’s 2020 Outstanding Researcher Awards Recognize 18 Academic Researchers

Alessandro Chiesa awarded the 2021 Sloan Research Fellowship

David Schonenberg

RISELab faculty member Alessandro Chiesa–along with five other Berkeley assistant professors–was awarded a Sloan Research Fellowship, one of the most competitive and prestigious awards available to early career researchers. View the complete list of 2021 fellows here. Congratulations Alessandro! 

David Patterson wins Frontiers of Knowledge Award

David Schonenberg

Congratulations Dave! Reposted from EECS News: CS Prof. Emeritus David Patterson has won the 13th BBVA Foundation Frontiers of Knowledge Award in Information and Communication Technologies.  He shares the award with John Hennessy of Stanford University “for taking computer architecture, the discipline behind the central processor or ‘brain’ of every computer system, and launching it as a new scientific area.”  The citation says that Patterson and Hennessy “are synonymous with the inception and formalization of this field.  Before their work, the design of computers – and in particular the measurement of computer performance – was more of an art than a science, and practitioners lacked a set of repeatable principles to conceptualize and evaluate computer designs. Patterson and Hennessy provided, for…

RSVP for the RISELab Poster Session / BEARS Symposium – Thursday February 11, 2021

David Schonenberg

We are pleased to announce that RISELab will hold a virtual poster session as part of the BEARS 2021 Research Symposium on February 11, 2021 from 1:00-2:30pm PST. This virtual event will highlight the exciting projects that RISELab’s researchers have been working on in recent months. You will have the opportunity to view posters and interact virtually with the graduate and undergraduate poster presenters, RISELab faculty, and other event guests. Remember, you must RSVP separately for both events: Register for BEARS here. (Select UCB Student, Employee or Industrial Affiliate for complimentary registration.) RSVP for the RISELab poster session here. A RISELab Poster Session event link will be emailed to everyone on our RSVP list on the day of the event. Please RSVP by February 10. Questions? Email us…

Read + Watch: “New Directions in Cloud Programming” from CIDR 2021

David Schonenberg

RISELab researchers Alvin Cheung, Natacha Crooks, Joseph M. Hellerstein, and Matthew Milano presented “New Directions in Cloud Programming” at CIDR 2021. Read the paper here. Watch the talk on YouTube. ABSTRACT: Nearly twenty years after the launch of AWS, it remains difficult for most developers to harness the enormous potential of the cloud. In this paper we lay out an agenda for a new generation of cloud programming research aimed at bringing research ideas to programmers in an evolutionary fashion. Key to our approach is a separation of distributed programs into a PACT of four facets: Program semantics, Availability, Consistency and Targets of optimization. We propose to migrate developers gradually to PACT programming by lifting familiar code into our more declarative level of…

“Neural-Backed Decision Trees” Accepted to ICLR 2021

Boban Zarkovich

Want to improve accuracy, interpretability, and generalization for your production models? Check out “Neural-Backed Decision Trees” from Professor Joseph E. Gonzalez’s group — including Alvin Wan, Lisa Dunlap, Suzie Petryk, and others — just accepted to ICLR 2021 and just one pip install away. See the brief 3-minute introduction: https://youtu.be/fQ2eNFCSRiA or the updated technical talk with new results and 4 additional human studies: https://youtu.be/bC5n1Yov7D0

Best paper award at NeurIPS 2020

Boban Zarkovich

RISELab faculty member Prof. Michael Mahoney and his postdocs Michal Derezinski and Rajiv Khanna are recipients of the NeurIPS 2020 Best Paper Award for their paper “Improved Guarantees and a Multiple-Descent Curve for Column Subset Selection and the Nyström Method”. To quote the selection committee: “(…) this paper is expected to have substantial impact and give new insight into (…) kernel methods, feature selection, and the double-descent behavior of neural networks.” Congratulations to Michael, Michal and Rajiv!

Michael Jordan wins 2021 AMS Ulf Grenander Prize

Boban Zarkovich

Prof. Michael I. Jordan, one of RISELab’s affiliated faculty, has been awarded the 2021 American Mathematical Society (AMS) Ulf Grenander Prize in Stochastic Theory and Modeling. The prize, which was established in 2016, recognizes “exceptional theoretical and applied contributions in stochastic theory and modeling.” It is awarded for “seminal work, theoretical or applied, in the areas of probabilistic modeling, statistical inference, or related computational algorithms, especially for the analysis of complex or high-dimensional systems.” Jordan, who has a split appointment in Statistics, was cited for “foundational contributions to machine learning, especially unsupervised learning, probabilistic computation, and core theory for balancing statistical fidelity with computation.” The prize is awarded every three years, making Jordan the second recipient of the honor.

SEC 2020 Best Paper Award!

Boban Zarkovich

Congratulations to Samvit Jain and Prof. Joey Gonzalez of RISELab for winning the Best Paper award at The Fifth ACM/IEEE Symposium on Edge Computing for the collaborative paper “Spatula: Efficient Cross-Camera Video Analytics on Large Camera Networks”!

Natacha Crooks wins 2020 ACM SIGOPS Dennis M. Ritchie dissertation award

David Schonenberg

Originally posted on UC Berkeley EECS News. CS Assistant Prof. Natacha Crooks has won the 2020 ACM Special Interest Group on Operating Systems (SIGOPS) Dennis M. Ritchie dissertation award for her thesis titled “A Client-Centric Approach to Transactional Datastores.” The award, which recognizes creative research in software systems, was bestowed upon a dissertation which a colleague described as “a landmark, with deep and beautiful results in transactions and distributed consistency, and systems that exploit them.” The award committee commented that “Natacha Crooks’ thesis achieves something rare: a new conceptual framework for client-centric consistency and two efficient systems built on those insights. The document for this attractive package is…

RISE Camp 2020 materials are available now!

Boban Zarkovich

Video recordings of all the presentations can be viewed on our YouTube channel. Slides and tutorials can be found on the event website. Thanks to all the participants who made RISE Camp 2020 a success!  

How to ensure a data scientist is never productive

Devin Petersohn

Photo by Andrea Piacquadio (pexels.com). We need to start placing a higher value on data scientists’ time than we do on machine time. While data science tools are being optimized to perform well on microbenchmarks, they are becoming more and more difficult to use. Is the benchmark performance worth the human time cost it takes to get there?…

Ray Summit happening this Wednesday, September 30th through Thursday, October 1st

Boban Zarkovich

The inaugural Ray Summit is happening this Wednesday, September 30th through Thursday, October 1st. With thousands of members of the global Ray community from over 100 countries planning to attend, it’s an important milestone for the Ray project that started at U.C. Berkeley’s RISELab nearly four years ago. If you haven’t registered yet, please register here (it’s free!) to get access to the livestream and all session videos. You will hear exciting announcements about the latest features in Ray, new integrations with popular libraries, real-world use cases, and the latest research in AI, scalable systems, and security. Looking forward to seeing you all at the summit!

The State of the Serverless Art

Joe Hellerstein

The Hydro team in Berkeley’s RISELab is working to “put the state into the state-of-the-art” in serverless computing.

Secure Collaborative XGBoost on Encrypted Data

Rishabh Poddar

A library for multi-party training and inference of XGBoost models using secure enclaves. Photo by Markus Spiske on Unsplash (modified). We recently released Secure XGBoost, a library that enables collaborative XGBoost training and inference on encrypted data. Secure XGBoost is part of the umbrella MC² project, under which we are working on a variety of tools for privacy-preserving…

Context-Aware Fast Food Recommendation at Burger King with RayOnSpark

Jason Dai

Authors: Luyang Wang (lwang1@rbi.com), Kai Huang (kai.huang@intel.com), Jiao Wang (jiao.wang@intel.com), Shengsheng Huang (shengsheng.huang@intel.com), Jason Dai (jason.dai@intel.com). Deep learning based recommendation models have been widely used in real world recommendation systems. Common methods perform concatenation of user and item embedding vectors, then feed them into MLP (multilayer perceptron) to generate final predictions. However, these methods fail to…

IEEE Data Engineering Bulletin: COVID-19 Contact Tracing Update

David Schonenberg

Please check out Professor Gonzalez’s post in the June edition of the IEEE Data Engineering Bulletin. This special edition provides an overview of several of the major digital contact tracing efforts around the world as well as some of the societal implications.

RISELab Best Paper Awards

David Schonenberg

RISELab researchers were recognized with best paper awards at recent SIGMOD, VLDB and ICDE conferences. 2019 ACM SIGMOD Research Highlight Award: “Interactive Checks for Coordination Avoidance” (VLDB 2019) by Michael J. Whittaker and Joseph M. Hellerstein. VLDB 2019 selected best papers: “Interactive Checks for Coordination Avoidance” by Michael J. Whittaker and Joseph M. Hellerstein; “Autoscaling Tiered Cloud Storage in Anna” by Chenggang Wu, Vikram Sreekanti, and Joseph M. Hellerstein. ICDE 2018 selected best papers: “Anna: A KVS For Any Scale” by Chenggang Wu, Jose M. Faleiro, Yihan Lin, and Joseph M. Hellerstein. Congratulations to RISELab’s Jose Faleiro, Joe Hellerstein, Vikram Sreekanti, Michael Whittaker, and Chenggang Wu in recognition of their groundbreaking research!

Please join the first Ray Summit Connect

David Schonenberg

Hello, Anyscale will be hosting a complimentary new online speaker series, Ray Summit Connect, throughout the summer. Join these events to learn more about the future of Ray. The first hour-long event in the series kicks off May 13th. During this event, Prof. Michael Jordan and I will provide our perspectives on the state and future of AI and computing. Please check out anyscale.com/events to register and to see other events planned for the coming months. If you are interested, please register above. Also, please feel free to forward this email to your colleagues. We look forward to having you join this event! Best, Ion

Prof. Aditya Parameswaran awarded Sloan Research Fellowship

Boban Zarkovich

Prof. Aditya Parameswaran, RISELab faculty, is one of the nine Sloan Research Fellowship recipients from UC Berkeley, in recognition of his work on building tools to enable people unfamiliar with programming to understand large datasets. You can read related articles here and here. Full list of recipients on Sloan Foundation’s website.

RISELab Open House and Poster Session 2/13/20

David Schonenberg

This year’s BEARS Symposium will be held on Thursday 2/13/20 at the International House in Berkeley from 9am-5pm. https://eecs.berkeley.edu/research/bears/2020 Concurrently, RISELab will host an Open House and Poster session from 1pm-3pm in the lab. Please join us for some engaging discussions over light snacks and refreshments. RISELab is located in 465 Soda Hall, on the corner of Hearst and LeRoy Avenues in Berkeley. See you there!

Ray Meetup at Galvanize SF!

Boban Zarkovich

Join us for the Ray (http://ray.io/) meetup on January 30th at 6:00pm, hosted at Galvanize SF! Please RSVP here: https://www.meetup.com/Bay-Area-Ray-Meetup/events/267883815/

Two RISE Startups make list of “47 Enterprise Startups to Bet Your Career On in 2020”

David Schonenberg

Databricks and Anyscale made Business Insider’s list of “47 enterprise startups to bet your career on in 2020.” View the list on reddit if you don’t have a Business Insider account. Anyscale is the latest startup to come out of RISELab. A note from Anyscale co-founder and RISELab director Ion Stoica: Anyscale is the future of distributed computing. Founded by the creators of Ray, an open source project from the UC Berkeley RISELab, Anyscale enables developers of all skill levels to easily build applications that run at any scale, from a laptop to a data center. Anyscale empowers organizations to bring AI applications to production faster, reduce development costs, and eliminate the need for in-house expertise to deploy and manage…

Professor Michael Jordan wins 2020 IEEE John von Neumann Medal

David Schonenberg

CS Prof. Michael I. Jordan has won the prestigious John von Neumann Medal from the Institute of Electrical and Electronics Engineers (IEEE). The award was established in 1990 to acknowledge “outstanding achievements in computer-related science and technology.” Jordan, who was ranked as the world’s most influential computer scientist in 2016 by Science magazine, was cited “For contributions to machine learning and data science.” Jordan began developing recurrent neural networks as a cognitive model in the 1980s, was prominent in the formalisation of variational methods for approximate inference, and popularised both the expectation-maximization algorithm and Bayesian networks among the machine learning community. Jordan is the fifth Berkeley CS faculty member to win this award. Congratulations to Professor Jordan! https://eecs.berkeley.edu/news#news-3106

Using Ray as a foundation for a real-time analytic monitoring framework

Edmon Begoli

Ray’s actor model, with its simplicity of implementation in Python-based frameworks, is a primary motivation for using it in a higher-level research framework (Realm, to be published) for real-time analytic monitoring of critical events (e.g. suicide risk, infectious diseases, etc.) that my colleagues and I are working on. The purpose of Realm is to provide: real-time alerts in response to, for example, continuously occurring, high-frequency clinical events; support for online learning; distribution of model updates through the hierarchy of models (local/embedded, regional, and central); and targeted, selective, and context-specific updates to these models. While other frameworks provide most of these functions (Akka, Spark, Kafka, etc.), we are exploring and using Ray because of its simplicity of implementation (using actors…

RISE Camp 2019 videos!

Boban Zarkovich

Videos of all talks from RISE Camp 2019 are now available for viewing on the RISELab YouTube channel.

RISE Camp 2019 photos!

Boban Zarkovich

Photos from the awesome RISE Camp 2019 have been posted to our Facebook page. (If you haven’t been aware of its existence, now is your chance to click “Like” and start getting all the latest RISELab news in your feed!)

RISE Camp 2019 – UPDATED Live stream links

Boban Zarkovich

Quick update: There are two separate links for live streaming (one for each day). Please use the following: YouTube RISE Camp 2019 Live Stream – Day 1; YouTube RISE Camp 2019 Live Stream – Day 2. If you have any questions or need assistance registering, please feel free to contact us at risecamp@cs.berkeley.edu. Best regards, The RISE Camp Team

RISE Camp 2019 – Live stream on YouTube

David Schonenberg

RISE Camp will be live streamed on YouTube later this week! Introductions begin at 8:45 AM on Thursday October 17, 2019. Use the following link to view. Find the agenda for both days here on the RISE Camp website. If you are interested in attending in-person, General Registration spots are available for a $600 registration fee. Please use the following link to register: https://www.eiseverywhere.com/ereg/index.php?eventid=478734&categoryid=3352675 Members of the UC Berkeley research community are encouraged to email risecamp@cs.berkeley.edu to inquire about complimentary spots.   If you have any questions or need assistance registering please feel free to contact us at risecamp@cs.berkeley.edu Best regards, The RISE Camp Team

RISE Camp 2019 – Registration now open

David Schonenberg

Registration for UC Berkeley RISELab’s annual RISE Camp is now open. The event will take place on October 17-18 at the International House in Berkeley. The agenda and latest updates can be found on the RISE Camp website here: https://risecamp.berkeley.edu A limited number of General Registration spots are available for a $600 registration fee. Please use the following link to register: https://www.eiseverywhere.com/ereg/index.php?eventid=478734&categoryid=3352675 If you have any questions or need assistance registering please feel free to contact us at risecamp@cs.berkeley.edu Best regards, The RISE Camp Team

The O’Reilly Data Show Podcast: Michael Mahoney on developing a practical theory for deep learning

David Schonenberg

Prof. Michael Mahoney, one of RISELab’s faculty members, was interviewed on the O’Reilly Data Show Podcast. Topics included understanding deep neural networks and developing a practical theory for deep learning, the new Hessian AWare Quantization (HAWQ) framework for addressing problems pertaining to model size and inference speed/power, and how these relate to challenges at the foundations of data analysis.

Professor Popa: Decentralized Security CS 294-163

David Schonenberg

Lectures: Tue/Thur 3:30pm – 4:59pm, 310 Soda. Course description: Recently, there has been much excitement in both academia and industry around the notion of decentralized security, which refers to, loosely speaking, security mechanisms that do not rely on the trustworthiness of any central entity. In only a few years, this area has generated many beautiful cryptographic constructs as well as exciting systems with real-world adoption. The course will cover topics such as decentralized ledgers, blockchain/cryptocurrencies, decentralized access control, secure multi-party computation, federated learning, coopetitive learning, and others. This is an advanced course, which will go deeply into both cryptography and systems. A solid foundation in cryptography is required, and a similar foundation in systems is beneficial. Logistics: The course is…

Prof. Raluca Popa recipient of Bakar Faculty Fellowship Spark Fund Award

Boban Zarkovich

Prof. Popa, one of RISELab’s core faculty members, has been selected for the Bakar Fellows Program, which supports faculty working to apply scientific discoveries to real-world issues in the fields of engineering, computer science, chemistry, and biological and physical sciences. With her Bakar Fellows Spark Award, Prof. Popa will design and build a data encryption platform that will enable collaborative machine learning studies by performing these multi-party computations under encryption. https://vcresearch.berkeley.edu/bakarfellows/about

Professor Parameswaran receives VLDB Early Career Award

David Schonenberg

Congratulations to RISELab’s newest faculty member Aditya Parameswaran  for winning this year’s VLDB Early Career Award. The award recognizes a researcher who has demonstrated research impact through a specific technical contribution of high significance since completing their Ph.D.   More information can be found here: https://vldb.org/2019/?2019-vldb-endowment-awards Congrats Aditya!    

Prof. Popa Receives the Microsoft Research Faculty Fellowship

David Schonenberg

Congratulations to Prof. Raluca Ada Popa for receiving the prestigious Microsoft Research Faculty Fellowship! The fellowship recognizes innovative, promising new faculty members, whose exceptional talent for research and innovation identifies them as emerging leaders in their fields.  

RadLab students and faculty win “Test of Time” award for the “Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning” paper

Boban Zarkovich

Students and faculty of RadLab, one of the predecessors of the RISELab, just won a “Test of Time” award for the most influential paper of the 40 presented at the original conference in 2009. The paper combined the systems and machine learning fields, setting a precedent for a combination that is popular today. You can read the official announcement here.

“A Swiss Army Infinitesimal Jackknife” paper wins “Notable Paper Award” at the 2019 Artificial Intelligence and Statistics (AISTATS) conference

Boban Zarkovich

The following paper from the RISELab has won the “Notable Paper Award” at the 2019 Artificial Intelligence and Statistics (AISTATS) conference: A Swiss Army Infinitesimal Jackknife; Giordano, R., Stephenson, W., Liu, R., Jordan, M. I. & Broderick, T.  (2019); In K. Chaudhuri and M. Sugiyama (Eds.), Proceedings of the Twenty-Second Conference on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan.

RISELab March Newsletter

David Schonenberg

For a summary of recent news and publications, check out the RISELab Newsletter here.  

Ray Distributed AI Framework Curriculum Offered on the Intel® AI DevCloud

Ion Stoica

By Stephen Offer and Ellick Chan. As a consequence of the growing computational demands of machine learning algorithms, the need for powerful computer clusters is increasing. However, existing infrastructure for implementing parallel machine learning algorithms is still primitive. While good solutions for specific use cases (e.g., parameter servers or hyperparameter search) and parallel data processing do exist (e.g., Hadoop or Spark), to parallelize machine learning algorithms, practitioners often end up building their own customized systems, leading to duplicated efforts. To help address this issue, the RISELab has created Ray, a high-performance distributed execution framework. Ray supports general purpose parallel and distributed Python applications and enables large-scale machine learning and reinforcement learning applications. It achieves scalability and fault tolerance by abstracting the…

Programming in Ray: Tips for first-time users

Ion Stoica

Ray is a general-purpose framework for programming a cluster. Ray enables developers to easily parallelize their Python applications or build new ones, and run them at any scale, from a laptop to a large cluster. Ray provides a highly flexible, yet minimalist and easy to use API. Table 1 shows the core of this API. In this blog, we describe several tips that can help first-time Ray users to avoid some common mistakes that can significantly hurt the performance of their programs. Table 1 (columns: API, Description, Example). ray.init(): Initialize Ray context. @ray.remote: Function or class decorator specifying that the function will be executed as a task or the class as an actor in a different process. Example: @ray.remote        @ray.remote           def…

New DARE Website Launched

David Schonenberg

We are excited to announce the launch of RISELab’s DARE program. One of the most challenging barriers to participating in research for undergraduate students is reaching out to professors and finding a match that works for both the student and the professor.  DARE, founded by Professor Raluca Ada Popa and her students Wenting Zheng and Jessica Ji, acts as an active liaison to remove this barrier and make electrical engineering and computer science research more readily accessible, as well as to encourage diversity of thought in these growing fields. DARE strongly encourages women and traditionally under-represented groups to apply. Website: https://dare.berkeley.edu Raluca Ada Popa

Modern Parallel and Distributed Python: A Quick Tutorial on Ray

Robert Nishihara

Ray is an open source project for parallel and distributed Python. This article was originally posted here. Parallel and distributed computing are a staple of modern applications. We need to leverage multiple cores or multiple machines to speed up applications or to run them at a large scale. The infrastructure for crawling the web and responding to search queries is not a single-threaded program running on someone’s laptop but rather a collection of services that communicate and interact with one another. This post will describe how to use Ray to easily build applications that can scale from your laptop to a large cluster. Why Ray? Many tutorials explain how to use Python’s multiprocessing module. Unfortunately the multiprocessing module is severely limited in…
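For orientation, the pool-style interface that multiprocessing tutorials usually cover can be sketched as follows. This is only an illustration of the baseline, not code from the post: `square` and `parallel_squares` are hypothetical names, and the sketch uses the thread-backed `multiprocessing.pool.ThreadPool`, which shares the `Pool.map` interface, so that it runs without the `if __name__ == "__main__":` guard that process pools need.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical stand-in for a unit of work."""
    return x * x


def parallel_squares(values, workers=4):
    """Map `square` over `values` with a pool of workers, preserving order."""
    with ThreadPool(processes=workers) as pool:
        return pool.map(square, values)


squares = parallel_squares(range(5))
print(squares)
```

Swapping `ThreadPool` for `multiprocessing.Pool` gives process-based workers with the same `map` call, at the cost of requiring picklable functions and a guarded entry point.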

Cloud Programming Simplified: A Berkeley View on Serverless Computing

Ion Stoica

By David Patterson and Ion Stoica. The publication of “Above the Clouds: A Berkeley View of Cloud Computing” on February 10, 2009 cleared up the considerable confusion about the new notion of “Cloud Computing.” The paper defined what Cloud Computing was, where it came from, why some were excited by it, what were its technical advantages, and what were the obstacles and research opportunities for it to become even more popular. More than 17,000 citations to this paper and an abridged version in CACM (with more than 1000 in the past year) document that it continues to shape the discussions and the evolution of Cloud Computing. “Cloud Programming Simplified: A Berkeley View on Serverless Computing” with some of the same authors commemorates the…

Scaling Interactive Pandas Workflows with Modin – Talk at PyData NYC 2018

Devin Petersohn

In this talk, we will present Modin, a middle layer for DataFrames and interactive data science. Modin, formerly Pandas on Ray, is a library that allows users to speed up their Pandas workflows by changing a single line of code. During the presentation, we will discuss interesting ways Modin is being used, and show how we improve the performance of the most popular Pandas operations. Modin is an early-stage project at UC Berkeley’s RISELab designed to facilitate the use of distributed computing for Data Science. Often, a challenge encountered when trying to use tools for large-scale data is that there is a significant learning overhead. Modin is designed to expose a set of familiar APIs (Pandas, SQL, etc.) and internally…
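The "single line of code" mentioned above is the import statement. The sketch below shows that swap on hypothetical toy data (the column names are illustrative only), and falls back to plain pandas when Modin is not installed, which works because the rest of the code is written against the same API.

```python
try:
    import modin.pandas as pd  # the single changed line
except ImportError:
    import pandas as pd  # fallback so the sketch still runs without Modin

# Hypothetical toy data; everything below is ordinary pandas-style code.
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "score": [1, 2, 3, 4]})
totals = df.groupby("team")["score"].sum()

print(totals)
```

A dataset this small will not show any speedup; the point is only that the workflow code stays unchanged.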

Ray: Application-level scheduling with custom resources

Alexey Tumanov

Ray intends to be a universal framework for a wide range of machine learning applications. This includes distributed training, machine learning inference, data processing, latency-sensitive applications, and throughput-oriented applications. Each of these applications has different, and, at times, conflicting requirements for resource management. Ray intends to cater to all of them, as the newly emerging microkernel for distributed machine learning. In order to achieve that kind of generality, Ray enables explicit developer control with respect to the task and actor placement by using custom resources. In this blog post we are going to talk about use cases and provide examples. This article is intended for readers already familiar with…

What Is the Role of Machine Learning in Databases?

Zongheng Yang

(This article was authored by Sanjay Krishnan, Zongheng Yang, Joe Hellerstein, and Ion Stoica.) What is the role of machine learning in the design and implementation of a modern database system? This question has sparked considerable recent introspection in the data management community, and the epicenter of this debate is the core database problem of query optimization, where the database system finds the best physical execution path for an SQL query. The au courant research direction, inspired by trends in Computer Vision, Natural Language Processing, and Robotics, is to apply deep learning; let the database learn the value of each execution strategy by executing different query plans repeatedly (an homage to Google’s robot “arm farm”) rather than through a pre-programmed analytical…

A History of Postgres

Joe Hellerstein

(crossposted from databeta.wordpress.com) The ACM began commissioning a series of reminiscence books on Turing Award winners. Thanks to hard work by editor Michael Brodie, the first one is Mike Stonebraker’s book, which just came out. I was asked to write the chapter on Postgres. I was one of the large and distinguished crew of grad students on the Postgres project, so this was fun. ACM in its wisdom decided that these books would be published in a relatively traditional fashion—i.e. you have to pay for them. The publisher, Morgan-Claypool, has this tip for students and ACM members: Please note that the Bitly link goes to a landing page where Students, ACM Members, and Institutions who have access to the ACM…

An Overview of the CALM Theorem

Joe Hellerstein

For folks who care about what’s possible in distributed computing: Peter Alvaro and I wrote an introduction to the CALM Theorem and subsequent work that is now up on arXiv. The CALM Theorem formally characterizes the class of programs that can achieve distributed consistency without the use of coordination. — Joe Hellerstein (Cross-posted from databeta.wordpress.com.) I spent a good fraction of my academic life in the last decade working on a deeper understanding of how to program the cloud and other large-scale distributed systems. I was enormously lucky to collaborate with and learn from amazing friends over this period in the BOOM project, and see our work picked up and extended by new friends and colleagues. Our research was motivated by…

Confluo: Millisecond-level Queries on Large-scale Live Data

Anurag Khandelwal

Confluo is a system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high-throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad-hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams, and an end-to-end optimized system design. We are excited to release Confluo as an open-source C++ project, comprising: (1) Confluo’s data structure library, which supports high-throughput ingestion of logs along with a wide range of online (live aggregates, conditional trigger executions, etc.) and offline (ad-hoc filters, aggregates, etc.) queries; and (2) a Confluo server implementation, which encapsulates the data structures and exposes their operations via an RPC interface, along with client libraries in C++, Java and Python.…

An Open Source Tool for Scaling Multi-Agent Reinforcement Learning

Eric Liang

We just rolled out general support for multi-agent reinforcement learning in Ray RLlib 0.6.0. This blog post is a brief tutorial on multi-agent RL and how we designed for it in RLlib. Our goal is to enable multi-agent RL across a range of use cases, from leveraging existing single-agent algorithms to training with custom algorithms at large scale.

RISE Camp 2018 Tutorials now available

David Schonenberg

Tutorials from RISE Camp, held at International House in Berkeley, CA on October 11-12, 2018, are now available on the RISE Camp website here. Additional information, including the agenda, can also be found there.

ACM SenSys ’18 Best Paper Runner-up Award: System Architecture Directions for Post-SoC/32-bit Networked Sensors

Hyung-Sin Kim

The RISELab publication “System Architecture Directions for Post-SoC/32-bit Networked Sensors,” authored by Hyung-Sin Kim, Michael Andersen, Kaifei Chen, Sam Kumar, William Zhao, Kevin Ma, and Prof. David Culler, has won the best paper runner-up award at ACM SenSys 2018. The paper calls for a paradigm shift in low-power embedded networked system design, whose prevailing approach was shaped by a two-decade-old paper from UC Berkeley.

Prof. Culler receives ACM SenSys Test of Time Award 2018!

Hyung-Sin Kim

Prof. David Culler, one of our affiliated faculty, has received the ACM SenSys Test of Time Award 2018 for his paper “IP is Dead, Long Live IP for Wireless Sensor Networks,” published in ACM SenSys 2008. The Test of Time Award recognizes papers that are at least 10 years old and have had a long-lasting impact on networked embedded sensing system science and engineering. Congratulations to Prof. Culler!

RISE Camp 2018 Videos!

Boban Zarkovich

RISE Camp 2018 videos have been posted to our YouTube channel! Please enjoy and share!

RISE Camp 2018 Photos!

Boban Zarkovich

Lots of great photos from last week’s RISE Camp can be found here! And if you haven’t already, please make sure you “like” our Facebook page for all the latest RISELab news.

RISELab September Newsletter

David Schonenberg

A summary of our recent news, publications, awards and a note from our Director Ion Stoica is available online at this link.

Prof. Popa receives an award from The Hellman Fellows Fund!

Boban Zarkovich

Prof. Raluca Popa, one of our core faculty, has received an award from The Hellman Fellows Fund, which supports junior faculty research on the ten campuses of the UC system and at four private institutions. Congratulations to Prof. Popa!

SQL Query Optimization Meets Deep Reinforcement Learning

Zongheng Yang

We show that deep reinforcement learning is successful at optimizing SQL joins, a problem studied for decades in the database community. Further, on large joins, we show that this technique executes up to 10x faster than classical dynamic programs and 10,000x faster than exhaustive enumeration. This blog post introduces the problem and summarizes our key technique; details can be found in our latest preprint, Learning to Optimize Join Queries With Deep Reinforcement Learning. SQL query optimization has been studied in the database community for almost 40 years, dating all the way back to System R’s classical dynamic programming approach. Central to query optimization is the problem of join ordering. Despite the problem’s rich history, there is still a continuous stream…
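
To make the join-ordering problem concrete, here is a minimal sketch of the classical dynamic-programming baseline the excerpt mentions, restricted to left-deep plans. It is not the learned optimizer from the post. The function name `best_left_deep_plan`, the table statistics, the selectivities, and the cost model (sum of estimated intermediate result sizes) are all assumptions chosen for illustration.

```python
from itertools import combinations


def best_left_deep_plan(sizes, selectivity):
    """Dynamic program over subsets of tables (left-deep plans only).

    sizes:       dict table -> row count (hypothetical statistics)
    selectivity: dict frozenset({a, b}) -> join selectivity; pairs that are
                 missing are treated as cross products (selectivity 1.0)
    Cost model (an assumption): sum of estimated intermediate result sizes.
    Returns (cost, join_order).
    """
    tables = sorted(sizes)

    def cardinality(subset):
        # Estimated rows after joining every table in `subset`.
        rows = 1.0
        for t in subset:
            rows *= sizes[t]
        for a, b in combinations(sorted(subset), 2):
            rows *= selectivity.get(frozenset((a, b)), 1.0)
        return rows

    # best[subset] = (cost, order) of the cheapest plan that joins `subset`.
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            subset = frozenset(combo)
            out_rows = cardinality(subset)
            candidates = []
            for last in combo:
                # Extend the best plan for the remaining tables by `last`.
                prev_cost, prev_order = best[subset - {last}]
                candidates.append((prev_cost + out_rows, prev_order + (last,)))
            best[subset] = min(candidates)
    return best[frozenset(tables)]


# Made-up statistics: A is large, and only A-B and B-C have join predicates.
sizes = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, order = best_left_deep_plan(sizes, selectivity)
print(order, cost)
```

With these made-up numbers the program joins B and C first and brings in the large table A last. The table `best` has one entry per subset of tables, so it grows exponentially with the number of tables.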

Going Fast and Cheap: How We Made Anna Autoscale

Vikram Sreekanti

Background: In an earlier blog post, we described a system called Anna, which used a shared-nothing, thread-per-core architecture to achieve lightning-fast speeds by avoiding all coordination mechanisms. Anna also used lattice composition to enable a rich variety of coordination-free consistency levels. The first version of Anna blew existing in-memory KVSes out of the water: Anna is up to 700x faster than Masstree, an earlier state-of-the-art research KVS, and up to 800x faster than Intel’s “lock-free” TBB hash table. You can find the previous blog post here and the full paper here. We refer to that version of Anna as “Anna v0.” In this post, we describe how we extended the fastest KVS in the cloud to be extremely cost-efficient and…
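
As a rough illustration of why lattices allow coordination-free merging, the sketch below composes a set-union lattice inside a map lattice and checks that replicas converge regardless of the order in which updates arrive. This is an illustrative Python sketch, not Anna's implementation; the names `merge_max`, `merge_set`, `merge_map`, and `apply_all` are hypothetical.

```python
from itertools import permutations


def merge_max(a, b):
    # Max lattice over numbers: commutative, associative, idempotent.
    return max(a, b)


def merge_set(a, b):
    # Set-union lattice: commutative, associative, idempotent.
    return a | b


def merge_map(a, b, merge_value):
    """Compose lattices: a map whose values are themselves lattice elements."""
    out = dict(a)
    for key, value in b.items():
        out[key] = merge_value(out[key], value) if key in out else value
    return out


# Three updates that may reach a replica in any order.
updates = [
    {"x": frozenset({1})},
    {"x": frozenset({2}), "y": frozenset({3})},
    {"y": frozenset({4})},
]


def apply_all(order):
    state = {}
    for update in order:
        state = merge_map(state, update, merge_set)
    return state


results = [apply_all(p) for p in permutations(updates)]
# Every delivery order yields the same final state.
assert all(r == results[0] for r in results)
print(results[0])
```

Because the merge function is commutative, associative, and idempotent, replicas need no locks or agreement protocol to reach the same state; they only need to see the same set of updates eventually.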

AP releases “bombshell” report on Google’s location history prompted by RISELab blog post

K. Shankari

Back in May 2018, K. Shankari posted a blog post on the blurry boundaries associated with Google location tracking and the interesting questions that they raised around consent, control and competition. An AP reporter, Ryan Nakashima, saw the blog post and contacted her for more details. While he was not able to reproduce the behavior that she had observed, he was able to work with Jonathan Mayer’s group at Princeton to find similarly unclear and confusing privacy policies related to location history. The resulting story and step-by-step guide were posted on Monday, Aug 13, and had a large impact. In the technical press, it was picked up by at least Wired, CNET, TechCrunch, Gizmodo and Slashdot. Ryan, Jonathan and Shankari…

Notes from the first Ray meetup

Boban Zarkovich

The Ray team is starting a series of meetups, the first of which was held at OpenAI (in San Francisco) on August 2, 2018, with over 50 people in attendance! Here is a great summary of what was presented, written by Ben Lorica of O’Reilly Media: https://www.oreilly.com/ideas/notes-from-the-first-ray-meetup

RISELab publication wins best paper award at SIGMOD GRADES-NDA 2018

Anand Padmanabha Iyer

RISELab publication “Bridging the GAP: Towards Approximate Graph Analytics”, authored by Anand Iyer, Aurojit Panda, Shivaram Venkataraman, Mosharaf Chowdhury, Prof. Aditya Akella, Prof. Scott Shenker and Prof. Ion Stoica, has won the best paper award at ACM GRADES-NDA 2018, co-located with SIGMOD. The paper proposes the use of approximation to speed up distributed graph processing.

Exploratory data analysis of genomic datasets using ADAM and Mango with Apache Spark on Amazon EMR (AWS Big Data Blog Repost)

Alyssa Morrow

Note: This blogpost is replicated from the AWS Big Data Blog and can be found here. As the cost of genomic sequencing has rapidly decreased, the amount of publicly available genomic data has soared over the past couple of years. New cohorts and studies have produced massive datasets consisting of over 100,000 individuals. Simultaneously, these datasets have been processed to extract genetic variation across populations, producing massive amounts of variation data for each cohort. In this era of big data, tools like Apache Spark have provided a user-friendly platform for batch processing of large datasets. However, to use such tools as a sufficient replacement for current bioinformatics pipelines, we need more accessible and comprehensive APIs for processing genomic data. We…

A Short History of Prediction-Serving Systems

Daniel Crankshaw

Machine learning is an enabling technology that transforms data into solutions by extracting patterns that generalize to new data. Much of machine learning can be reduced to learning a model: a function that maps an input (e.g. a photo) to a prediction (e.g. objects in the photo). Once trained, these models can be used to make predictions on new inputs (e.g., new photos) and as part of more complex decisions (e.g., whether to promote a photo). While there are thousands of papers published each year on how to design and train models, there is surprisingly little research on how to manage and deploy such models once they are trained. It is this latter, often overlooked, topic that we discuss…

The Right to not be Tracked II: in which I turn off the location permission for Google, but it tracks me anyway

K. Shankari

I recently published a post about the blurry boundaries between standard system services and Google Maps on Android. I argued that these boundaries made it hard to talk about consent and competition around location services. However, the branching factor for the data sharing made the argument complex and hard to follow. Even as I was writing that post, in the train on the way into Berkeley, I started getting notifications from the Google app about the weather at my location. The Google app (aka Google Now) is a virtual assistant that is intended to provide context-sensitive helpful information to users. It is closed source, pre-installed, and it cannot be uninstalled or disabled. And I had already turned off all its…

The Right to not be Tracked: a Spotlight on Google Maps and Android Location Tracking

K. Shankari

There has been a lot of interest in data collected about users by Facebook recently. Journalists have been shocked when they downloaded the data that Facebook has on them. Most of this concern has been focused around data collected through explicit user interaction such as web browsing, or clicking on “Like” and “Share” buttons. Background data collection, which occurs without any explicit user intervention, is arguably creepier, because it collects data whether or not you interact with the service. For example, Facebook has been criticized for logging texts and phone calls in the background. Facebook argues that users consented to sharing the data, although many users are still skeptical about how explicit the consent was. Similarly, Uber had to backtrack…

Michael I. Jordan: Artificial Intelligence — The Revolution Hasn’t Happened Yet

Boban Zarkovich

(This article was originally published on Medium.com.) Artificial Intelligence (AI) is the mantra of the current era. The phrase is intoned by technologists, academicians, journalists and venture capitalists alike. As with many phrases that cross over from technical academic fields into general circulation, there is significant misunderstanding accompanying the use of the phrase. But this is not the classical case of the public not understanding the scientists; here the scientists are often as befuddled as the public. The idea that our era is somehow seeing the emergence of an intelligence in silicon that rivals our own entertains all of us, enthralling us and frightening us in equal measure. And, unfortunately, it distracts us. There is a different narrative that one can…

MLPerf: SPEC for ML

David Patterson

The RISE Lab at UC Berkeley today joins Baidu, Google, Harvard University, and Stanford University to announce a new benchmark suite for machine learning called MLPerf at the O’Reilly AI conference in New York City (see https://mlperf.org/). The MLPerf effort aims to build a common set of benchmarks that enables the machine learning (ML) field to measure system performance eventually for both training and inference from mobile devices to cloud services. We believe that a widely accepted benchmark suite will benefit the entire community, including researchers, developers, builders of machine learning frameworks, cloud service providers, hardware manufacturers, application providers, and end users. Historical Inspiration. We are motivated in part by the System Performance Evaluation Cooperative (SPEC) benchmark for general-purpose computing that drove rapid,…

Open source platform + undergraduate energy = sustainability research

K. Shankari

This Earth Day, join a study on motivating sustainable transportation behavior. I have blogged about the e-mission project earlier in the context of the National Transportation Data Challenge. (https://rise.cs.berkeley.edu/blog/making-cities-safer-data-collection-vision-zero/). To recap, e-mission focuses on building an extensible platform that can instrument the end-to-end multi-modal travel experience at the personal scale and collate it for analysis at the societal scale. In particular, it combines background data collection of trips, classified by modes, with user-reported incident data, and context-sensitive surveys. I also blogged earlier about involving undergraduates in research (https://amplab.cs.berkeley.edu/getting-a-dozen-20-year-olds-to-work-together-for-fun-and-social-good/). To recap, the challenges at the time included managing different skill levels, compressing the learn-plan-build cycle into one semester, and the fact that undergraduates typically don’t have the experience to build platform…

NSDI ’18 Best Paper Award: NetChain: Scale-Free Sub-RTT Coordination

Boban Zarkovich

Prof. Ion Stoica and Xin Jin, a former postdoc in RISELab (now a faculty member at Johns Hopkins University), have received the best paper award for their paper on NetChain, which shows that it is possible to do coordination in distributed systems at line speed. You can read more about it at the official NSDI link.

Online Foundations of Data Science Course Launches on edX!

Boban Zarkovich

UC Berkeley’s pathbreaking entry-level course on the Foundations of Data Science (Data 8) is launching on edX on April 2. This makes the fastest-growing class in UC Berkeley history available to everyone. Foundations of Data Science teaches computational and inferential thinking from the ground up. It covers everything from testing hypotheses and applying statistical inference to visualizing distributions and drawing conclusions, all while coding in Python and using real-world data sets. The course is taught by award-winning Berkeley professors and designed by a team of faculty working together across Berkeley’s Computer Science and Statistics Departments, led by RISE faculty Michael Jordan. The three 5-week online courses cover: Foundations of Data Science: Computational Thinking with Python, starting on April 2, teaches the basics…

Prof. David Patterson, one of the founders of RISELab, has been awarded the Turing Award!

Jon Kuroda

We are proud to announce that Professor David Patterson, one of the RISELab founders, is a recipient of the 2017 ACM A.M. Turing Award, the annual prize given by the Association for Computing Machinery (ACM) to “an individual selected for contributions of a technical nature made to the computing community,” and often referred to as the “Nobel Prize of Computing”. He shares the award with John L. Hennessy, former President of Stanford University, for their invention of the RISC processor. For more details, please read the official ACM announcement. Congratulations to Prof. Patterson on joining the highest ranks of computer science luminaries!

Distributed Policy Optimizers for Scalable and Reproducible Deep RL

Eric Liang

In this blog post we introduce Ray RLlib, an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed policy optimizers that make it easy to use a variety of training strategies with existing reinforcement learning algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA), to be implemented once and reused many times across different RL algorithms and libraries. We discuss in more detail the design and performance of policy optimizers in the RLlib paper. What’s next for RLlib In the near term we plan to continue building out RLlib’s set of policy optimizers and algorithms. Our aim is for RLlib to serve…

Anna: A Crazy Fast, Super-Scalable, Flexibly Consistent KVS

Joe Hellerstein

This article is cross-posted from the DataBeta blog. There’s fast and there’s fast. This post is about Anna, a key/value database design from our team at Berkeley that’s got phenomenal speed and buttery smooth scaling, with an unprecedented range of consistency guarantees. Details are in our upcoming ICDE18 paper on Anna. Conventional wisdom (or at least Jeff Dean wisdom) says that you have to redesign your system every time you scale by 10x. As researchers, we asked the counter-cultural question: what would it take to build a key-value store that would excel across many orders of magnitude of scale, from a single multicore box to the global cloud? Turns out this kind of curiosity can lead to a system with pretty interesting practical…

Michael Jordan named as Plenary Lecturer at the 2018 International Congress of Mathematicians (ICM) in Rio de Janeiro!

RISE

EECS faculty continues to make big news in the outside world! Michael Jordan, the Pehong Chen Distinguished Professor in EECS and the Department of Statistics, has been named a Plenary Lecturer at the 2018 International Congress of Mathematicians (ICM) in Rio de Janeiro this August. This is an honor that has been bestowed upon a very small handful of computer scientists over the 100-year history of the ICM. (For the uninitiated, a plenary lecturer is one who gives his or her talk to all attendees of a conference at once. It’s a pretty huge deal.) Here is the list of plenary speakers at the ICM 2018 Conference. The title of Prof. Jordan’s speech will be “Dynamical, Symplectic and Stochastic Perspectives on Gradient-Based Optimization”…

NSF Expeditions Proposal Awarded to RISELab!

RISE

The UC Berkeley RISELab has been awarded the prestigious NSF Expedition Award by the National Science Foundation (NSF). The expedition award, which provides $10 million in funding over five years, is awarded to only three labs in the country. This funding will enable the RISELab to make fundamental advances in the theory and design of real-time, intelligent, secure, and explainable systems. This research has the potential to transform the next generation of systems that interact with the real world, including autonomous vehicles, medical robots, privacy-preserving information exchanges, fraud detection systems, and the power grid. Read the full NSF press release and Berkeley press release for more details.

Pictures from RISE Winter 2018 retreat!

Boban Zarkovich

Our biggest retreat so far (182 registered attendees!) happened January 10 – 12, 2018, at the Monterey Tides hotel. We had lots of productive interactions (presentations, meetings, poster sessions), but also managed to have some fun kayaking at the Elkhorn Slough! You can see the photos on our Facebook page.

Incubator launches AI-focused accelerator to help startups out of UC Berkeley, RISELab faculty involved

Boban Zarkovich

Berkeley-based startup incubator and venture capital fund The House is launching an AI-specific initiative backed by Google that is meant to take advantage of the technical and academic expertise in the field coming out of UC Berkeley. The faculty involved in the program include Databricks co-founder and UC Berkeley professor Ion Stoica, Berkeley Artificial Intelligence Research lab co-director Trevor Darrell and UC Berkeley professor Michael Jordan, one of the most influential computer scientists working in AI. You can read the whole article here.

Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark

Shivaram Venkataraman

This work was done in collaboration with Ding Ding and Sergey Ermolin from Intel. In recent years, the scale of datasets and models used in deep learning has increased dramatically. Although larger datasets and models can improve the accuracy in many AI applications, they often take much longer to train on a single machine. However, distributing training across large clusters is still uncommon with today’s popular deep learning frameworks, compared with what has long been available in the Big Data area, because access to a large GPU cluster is often hard to obtain and popular DL frameworks lack convenient facilities for distributed training. By leveraging the cluster distribution capabilities in Apache Spark, BigDL successfully performs very large-scale distributed…

November 2017 Newsletter Published!

RISE

The third installment of the RISELab newsletter went out on November 15. You can read all about what’s been going on here. If you would like to be added to our mailing list, simply fill out this form. Happy reading!

RISECamp Behind the Scenes

Jey Kottalam

  RISECamp was held at UC Berkeley on September 7th and 8th. This post looks behind the scenes at the technical infrastructure used to provide a cloud-hosted cluster for each attendee with ready-to-use Jupyter notebooks requiring only a web browser to access. Background and Requirements RISECamp is the latest in a series of workshops held by RISELab (and its predecessor, AMPLab) showcasing the latest research from the lab. The sessions consist of talks on the latest research systems produced by the lab followed by tutorials and exercises for attendees to get hands-on practical experience using our latest technologies. In the past, attendees used their own laptops to perform the hands-on exercises, with each user setting up a local development environment and manually…

RISELab to cooperate with Alibaba DAMO Academy

Boban Zarkovich

Alibaba Group launched a massive global research program, “Alibaba DAMO Academy”, by committing $15B in funding. The Academy aims to increase technological collaboration worldwide, advance the development of cutting-edge technology, and strive to make the world more inclusive by narrowing the technology gap. The Academy will cooperate with the University of California, Berkeley through its RISE Lab on areas such as secured real-time computing. For more information, please read this article.

Prof. Popa gave a keynote talk at the GE Industrial Machine Learning Workshop 2017

Boban Zarkovich

RISELab’s Raluca Ada Popa gave a keynote talk on security for machine learning systems at the GE Industrial Machine Learning Workshop 2017. https://wise.io/imlw17/ Her talk discussed security threats in deploying machine learning systems as well as three promising approaches to address some of these: secure multi-party computation, differential privacy and hardware enclaves.

RISELab and UC Berkeley’s new Foundations of Data Analysis (FODA) Institute

Michael Mahoney

We are pleased to announce the involvement of RISELab faculty with the new Foundations of Data Analysis (FODA) Institute. The FODA Institute is funded as part of the NSF TRIPODS program, and it is designed to bring together core research communities in theoretical statistics, applied mathematics, and theoretical computer science to address foundational questions at the heart of data science. In addition to RISELab faculty Michael Mahoney and Mike Jordan, also involved with this FODA/TRIPODS effort are statistics faculty Bin Yu and Fernando Perez and EECS/Simons faculty Dick Karp. We look forward to the fruitful interactions between theory, implementations, and real-time applications.   For more details, here is the link to the press release: https://data.berkeley.edu/news/berkeley-defining-next-academic-frontier-two-nsf-awards-uc-berkeley-set-stage-new-era-data

FireSim featured on the AWS Compute Blog

Sagar Karandikar

The FireSim project was recently featured on the AWS Compute Blog. Read the article here: Bringing Datacenter-Scale Hardware-Software Co-design to the Cloud with FireSim and Amazon EC2 F1 Instances.

RISELab faculty publishes paper on how to build more secure, faster AI systems

Boban Zarkovich

The RISELab faculty, who cover the fields of machine learning, systems, security, and hardware, have just published a vision paper on systems challenges for AI. They argue that future AI systems must make timely and safe decisions in unpredictable environments, be robust against sophisticated adversaries, and process ever-increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of Moore’s Law, which will constrain the amount of data these technologies can store and process. They propose several open research directions in systems, computer architecture, and security that can address these challenges and help unlock AI’s potential to improve lives and society. Link to the press release: https://news.berkeley.edu/2017/10/16/berkeley-experts-on-how-to-build-faster-safer-and-more-secure-ai-systems/

Fast Python Serialization with Ray and Apache Arrow

Robert Nishihara

This post was originally posted here. Robert Nishihara and Philipp Moritz are graduate students in the RISELab at UC Berkeley. This post elaborates on the integration between Ray and Apache Arrow. The main problem this addresses is data serialization. From Wikipedia, serialization is … the process of translating data structures or object state into a format that can be stored … or transmitted … and reconstructed later (possibly in a different computer environment). Why is any translation necessary? Well, when you create a Python object, it may have pointers to other Python objects, and these objects are all allocated in different regions of memory, and all of this has to make sense when unpacked by another process on another machine. Serialization and deserialization…
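
The point about pointers between Python objects can be seen in a small round trip. The sketch below uses the standard library's `pickle` module purely as a stand-in; it is not the Ray or Apache Arrow serialization path that the post describes, and the `record` structure is made up for illustration.

```python
import pickle

# Two dictionary entries point at the same list object in memory.
inner = [1, 2, 3]
record = {"name": "example", "values": inner, "alias": inner}

# Serialize: the object graph is flattened into a self-contained byte string.
payload = pickle.dumps(record)

# Deserialize: a new object graph is rebuilt, here in the same process,
# but it could equally be another process on another machine.
restored = pickle.loads(payload)

assert isinstance(payload, bytes)
assert restored == record          # same contents
assert restored is not record      # but a different object in memory
# The shared reference inside the structure survives the round trip.
assert restored["values"] is restored["alias"]
```

The raw memory addresses of `record` and `inner` would be meaningless to a receiving process, so the byte string has to encode the structure itself, including the fact that `values` and `alias` refer to one list.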

Ray: 0.2 Release

Robert Nishihara

This was originally posted on the Ray blog. We are pleased to announce the Ray 0.2 release. This release includes the following: substantial performance improvements to the Plasma object store; an initial Jupyter notebook based web UI; the start of a scalable reinforcement learning library; and fault tolerance for actors. Plasma: Since the last release, the Plasma object store has moved out of the Ray codebase and is now being developed as part of Apache Arrow (see the relevant documentation), so that it can be used as a standalone component by other projects to leverage high-performance shared memory. In addition, our Arrow-based serialization libraries have been moved into pyarrow (see the relevant documentation). In 0.2, we’ve increased the write throughput of the object store…

ActiveClean featured in “the morning paper”

Ion Stoica

ActiveClean has been featured in today’s “the morning paper”. The ActiveClean project aims to develop tools and algorithms to address one of the key steps in model training pipelines: handling dirty or inconsistent data, including extracting structure, imputing missing values, and handling incorrect data.

PyWren wins Best Vision Paper Award at SOCC’17

Ion Stoica

PyWren won the Best Vision Paper Award at SOCC’17. PyWren is a new parallel computation engine that dramatically lowers the barrier for scientists to use the public cloud for massively parallel workloads, by obviating the need for complex cluster management. This is joint work between RISELab and the Berkeley Center for Computational Imaging.

Ray 0.2 released!

Robert Nishihara

Ray 0.2 has been released: https://ray-project.github.io/2017/09/30/ray-0.2-release.html

Latest RISE Camp Updates

Boban Zarkovich

PLEASE NOTE: tutorials are not Live Streamed – they will be made available on the event page at a later date

RISE Camp 2017 is happening!

Boban Zarkovich

For the latest agenda updates and link to the Live Stream (starting at 9 AM), please go to the event website: https://risecamp.berkeley.edu/.

Second RISELab Newsletter!

Boban Zarkovich

In case you missed it (it went out on August 1, 2017), you can still read a web copy here. If you’d like to be added to our mailing list, please fill out this form.

Low-Latency Model Serving with Clipper

Daniel Crankshaw

The mission of the RISELab is to develop technologies that enable applications to make low-latency decisions on live data with strong security. One of the first steps towards achieving this goal is to study techniques to evaluate machine learning models and quickly render predictions. This missing piece of machine learning infrastructure, the prediction serving system, is critical to delivering real-time and intelligent applications and services. As we studied the prediction-serving problem, two key challenges emerged. The first challenge is supporting the stringent performance demands of interactive serving workloads. As machine learning models improve they are increasingly being applied in business critical settings and user-facing interactive applications. This requires models to render predictions that can meet the strict latency requirements of…

Opaque: Secure Apache Spark SQL

Wenting Zheng

As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to any attacker who can observe memory contents. This is a challenging problem because security usually implies a tradeoff between performance and functionality. Cryptographic approaches like fully homomorphic encryption provide full functionality to a system, but are extremely slow. Systems like CryptDB utilize lighter cryptographic primitives to provide a practical database, but are limited in functionality. Recent developments in trusted hardware enclaves (such as Intel SGX) provide a much needed alternative. These hardware enclaves provide hardware-enforced shielded execution that allows…

Announcing Ground v0.1

Vikram Sreekanti

We’re excited to be releasing v0.1 of the Ground project! Ground is a data context service. It is a central repository for all the information surrounding the use of data in an organization. Ground concerns itself with what data an organization has, where that data is, who (both human beings and software systems) is touching that data, and how that data is being modified and described. Above all, Ground aims to be an open-source, vendor neutral system that provides users an unopinionated metamodel and set of APIs that allow them to think about and interact with data context generated in their organization. Ground has many use cases, but we’re focused on two specific ones at present: Data Inventory: large organizations…

RISELab and the 5G Innovators Initiative (5GI2)

Randy Katz

5G, also known as Fifth Generation Mobile Networks, is an emerging global telecommunication system designed for the next generation of significantly higher wireless data bandwidths to support a variety of consumer, commercial, and industrial applications. It promises data rates of 10-100 Mbps for tens of thousands of simultaneous users in a metropolitan area, with 1 Gbps indoors and connectivity for hundreds of thousands of simultaneously connected sensors. As important as these enhanced bandwidths will be the software extensibility and configurability of the 5G network, making it possible to partition and customize network bandwidth and services for a variety of site- and area-specific applications to support diverse devices at the network edge. RISELab and our industrial sponsors Ericsson, Intel, and…

Reinforcement Learning brings together RISELab and Berkeley DeepDrive for a joint mini-retreat

Alexey Tumanov

On May 2, RISELab and the Berkeley DeepDrive (BDD) lab held a joint, largely student-driven mini-retreat. The event was aimed at exploring research opportunities at the intersection of the BDD and RISE labs. The topical focus of the mini-retreat was emerging AI applications, such as Reinforcement Learning (RL), and computer systems to support such applications. Trevor Darrell kicked off the event with an introduction to the Berkeley DeepDrive lab, followed by Ion Stoica’s overview of RISE. The event offered a great opportunity for researchers from both labs to exchange ideas about their ongoing research activity and discover points of collaboration. Philipp Moritz started the first student talk session with an update on Ray — a distributed execution framework for emerging…

Shivaram Venkataraman has won the 2016-2017 “Demetri Angelakos Memorial” Achievement Award

Ion Stoica

Shivaram Venkataraman has received the 2016-2017 “Demetri Angelakos Memorial” Achievement Award, which recognizes students who “in addition to conducting research, unselfishly take the time to help colleagues beyond the normal cooperation existing between fellow students”. It is hard to imagine a more deserving recipient than Shivaram. During his PhD, Shivaram has been generous to a fault with his time and with sharing credit. He has helped his peers in every imaginable way; after six years, his colleagues have yet to hear him say “no” when asked for help. He has been a trusted sounding board for other graduate students (and even faculty) when it comes to feedback on their research, talks, and papers. Shivaram has been, without exaggeration, the nexus of knowledge…

Wenting Zheng is Awarded the 2017-18 IBM PhD Fellowship

Joseph Gonzalez

Wenting Zheng was awarded the prestigious IBM PhD Fellowship for her work on security and distributed systems. Wenting is actively studying new methods for scalable secure analytics, multi-party computation for machine learning, and distributed zero-knowledge proofs. The IBM PhD Fellowship is an “intensely competitive worldwide program that honors exceptional Ph.D. students who have an interest in solving problems that are important to IBM and fundamental to innovation in many academic disciplines and areas of study.” Only 50 fellowships are awarded worldwide annually.

RISELab Announces 3 Open Source Releases

Joe Hellerstein

Part of the Berkeley tradition, and the RISELab mission, is to release open source software as part of our research agenda. Six months after launching the lab, we’re excited to announce initial v0.1 releases of three RISELab open-source systems: Clipper, Ground and Ray. Clipper is an open-source prediction-serving system. Clipper simplifies deploying models from a wide range of machine learning frameworks by exposing a common REST interface and automatically ensuring low-latency and high-throughput predictions. In the 0.1 release, we focused on reliable support for serving models trained in Spark and Scikit-Learn. In the next release we will be introducing support for TensorFlow and Caffe2 as well as online personalization and multi-armed bandits. We are providing active support for early users and will be following GitHub issues…

Announcing Ray 0.1

Robert Nishihara

https://ray-project.github.io/ray/2017/05/20/announcing-ray.html

Making cities safer: data collection for Vision Zero

K. Shankari

A critical part of enabling cities to implement their Vision Zero policies – the goal of the current National Transportation Data Challenge – is to be able to generate open, multi-modal travel experience data. While existing datasets use police and hospital reports to provide a comprehensive picture of fatalities and life-altering injuries, by their nature, they are sparse and resist use for prediction and prioritization. Further, changes to infrastructure to support Vision Zero policies frequently require balancing competing needs from different constituencies – protected bike lanes, dedicated signals and expanded sidewalks all raise concerns that automobile traffic will be severely impacted. A timeline of the El Monte/Marich intersection in Mountain View, from 2014 to 2017, provides an opportunity to…

Declarative Heterogeneity Handling for Datacenter and ML Resources

Alexey Tumanov

Challenge: Heterogeneity in datacenter resources has become a fact of life. We identify and categorize a number of different types of heterogeneity. When talking about heterogeneity, we generally refer to static or dynamic attributes associated with individual resources. Previously, the levels of heterogeneity were fairly benign and limited to a few different types of processor architectures. Now, however, it has become common to deploy hardware accelerators (e.g., Tesla K40/K80, Google TPU, Intel Xeon Phi) and even FPGAs (e.g., Microsoft Catapult project). Nodes themselves are connected with heterogeneous interconnects, oftentimes with more than one interconnect option available (e.g., 40 Gbps Ethernet backbone, InfiniBand, FPGA torus topology). The workloads we consolidate on top of this diverse hardware differ vastly in their success metrics (completion…

Join RISELab at NSDI’17

Boban Zarkovich

Join RISELab at NSDI’17 in Boston, March 27 through 29, where we will be presenting the Opaque and Clipper systems. Please see the links below to read the final versions of the papers, try pre-release versions of the software, and check out slides and videos from previous talks. Opaque system: The paper is already available here. Software is available here. Slides and video from a previous talk are here. Clipper system: The paper is available here. Software is available here. Slides and video from a previous talk are here.

RISELab at Spark Summit

Ion Stoica

This year, Spark Summit East was held in Boston from February 7 to 9. With over 1,500 attendees, this was the largest Spark Summit ever held outside the Bay Area. Apache Spark, developed in large part at AMPLab (the precursor of RISELab), is now the de facto standard for big data processing. As at previous Spark Summits, UC Berkeley had a very strong presence. Ion Stoica gave a keynote on RISELab, describing the lab’s research focus on addressing a long-standing grand challenge in computing: enabling machines to act autonomously and intelligently, to rapidly and repeatedly take appropriate actions based on information in the world around them. The presentation also discussed some early results from two recent projects, Drizzle and Opaque, which had their own presentations…

Serverless Scientific Computing

Eric Jonas

For many scientific and engineering users, cloud infrastructure remains challenging to use. While many of their use cases are embarrassingly parallel, the challenges involved in provisioning and using stateful cloud services keep them trapped on their laptops or large shared workstations. Before getting started, a new cloud user confronts a bewildering number of choices. First, what instance type do they need? How do they make the compute/memory tradeoff? How large do they want their cluster to be? Can they take advantage of dynamic market-based instances (spot instances) that can disappear at any time? What if they have 1000 small jobs, each of which takes a few minutes? What’s the most cost-effective way of allocating servers? What host operating…

Metadata Megafail: Messing up Your Data Strategy in 3 Easy Steps

Joe Hellerstein

A key aspect of the RISELab agenda is to aggressively harness data—lots of it, both historical and live. Of course bits in computers don’t provide value on their own. We need a broader context for data: where it came from, what it represents, and how it gets used. Traditionally, people called this metadata: the data about our data. Requirements for metadata have changed drastically in recent years in response to technology trends. There’s an emerging groundswell to address these new requirements and explore new opportunities. This includes our work on the broader notion of data context in the Ground system. How should data-driven organizations respond to these changing requirements?  In the tradition of Berkeley advice like how to build a bad research center and…

RISELab Kicks Off

Melissa Mecca

Berkeley’s computer science division has an ongoing tradition of 5-year collaborative research labs. In the fall of 2016 we closed out the most recent of the series: the AMPLab. We think it was a pretty big deal, and many agreed. One great thing about Berkeley is the endless supply of energy and ideas that flows through the place, always bringing changes, building on what came before. In that spirit, we’re fired up to announce the Berkeley RISELab, where we will focus intensely for five years on systems that provide Real-time Intelligence with Secure Execution. Context: RISELab represents the next chapter in the ongoing story of data-intensive systems at Berkeley; a proactive step to move beyond Big Data analytics into…