Events Archive - RISE Lab

Dissertation Talk: Machine Learning for Query Optimization by Zongheng Yang

David Schonenberg July 22, 2022

Title: Machine Learning for Query Optimization
Speaker: Zongheng Yang
Advisor: Ion Stoica
Date: Wednesday, July 27, 2022
Time: 11:00 am – 12:00 pm PT
This is a hybrid event.
Location (in person): 465H, Soda Hall (inside Sky Computing Lab)
Zoom: https://berkeley.zoom.us/j/98364681887?pwd=R3lvdWorNDY4S0h0ZVZDUXhPUklIQT09
Meeting ID: 983 6468 1887
Passcode: 969754
Abstract: In the past two decades, data has been growing at an ever-increasing rate, and systems that process data to answer queries have attracted significant attention. Crucial to the performance of data systems is the query optimizer, which translates declarative queries (e.g., SQL) into efficient execution plans. However, the optimization task is highly complex, leading to two key challenges. First, optimizers use a myriad of hand-designed heuristics to tame the complexity, but heuristics leave performance on the table. Second, optimizers…

Dissertation Talk with Melih Elibol: NumS: Scalable Array Programming for the Cloud; 10 AM, Tuesday, May 10

Ivan Ortega May 3, 2022

Title: NumS: Scalable Array Programming for the Cloud
Speaker: Melih Elibol
Advisors: Michael I. Jordan and Ion Stoica
Date: Tuesday, May 10th, 2022
Time: 10:00am – 11:00am PT
Location: 380 Soda Hall
Zoom Link: https://berkeley.zoom.us/j/7173513117?pwd=aXVjZmVwM2NTbmx4RFVxMUszUHJIZz09
Meeting ID: 717 351 3117
Passcode: 1337
Abstract: Scientists increasingly rely on Python tools to perform scalable distributed-memory array operations using rich, NumPy-like expressions. Existing solutions that scale arrays and machine learning models support only a limited subset of the scikit-learn library's functionality, and achieve sub-optimal performance on numerical operations because they rely on the dynamic scheduling provided by task-based distributed systems. This can lead to performance problems that are difficult to address without in-depth knowledge of the underlying distributed system. In this thesis, I break down what is required to seamlessly scale the NumPy and scikit-learn APIs…

Security Seminar: Caroline Trippel on “Scalable Assurance via Formal and Verifiable Security Contracts” Fri Apr 22 @ 11AM

Ivan Ortega April 19, 2022

Title: Scalable Assurance via Formal and Verifiable Security Contracts
Speaker: Caroline Trippel
Zoom link: https://berkeley.zoom.us/j/96448397397?pwd=eEVpbStBQTRWUWZHVDZUU2x0VitMZz09
In-person location: Soda 465H
Abstract: The security of modern software ultimately depends on the hardware on which it is run. Thus, it is essential that hardware designers (1) expose security-relevant implementation details to application developers via hardware-software contracts, and (2) ensure (ideally through formal approaches) that said contracts are indeed upheld by fabricated microarchitectures. Unfortunately, microarchitectural attacks—side-channel attacks which leak data processed by programs as a direct result of hardware optimizations—demonstrate a notable deficiency in how existing hardware-software contracts define software-visible state. My talk will cover our solution to this shortcoming: axiomatic hardware-software contracts for security (i.e., security contracts), called leakage containment models (LCMs), which support formal reasoning about the security guarantees of programs when they run on particular…

Dissertation Talk with Weikeng Chen: Building Cryptographic Systems from Distributed Trust; 11:00 AM, Friday, April 29

Ivan Ortega April 19, 2022

Title: Building Cryptographic Systems from Distributed Trust
Speaker: Weikeng Chen
Advisor: Raluca Ada Popa
Date: Friday, April 29, 2022
Time: 11am–12pm
Location (Zoom): https://us06web.zoom.us/j/84812206165
Meeting ID: 848 1220 6165
Abstract: Centralized systems are prone to data breaches, which can come from both hackers and malicious/compromised employees inside the company. The scale and prevalence of such data breaches raise concerns among users, companies, and governments. In this dissertation, we study how to build systems that distribute such centralized trust among many parties, such that the system remains secure as long as at least one of the parties is honest. We show how distributed trust can be used to secure systems for storage, learning, consensus, and authentication.
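The "secure as long as at least one party is honest" property in this abstract can be illustrated with n-of-n XOR secret sharing, a classic building block for distributing trust. This is a minimal sketch under that assumption, not the constructions from the dissertation; `split` and `reconstruct` are hypothetical helper names. Any n − 1 shares are uniformly random bytes, so one honest party that withholds its share keeps the secret hidden.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(secret: bytes, n: int) -> list[bytes]:
    """Split `secret` into n shares; all n are needed to recover it.

    The first n - 1 shares are fresh random bytes; the last share is the
    secret XORed with all of them, so it is also uniformly distributed.
    """
    if n < 1:
        raise ValueError("need at least one party")
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = reduce(xor_bytes, shares, secret)
    return shares + [last]


def reconstruct(shares: list[bytes]) -> bytes:
    """XOR every share together to recover the secret."""
    return reduce(xor_bytes, shares)


if __name__ == "__main__":
    shares = split(b"user-password", 3)  # e.g., one share per (hypothetical) server
    assert reconstruct(shares) == b"user-password"
    print(len(shares), "shares; any 2 of them reveal nothing about the secret")
```

The trade-off of this simple scheme is availability: losing any single share makes the secret unrecoverable, which is why practical systems often combine such sharing with other mechanisms.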

Dissertation Talk with Brijen Thananjeyan: Safe Reinforcement Learning Using Learned Safe Sets; 5:00 PM, Tuesday, April 26

Ivan Ortega April 19, 2022

Title: Safe Reinforcement Learning Using Learned Safe Sets
Speaker: Brijen Thananjeyan
Advisors: Ken Goldberg, Joseph E. Gonzalez
Date: Tuesday, April 26, 2022
Time: 5:00 – 6:00 pm PDT
This event is hybrid, held in person and virtually.
Location (in person): 8034 Berkeley Way West
Location (Zoom): https://berkeley.zoom.us/j/94000313493
Meeting ID: 940 0031 3493
Abstract: Reinforcement learning is an increasingly popular framework that enables robots to learn online to perform tasks from prior experience in environments where dynamics or shaped reward functions are challenging to model. However, because this requires robots to sample trajectories under significant dynamical uncertainty, the robot may perform unsafe maneuvers during online exploration, leading to irrecoverable states and possible damage to the robot and/or environment. Safe reinforcement learning is a field with a rich history that studies how…

Dissertation Talk with Alvin Wan: Efficient 2.5D Deep Learning for Capturing and Viewing Reality; 12:00 PM, Thursday, April 28

Ivan Ortega April 16, 2022

Title: Efficient 2.5D Deep Learning for Capturing and Viewing Reality
Speaker: Alvin Wan
Advisor: Joseph E. Gonzalez
Date: Thursday, April 28, 2022
Time: 12:00 – 1:00 PM PT
This event is hybrid, held in person and virtually.
Location (in person): 8019 Berkeley Way West
Location (Zoom): https://berkeley.zoom.us/j/96758340287?pwd=TUpHZmNUUTNkbVcxR1JzdENHU3NEUT09
Abstract: Applications of deep learning in computer vision are growing in large strides, evolving from fun face filters to entire products that fundamentally rely on 3D understanding: namely, products like self-driving cars, virtual reality, and augmented reality. In these new domains, there are two critical requirements for computer vision algorithms: (1) efficiency, the need for real-time, power-efficient, highly accurate models that run on-device; and (2) 3D, the processing and generation of the 3D world as viewed from a camera. These multiple concerns yield…

Dissertation Talk with Guanhua Wang: Disruptive Research on Distributed Machine Learning Systems; 11:00 AM, Friday, April 15

Ivan Ortega April 15, 2022

Title: Disruptive Research on Distributed Machine Learning Systems
Speaker: Guanhua Wang
Advisor: Ion Stoica
Date: Friday, April 15, 2022
Time: 11:00am – 12:00pm PT
Location (Zoom): https://berkeley.zoom.us/j/98835045978?pwd=ZXNrdW1oaDE5NXhjYTROS1ZvU3lvQT09
Abstract: Deep Neural Networks (DNNs) enable computers to excel across many different applications such as image classification, speech recognition and robotics control. To accelerate DNN training and serving, parallel computing is widely adopted. System efficiency becomes a major issue when scaling out. In this talk, I will make three arguments towards better system efficiency in distributed DNN training and serving. First, Ring All-Reduce for model synchronization is not optimal, but Blink is. By packing spanning trees rather than forming rings, Blink achieves higher flexibility in arbitrary networking environments and provides near-optimal network throughput.…
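For readers unfamiliar with the baseline this talk argues against, here is a minimal single-process simulation of textbook Ring All-Reduce: a reduce-scatter pass followed by an all-gather pass. It is an illustrative sketch, not Blink; it assumes each worker's vector has exactly n entries so that entry c stands in for chunk c, and `ring_allreduce` is a hypothetical name.

```python
def ring_allreduce(data: list[list[int]]) -> list[list[int]]:
    """Simulate a sum all-reduce over n workers arranged in a ring.

    data[i] is worker i's vector, assumed to have exactly n entries so that
    entry c plays the role of chunk c. Returns every worker's final vector.
    """
    n = len(data)
    if any(len(row) != n for row in data):
        raise ValueError("sketch assumes n workers each holding n chunks")
    buf = [list(row) for row in data]

    # Reduce-scatter: at step s, worker i sends chunk (i - s) % n to its right
    # neighbour, which adds it in. After n - 1 steps worker i holds the full
    # sum of chunk (i + 1) % n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n) for i in range(n)]
        values = [buf[i][c] for i, c in sends]  # snapshot before anyone updates
        for (i, c), v in zip(sends, values):
            buf[(i + 1) % n][c] += v

    # All-gather: at step s, worker i forwards its completed chunk
    # (i + 1 - s) % n, and the right neighbour overwrites its own copy.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n) for i in range(n)]
        values = [buf[i][c] for i, c in sends]
        for (i, c), v in zip(sends, values):
            buf[(i + 1) % n][c] = v
    return buf


if __name__ == "__main__":
    print(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
    # every worker ends with [111, 222, 333]
```

Each worker sends one chunk per step for 2(n − 1) steps, which keeps every link busy on a uniform ring. According to the abstract, Blink instead packs spanning trees to make better use of arbitrary network topologies.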

Sky Seminar: “The Tip of the Iceberg: How to make ML for Systems work” with Deniz Altınbüken

David Schonenberg March 24, 2022

Join us for our new Sky Seminar Speaker Series! Our first speaker for this series will be Deniz Altınbüken, who is a Senior Software Engineer at Google Research.
Speaker: Deniz Altınbüken
Location: Soda 430-438, Woz Lounge
Date: 29th March 2022
Time: 1-2pm PT
Title: The Tip of the Iceberg: How to make ML for Systems work
Abstract: Machine Learning has become a powerful tool to improve computer systems, and there is a significant amount of ongoing research in both academia and industry to tackle systems problems using ML. Most work focuses on learning patterns and replacing heuristics with these learned patterns to solve systems problems such as compiler optimization, query optimization, failure detection, indexing, and caching. However, solutions that truly improve systems need to maintain the efficiency, availability,…

Security Seminar: Alex Ozdemir on “CirC: Unifying Compilers for SNARKs, SMT, and More”, Friday Feb. 11th at 11 AM PT

Ivan Ortega February 8, 2022

Title: CirC: Unifying Compilers for SNARKs, SMT, and More
Speaker: Alex Ozdemir (Stanford)
Time: Fri Feb 11 at 11AM
Zoom link: https://berkeley.zoom.us/j/98449939001?pwd=R285Z011bXl6Z1ZtNThmYlBzbCtiZz09
In-person location: Soda 465H
Abstract: We present CirC (“SIR-see”): a compiler infrastructure that aims to support zero-knowledge proof systems, multi-party computations, fully homomorphic encryption, constraint solving, and optimization. We observe that these seemingly disparate cryptosystems and verification problems share a common model of computation. This model is characterized by being state-free, non-uniform, and non-deterministic; we call it the existentially quantified circuit (EQC). The common model admits a shared compiler infrastructure (CirC) for compiling different high-level languages to different circuit representations used by these systems. We show: (1) CirC makes it easy to build new compilers for these systems; e.g., we reproduce and improve on…

Dissertation Talk by Johann Schleier-Smith: Understanding and Exploring Serverless Cloud Computing; 10:30 AM, Thursday, December 16

Ivan Ortega December 16, 2021

Title: Understanding and Exploring Serverless Cloud Computing
Speaker: Johann Schleier-Smith
Advisor: Joseph M. Hellerstein
Date: Thursday, December 16, 2021
Time: 10:30 am – 11:30 am PT
Location (Zoom): https://berkeley.zoom.us/j/92620060821?pwd=YW5oOG0yQUFwM1pZN25sTzh5ci9vQT09
Meeting ID: 926 2006 0821
Passcode: 529703
Abstract: The past few years have seen a wave of enthusiasm for serverless computing, and we begin this talk by analyzing the marketplace trends and underlying technical factors that have shaped the movement. We find that serverless computing addresses programming challenges in the same class as those that high-level programming languages address, suggesting that serverless computing may be viewed as high-level programming for distributed systems. We next turn our attention to one of the key shortcomings of serverless: the integration between compute and state. We develop FaaSFS, a distributed file system…

Security Seminar: Semantic Techniques for Information-Flow Languages with Andrew Hirsch, Friday Nov. 5th, 12 PM PDT

Ivan Ortega October 28, 2021

Title: Semantic Techniques for Information-Flow Languages
Speaker: Andrew Hirsch
Time: Friday Nov 5 at 12PM
Zoom link: https://berkeley.zoom.us/j/92366857619?pwd=b1E5UFJRS3JKTnZSK3VMNG9WZW1aQT09
In-person location: Soda 465H
Abstract: Information-flow languages enforce information-security policies for any program written in them. The most basic security policy of such languages is noninterference, which states that secret inputs do not affect the observations of an adversary. However, current practices for developing and proving correct information-flow languages rely exclusively on hand-rolled proofs, making exploration of the design space slow and labor-intensive. Moreover, proofs are almost never given for implementations of information-flow languages. In this talk, I discuss how semantic techniques can alleviate some of this burden by providing general frameworks for noninterference proofs. In particular, I discuss how the…
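The noninterference definition in this abstract (secret inputs do not affect the adversary's observations) can be made concrete with a small testing harness: run a program on the same public input with many different secret inputs and check that the adversary-visible output never changes. This is a hypothetical sketch that only checks finitely many inputs, so it can find violations but cannot prove their absence; it is not the semantic framework from the talk, and the two-input, two-output program shape is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumed program shape: (public_input, secret_input) -> (public_output, secret_output).
# Only public_output is visible to the adversary.
Program = Callable[[int, int], Tuple[int, int]]


def find_interference(prog: Program, public_inputs: Iterable[int],
                      secret_inputs: Iterable[int]) -> Optional[int]:
    """Return a public input whose public output depends on the secret, or None."""
    secret_list = list(secret_inputs)
    for p in public_inputs:
        observations = {prog(p, s)[0] for s in secret_list}
        if len(observations) > 1:
            return p  # two secrets produced different public outputs
    return None


def secure(pub: int, sec: int) -> Tuple[int, int]:
    return pub + 1, sec * 2  # public output ignores the secret


def leaky(pub: int, sec: int) -> Tuple[int, int]:
    return pub + (sec % 2), sec  # parity of the secret leaks into public output


if __name__ == "__main__":
    print(find_interference(secure, range(5), range(5)))  # None
    print(find_interference(leaky, range(5), range(5)))   # 0
```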

RISELab Faculty Talk 10/14/21: The Tale of a Success with Ali Ghodsi

Ivan Ortega September 24, 2021

Title: The Tale of a Success with Ali Ghodsi
Speaker: Ali Ghodsi (CEO and Co-Founder of Databricks and EECS Adjunct Faculty)
Date: Thursday, October 14th
Time: 7:00pm – 8:00pm PDT
RSVP Link: https://www.eventbrite.com/e/the-tale-of-a-success-with-ali-ghodsi-tickets-170763707847
Description: ISAA at Berkeley will be hosting Ali Ghodsi as the inaugural speaker for “The Tale of a Success” entrepreneurship series. Ali will be sharing his academic and entrepreneurship journey and how he and his co-founders started Databricks, an AI company that started at UC Berkeley’s AMPLab. More information can be found on the Eventbrite page.
Bio: Ali Ghodsi is the CEO and co-founder of Databricks, responsible for the growth and international expansion of the company. He previously served as the VP of Engineering and Product Management before taking the role…

Dissertation Talk by Devin Petersohn: Dataframe Systems: Theory, Architecture, and Implementation; 3 PM, Monday, August 9

Kattt Atchley August 4, 2021

Title: Dataframe Systems: Theory, Architecture, and Implementation
Speaker: Devin Petersohn
Advisor: Anthony Joseph
Date: Monday, August 9, 2021
Time: 3:00pm – 4:00pm (Pacific Time)
Location (Zoom): https://berkeley.zoom.us/j/97869091687?pwd=SHd3UUl0amdzRjJNY0JTbWpvelQ0dz09
Meeting ID: 978 6909 1687
Abstract: Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the remarkable success of dataframe libraries in R and Python, dataframes face performance issues even on moderately large datasets. Moreover, there is significant ambiguity regarding dataframe semantics. In this thesis, we discuss the implications of signature dataframe features, including flexible schemas, ordering, row/column equivalence, and data/metadata fluidity, as well as the piecemeal, trial-and-error-based approach to interacting with dataframes. While most modern systems aim to scale dataframe workloads by changing properties of dataframes, or by adding new requirements for distributed-systems knowledge, we believe…

Dissertation Talk: Compartmentalizing Consensus (Michael Whittaker); Thursday, July 29, 2021 12:30 PDT

Kattt Atchley July 22, 2021

Title: Compartmentalizing Consensus
Speaker: Michael Whittaker
Advisor: Joe Hellerstein
Date: Thursday, July 29, 2021
Time: 12:30 – 1:30pm PT
Location (Zoom): https://berkeley.zoom.us/j/98850209299
Abstract: State machine replication is at the heart of almost every strongly consistent distributed system. Despite this, it is widely believed that state machine replication protocols like MultiPaxos are too slow, too complicated, and too hard to implement in practice. In this thesis, we take a step towards debunking these myths. We do so with a novel technique called compartmentalization that involves decoupling a protocol into its simplest components and then scaling each component independently. We compartmentalize MultiPaxos and increase its throughput by over 10x. We then apply compartmentalization to a family of complex state machine protocols called…

Security Seminar: Compositional Security for Reentrant Applications with Ethan Cecchetti

David Schonenberg June 7, 2021

Next Friday (June 11) at 11AM, we’ll have a security seminar talk with Ethan Cecchetti, who will be presenting his paper that received a best paper award at IEEE S&P ’21.
Title: Compositional Security for Reentrant Applications
Speaker: Ethan Cecchetti
Time: Friday June 11 at 11AM
Zoom link: https://berkeley.zoom.us/j/91018158186?pwd=SldhZnVZR2E3Y1RIZW5GYVZGZlcrUT09
Abstract: The disastrous vulnerabilities in smart contracts sharply remind us of our ignorance: we do not know how to write code that is secure in composition with malicious code. Information flow control has long been proposed as a way to achieve compositional security, offering strong guarantees even when combining software from different trust domains. Unfortunately, this appealing story breaks down in the presence of reentrancy attacks. In this talk I will present a highly general definition…

RISE Seminar 4/30/21: Towards Instance-Optimized Data Systems, a talk by Tim Kraska, MIT

Kattt Atchley April 26, 2021

Title: Towards Instance-Optimized Data Systems
Time and Date: Friday April 30th, 2021, 12-1 PM Pacific
Zoom Link: https://berkeley.zoom.us/j/99616879594?pwd=anB2NnRGblBQMjRPQ3dJV2hDK3N1Zz09
Abstract: Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, and sketches, among many other things. Arguably, the motivation behind these techniques is similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, what these techniques will allow us to build are “instance-optimized” systems: systems that self-adjust to a given workload and data distribution to provide unprecedented performance and avoid the need for…
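The abstract's idea of using a model of the data to derive a more efficient data structure is often illustrated with a learned index: fit a model from key to position in a sorted array, then search only within the model's worst-case error. The sketch below assumes a single least-squares linear model, and `LinearLearnedIndex` and `lookup` are hypothetical names; it is far simpler than the systems the talk surveys.

```python
import bisect
from typing import Optional


class LinearLearnedIndex:
    """Toy learned index: a linear model predicts a key's position in a
    sorted array, and a bounded binary search corrects the prediction."""

    def __init__(self, keys: list[int]) -> None:
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: int) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: int) -> Optional[int]:
        """Return the position of `key`, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)  # search only the window
        return i if i < n and self.keys[i] == key else None


if __name__ == "__main__":
    idx = LinearLearnedIndex(list(range(0, 1000, 7)))
    print(idx.lookup(700), idx.lookup(701))  # 100 None
```

The design trade-off is that lookup cost depends on `max_err`: a model that fits the key distribution well shrinks the search window, while a skewed distribution widens it.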

Dissertation Talk: Gabe Fierro: Self-Adapting Software for Cyberphysical Systems; Friday, April 30, 3 PM PT

Kattt Atchley April 22, 2021

Title: Self-Adapting Software for Cyberphysical Systems
Speaker: Gabe Fierro
Advisor: David Culler
Date: Friday, April 30, 2021
Time: 3:00 – 4:00pm PT
Location (Zoom): https://berkeley.zoom.us/j/96334395146?pwd=di91MHJ2K0lpZHdGK0dBZXIzWGJDQT09
Meeting ID: 963 3439 5146
Zoom Passcode: 900213
Abstract: The built environment has a metadata problem. The buildings, cities and human-made aspects of our environment produce an incredible amount of data. However, a significant barrier to the development, deployment and wide-scale adoption of data-driven sustainable practices is the effort required to “wrangle” the heterogeneous and unstructured data typical of the built environment into a form that can be used and understood. My research aims to make this critical data easier to collect, manage and analyze. In my talk, I will argue that these issues can be…

Dissertation Talk: Scalable Reinforcement Learning Systems and their Applications (Eric Liang), Wednesday, May 5, 12 PM PT

Kattt Atchley April 20, 2021

Title: Scalable Reinforcement Learning Systems and their Applications
Speaker: Eric Liang
Advisor: Ion Stoica
Date: Wednesday, May 5, 2021
Time: 12:00 – 1:00 pm Pacific Time
Location (Zoom): https://berkeley.zoom.us/j/91562514796
Abstract: The past few years have seen the growth of deep reinforcement learning (RL) as a new and powerful optimization technique. This thesis looks at deep RL from the systems perspective in two ways: how to design systems that scale the computationally demanding algorithms used by researchers and practitioners, and conversely, how to apply deep RL to expand the state of the art in systems. The first half of this talk overviews the design and evolution of RLlib, a scalable and widely adopted open source library for distributed reinforcement learning. RLlib offers…

Dissertation Talk: Interaction History for Building Human Data Interfaces by Yifan Wu; Thursday, April 22, 2 PM PT

Kattt Atchley April 16, 2021

Title: Interaction History for Building Human Data Interfaces
Speaker: Yifan Wu
Advisor: Joe Hellerstein
Date: Thursday, April 22, 2021
Time: 2:00 – 3:00pm PT
Location (Zoom): https://zoom.us/j/98282759866?pwd=U1RSaEJLbXNDRVg1NWcyUGxRQUt5Zz09
Meeting ID: 982 8275 9866
Zoom Passcode: 804622
Abstract: History provides context for the present. In the same way, past user interactions provide context for present explorations. This thesis investigates ways to reify user interaction history to address emerging challenges in the design and programming of human data interfaces. We leverage interaction history in three different but connected ways. First is to enhance the design of interactions with delays, such as when working with remote databases. We use interaction history as a visual anchor to facilitate concurrent interactions, which ameliorate the cognitive burdens caused by delays. Second is to facilitate…

Dissertation Talk: Expanding the Reach of Fuzz Testing by Caroline Lemieux; Tuesday, April 27, 12 PM PT

Kattt Atchley April 16, 2021

Title: Expanding the Reach of Fuzz Testing
Speaker: Caroline Lemieux
Advisor: Koushik Sen
Date: Tuesday, April 27, 2021
Time: 12:00 – 1:00pm PT
Location (Zoom): https://berkeley.zoom.us/j/99666841072?pwd=ZFIyWmpYMzQ2bm04bVVaSm9YdzJVdz09
Meeting ID: 996 6684 1072
Zoom Passcode: 665063
Abstract: Software bugs are pervasive in modern software. As software is integrated into increasingly many aspects of our lives, these bugs have increasingly severe consequences, both from a security (e.g., Cloudbleed, Heartbleed, Shellshock) and cost standpoint. Fuzz testing, or simply fuzzing, refers to a set of techniques that automatically find bug-triggering inputs by sending many random-looking inputs to the program under test. In this talk, I will discuss how, by identifying core under-generalized components of modern fuzzing algorithms, and building algorithms that generalize or tune these components, I have expanded the application domains…
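The abstract's definition of fuzzing (send many random-looking inputs and watch for bug-triggering ones) can be sketched as a minimal mutation-based fuzzer. This is an assumption-laden toy: it uses blind byte mutations of a single seed rather than the coverage feedback modern fuzzers rely on, treats any raised exception as a crash, and `mutate`, `fuzz`, and `toy_target` are hypothetical names, not part of any tool from the talk.

```python
import random
from typing import Callable, Optional


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level edit: overwrite, insert, or delete a byte."""
    buf = bytearray(data)
    op = rng.randrange(3)
    if op == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)             # overwrite
    elif op == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))   # insert
    elif buf:
        del buf[rng.randrange(len(buf))]                              # delete
    return bytes(buf)


def fuzz(target: Callable[[bytes], None], seed: bytes,
         iterations: int, rng_seed: int = 0) -> Optional[bytes]:
    """Feed mutated copies of `seed` to `target`; return the first input that
    raises an exception (our stand-in for a crash), or None if none does."""
    rng = random.Random(rng_seed)
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception:
            return candidate
    return None


def toy_target(data: bytes) -> None:
    """Hypothetical program under test with a planted bug."""
    if data.startswith(b"BU"):
        raise RuntimeError("crash: malformed header")


if __name__ == "__main__":
    crash = fuzz(toy_target, seed=b"BAD", iterations=50_000)
    print("bug-triggering input:", crash)
```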

Dissertation Talk: Usable and Efficient Systems for Machine Learning by Doris Xin; Thursday, April 22nd, 10 AM

Kattt Atchley April 16, 2021

Title: Usable and Efficient Systems for Machine Learning
Speaker: Doris Xin
Advisor: Aditya Parameswaran
Date: Thursday, April 22nd, 2021
Time: 10:00am – 11:00am PT
Location (Zoom): https://zoom.us/j/97106237545?pwd=NGdtY1d6WGVJYU1PM2R5T3hDZkt6Zz09
Meeting ID: 971 0623 7545
Passcode: 405841
Abstract: Machine learning became a key driver for technological advancement in the last decade thanks to major progress in programming interfaces and scalable systems. Libraries such as Scikit-learn and Keras have made it easier to implement machine learning algorithms and applications, while innovations in distributed systems have enabled model training at an unprecedented scale. However, machine learning tooling is far from perfect today; practitioners still face many challenges developing applications powered by machine learning. This dissertation aims to improve the usability and resource efficiency of systems for developing and productionizing machine learning applications…

RISE Seminar 3/12/21: Resilient and Scalable Architecture for Permissioned Blockchain Fabrics, a talk by Suyash Gupta, UC Davis

Kattt Atchley March 9, 2021

Title: Resilient and Scalable Architecture for Permissioned Blockchain Fabrics
Time: 12-1 PM Pacific Time, Friday March 12th, 2021
Zoom Link: https://berkeley.zoom.us/j/93799624421?pwd=ZU01Zm9PQjR0TkJXSXpZSFBhU2Q3QT09
Abstract: Since the introduction of Bitcoin—the first widespread application driven by blockchains—the interest in the design of blockchain-based applications has increased tremendously. At the core of these blockchain applications are consensus protocols that aim at securely replicating a client request among all replicas, even if some replicas are Byzantine faulty. Unfortunately, modern consensus protocols either yield low throughput or face design limitations. In this work, we present the design of three consensus protocols that facilitate efficient consensus among the replicas. Our protocols help to scale consensus through the principles of phase-reduction, parallelization, and geo-scale clustering while ensuring no compromise in fault-tolerance.…

RISE Seminar 3/5/21: Building Storage Systems for New Applications and New Hardware, talk by Vijay Chidambaram, UT Austin

Kattt Atchley March 2, 2021

Title: Building Storage Systems for New Applications and New Hardware
Time: 12-1 PM Pacific Time, Friday March 5th, 2021
Video recording link: https://youtu.be/_bdlwwOfKFE
Abstract: The modern storage landscape is changing at an exciting rate. New technologies, such as Intel DC Persistent Memory, are being introduced. At the same time, new applications such as blockchain are emerging with new requirements from the storage subsystem. New regulations, such as the General Data Protection Regulation (GDPR), place new constraints on how data may be read and written. As a result, designing storage systems that satisfy these constraints is interesting and challenging. In this talk, I will describe the lessons we learnt from tackling this challenge in various forms: my group has built file systems and concurrent…

RISE Seminar 2/26/21: Vignettes from Applied Research @ Splunk, a talk by Ram Sriharsha

Kattt Atchley February 23, 2021

Title: Vignettes from Applied Research @ Splunk
Time: 12-1 PM Pacific Time, Friday February 26th, 2021
YouTube Link: https://youtu.be/8evavcXtKVk
Abstract: By picking two very different but hopefully interesting problems among a whole slew that we are solving in the Machine Learning Research group at Splunk, I will try to provide a sense for the types of challenges we face in applying machine learning in our problem domain. The two problem domains I will be focusing on are monitoring outliers in metrics at scale, and automatically extracting information from machine-generated data.
Bio: Ram is the head of Machine Learning at Splunk. His group applies and advances the state of the art in machine learning in areas relevant to Splunk. They also develop the machine learning…

RISELab Poster Session / BEARS Symposium – Thursday February 11, 2021

David Schonenberg February 9, 2021

We are pleased to announce that RISELab will hold a virtual poster session as part of the BEARS 2021 Research Symposium on February 11, 2021, from 1:00 to 2:30pm PST. This virtual event will highlight the exciting projects that RISELab’s researchers have been working on in recent months. You will have the opportunity to view posters and interact virtually with the graduate and undergraduate poster presenters, RISELab faculty, and other event guests.
Remember, you must RSVP separately for each event:
Register for BEARS here. (Select UCB Student, Employee or Industrial Affiliate for complimentary registration.)
RSVP for the RISELab poster session here.
A RISELab Poster Session event link will be emailed to everyone on our RSVP list on the day of the event. Please RSVP by February 10. Questions? Email us…

RISE Seminar 11/20/20: Converged Analytics and Cloud-Native Architectures, a talk by Raghu Ramakrishnan of Microsoft

Kattt Atchley November 18, 2020

Title: Converged Analytics and Cloud-Native Architectures
Time: 12-1 PM Pacific Time, Friday November 20th, 2020
Talk recording: https://www.youtube.com/watch?v=bzjlYEZMdY0&feature=youtu.be
Abstract: Digital transformation is a popular phrase these days. In essence, the idea is to gather all relevant data (including real-time telemetry, operational data, historical data, social-media streams, and more) to understand any facet of an enterprise that is of interest, and to use the resulting insights to implement changes that optimize the desired outcomes. The idea is not new—cavemen no doubt observed that outrunning companions mattered more than outrunning sabertooth tigers and chose companions accordingly—and has motivated relational warehouses, data marts and data lakes. The underlying technology, however, has been evolving over the past decade. From big data to cloud, we’ve seen factors sparking big changes. As data grew in…

RISE Seminar 11/6/20: The µ Operating System: Linux for µs-latency and Tbps-bandwidth (Prof. Rachit Agarwal, Cornell)

Kattt Atchley November 2, 2020

Title: The µ Operating System: Linux for µs-latency and Tbps-bandwidth
Time: 12-1 PM PST, Friday November 6th, 2020
In accordance with the presenter’s wishes, no recording of this talk is available.
Abstract: Within the next few years, data centers will have hardware that makes it possible to achieve <10µs latency and 1.4Tbps bandwidth between servers. Is it possible to re-architect Linux to capitalize on the benefits offered by such high-performance hardware? In this talk, I will discuss how my group has been having fun exploring this question, what we have learned so far, and what it might take for our community to achieve this goal.
Bio: Rachit is an assistant professor at Cornell University, where he works with an awesome group of students and postdocs: Saksham Agarwal, Qizhe Cai, Midhul…

RISE Seminar 10/16/20: Streamlet: Textbook Streamlined Blockchains, a talk by Prof. Elaine Shi of Cornell

Kattt Atchley October 14, 2020

Title: Streamlet: Textbook Streamlined Blockchains
Time and Date: 12-1 PM PDT, Friday October 16th, 2020
Talk recording: https://www.youtube.com/watch?v=vU0veNETF8s
Abstract: Numerous works in the past have focused on constructing simple and understandable distributed consensus protocols. In this talk, I will present an absurdly simple consensus protocol called Streamlet. The entire protocol is: every epoch, a leader proposes a block extending the longest chain it has seen so far. Everyone votes for (i.e., signs) the first block proposed by the leader if it extends from one of the longest notarized chains they have seen so far. When a block collects votes from 2/3 of the nodes, it becomes notarized. Notarized does not mean final. Finality is decided with the following rule: for any chain in…
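The voting and notarization rules in this abstract are simple enough to sketch directly. The following models only those two rules; the finality rule is cut off in the abstract, so it is deliberately left out. Class and function names are hypothetical, the 2/3 threshold is read as "at least 2n/3 votes", and the sketch assumes every block in the `notarized` set sits on a fully notarized chain back to a genesis block that all nodes treat as notarized.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity-based equality and hashing: each proposal is distinct
class Block:
    epoch: int
    parent: Optional[Block]  # None only for the genesis block

    def height(self) -> int:
        """Length of the chain from genesis up to and including this block."""
        h, b = 0, self
        while b is not None:
            h, b = h + 1, b.parent
        return h


def is_notarized(votes: int, n: int) -> bool:
    """Notarization threshold, read here as at least 2/3 of the n nodes."""
    return 3 * votes >= 2 * n


def should_vote(proposal: Block, notarized: set[Block], voted_epochs: set[int]) -> bool:
    """Vote only for the first proposal seen in an epoch, and only if it
    extends the tip of one of the longest notarized chains seen so far."""
    if proposal.epoch in voted_epochs:
        return False  # already voted for an earlier proposal in this epoch
    if proposal.parent not in notarized:
        return False
    longest = max(b.height() for b in notarized)
    return proposal.parent.height() == longest


if __name__ == "__main__":
    n = 4
    genesis = Block(epoch=0, parent=None)
    notarized = {genesis}  # genesis is treated as notarized by everyone
    b1 = Block(epoch=1, parent=genesis)
    votes = sum(should_vote(b1, notarized, set()) for _ in range(n))  # all nodes honest
    if is_notarized(votes, n):
        notarized.add(b1)
    stale = Block(epoch=2, parent=genesis)  # does not extend the longest notarized chain
    print(b1 in notarized, should_vote(stale, notarized, set()))  # True False
```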

RISE Seminar 10/9/20: End-to-End Acceleration of Machine Learning Pipelines, a talk by Andy Feng of Nvidia

Kattt Atchley October 7, 2020

Title: End-to-End Acceleration of Machine Learning Pipelines
Time and Date: October 9th, 2020, 12-1 PM Pacific Time
Talk recording: https://youtu.be/Q6ayd76RFO0
Abstract: As the amount of data in the world has increased exponentially, there is huge interest across industries in building machine learning pipelines to mine the hidden gold. While CPU clock speed has largely remained the same over the last decade, GPUs have demonstrated great potential to accelerate machine learning pipelines dramatically. In this talk, we will present Nvidia’s initiatives to enable end-to-end acceleration of machine learning pipelines: data annotations, data analytics, model training and model inference. We will cover recent successes in healthcare and Apache Spark, including a Covid model trained across 20 hospitals. Related challenges will be discussed…

RISE Seminar 10/2/20: Compiler 2.0: Using Machine Learning to Modernize Compiler Technology, a talk by Saman Amarasinghe of MIT

Kattt Atchley October 1, 2020

Title: Compiler 2.0: Using Machine Learning to Modernize Compiler Technology
Time and Date: 12-1 PM Pacific, Friday October 2nd, 2020
Talk recording: https://www.youtube.com/watch?v=ICQiirenuLc&list=PLTPaZLQlNIHoxdX5qQ7QTiMIQu2lhY5MF&index=3
Abstract: Modern compilers are still built using technology that existed decades ago. These include basic algorithms and techniques for lexing, parsing, data-flow analysis, data dependence analysis, vectorization, register allocation, instruction selection, and instruction scheduling. It is high time that we modernize our compiler toolchain. In this talk, I will show the path to the modernization of one important compiler technique — vectorization. Vectorization was first introduced in the era of Cray vector processors during the 1980s. In modernizing vectorization, I will first show how to use new techniques that better target modern hardware. While vector supercomputers need large vectors,…

RISE Seminar 9/25/20: Relational Knowledge Graphs as the Foundation for Artificial Intelligence, a talk by Molham Aref of RelationalAI

Kattt Atchley September 21, 2020

Title: Relational Knowledge Graphs as the Foundation for Artificial Intelligence
Time: Friday, September 25th, 2020, 12-1 PM
Talk recording: https://www.youtube.com/watch?v=VpyGbjUzG7Y&feature=youtu.be
Abstract: In this talk, I will describe the motivation and architecture of a new kind of cloud-native relational database for knowledge graphs. I will motivate our belief that future enterprise systems will be built using relational knowledge graphs as a foundation. In such systems, each component or service will either be (1) learned with machine learning or (2) declared and executed via a sophisticated reasoner. After a brief overview of knowledge graphs and the relational paradigm, I will describe the foundational design principles and implementation philosophy of the RelationalAI system and how it supports workloads that mix data management, business intelligence, machine learning, reasoning, and graph analytics.…

RISE Seminar 9/18/20: Algorithmic foundations of neural architecture search, a talk by Ameet Talwalkar of CMU and Determined AI

Kattt Atchley September 14, 2020

Talk title: Algorithmic foundations of neural architecture search Date & Time: Friday 9/18, 12-1 PM Pacific Time Talk recording: https://www.youtube.com/watch?v=jSxM6aAUYqM&feature=youtu.be Abstract: Neural architecture search (NAS)—the problem of selecting which neural model to use for your learning problem—is a promising direction for automating and democratizing machine learning. Early NAS methods achieved impressive results on canonical image classification and language modeling problems, yet these methods were massively expensive computationally. More recent heuristics relying on weight-sharing and gradient-based optimization are drastically more computationally efficient while also achieving state-of-the-art performance. However, these heuristics are also complex and are poorly understood. In this talk, we introduce the NAS problem and then present our work studying recent NAS heuristics from first principles. We first perform an extensive…

RISE Seminar 5/14/20: Learning to Solve Combinatorial Optimization Problems with Applications to Systems and Chip Design, a talk by Azalia Mirhoseini

David Schonenberg May 13, 2020

This week, we are very excited to host Azalia Mirhoseini from Google Brain. Azalia will tell us about reinforcement learning in systems and chip design. Speaker: Azalia Mirhoseini (Google Brain) Title: Learning to Solve Combinatorial Optimization Problems with Applications to Systems and Chip Design Date & Time: Thursday, May 14 2020, 4-5pm Abstract: In the past decade, computer systems and chips have played a key role in the success of AI. Our vision is to use AI to transform the way systems and chips are designed. Many core problems in systems and hardware design are combinatorial optimization or decision making tasks with state and action sizes that are orders of magnitude larger than common AI benchmarks in robotics and games. In this talk, I will go over some of our research on tackling such optimization problems. First, I will…

RISE Seminar 4/17/20: Lessons from Large-Scale Cloud Software at Databricks, a talk by Matei Zaharia

David Schonenberg April 13, 2020

This week, we are very excited to host Matei Zaharia from Stanford. Speaker: Matei Zaharia (Stanford University) Title: Lessons from Large-Scale Cloud Software at Databricks Date & Time: Friday, April 17 2020, 12-1pm Zoom link: https://berkeley.zoom.us/j/540757582 Abstract: The cloud has become the most common way to deliver commercial software, but it requires building products in a very different way from traditional software, which has not been heavily studied in research. I will explain some of these challenges based on my experience at Databricks, a startup that provides a data analytics platform as a service on AWS and Azure. Databricks manages millions of VMs per day to run data engineering and machine learning workloads using Apache Spark, TensorFlow, Python and other software…

RISE Seminar 2/21/20: Towards an Equitable and Trustworthy Data Economy, a talk by Ruoxi Jia, UC Berkeley

Kattt Atchley February 18, 2020

Title: Towards an Equitable and Trustworthy Data Economy Speaker: Ruoxi Jia Date and location: Friday, February 21 2020, 12 – 1pm, Wozniak Lounge Abstract: The data economy is a rising ecosystem in which data are produced, distributed, and consumed at an unprecedented scale. On the one hand, the current data economy creates new levels of prosperity by driving rapid advances in machine learning and automation. On the other hand, it has some fundamental challenges that need to be addressed. First and foremost, how much is data worth? Data is valuable, yet a principled data valuation method is lacking. The answer to this question has profound implications: it will open up new data sources by facilitating and incentivizing data sharing and reduce economic…

RISELab Open House and Poster Session

David Schonenberg January 30, 2020

This year’s BEARS Symposium will be held on Thursday 2/13/20 at the International House in Berkeley from 9am-5pm. https://eecs.berkeley.edu/research/bears/2020 RISELab will host an Open House and Poster session from 1pm-3pm in the lab. Please join us for some engaging discussions over light refreshments. RISELab is located in 465 Soda Hall, on the corner of Hearst and LeRoy Avenues in Berkeley. See you there!

RISE Seminar 1/31/20: Optimal Resource Allocation for Parallelizable Jobs, a talk by Ben Berg

Kattt Atchley January 27, 2020

Title: Optimal Resource Allocation for Parallelizable Jobs Abstract: Modern distributed computation frameworks have enabled the dynamic allocation of resources to parallelizable jobs. When a job is parallelized across many servers, it will complete more quickly. However, jobs typically receive diminishing returns from being allocated additional servers. Hence, given a fixed number of servers, it is not obvious how to allocate servers across a set of jobs in order to minimize the overall mean response time. A good allocation policy should favor shorter jobs, but favoring any single job too heavily can cause the system to operate very inefficiently. We derive the optimal allocation policy which minimizes mean response time across a set of jobs by balancing the trade-off between granting…

RISE Seminar 12/6/19: The Pit and the Pendulum, a talk by Lorenzo Alvisi

David Schonenberg December 4, 2019

Title: The Pit and the Pendulum Speaker: Lorenzo Alvisi (Cornell) Date and location: Friday, December 6, 11 – 12 pm, Wozniak Lounge Abstract: Since the elegant foundations of transaction processing were established in the mid-1970s with the notion of serializability and the codification of the ACID (Atomicity, Consistency, Isolation, Durability) paradigm, performance has not been considered one of ACID’s strong suits, especially for distributed data stores. Indeed, the NoSQL/BASE movement of the last decade was born out of frustration with the limited scalability of traditional ACID solutions, only to become itself a source of frustration once the challenges of programming applications in this new paradigm began to sink in. But how fundamental is this dichotomy between performance and ease of programming? In this…

RISE Seminar 11/22/19: Networked Systems in the Era of Programmable Dataplanes, a talk by Arvind Krishnamurthy

Kattt Atchley November 19, 2019

Title: Networked Systems in the Era of Programmable Dataplanes Speaker: Arvind Krishnamurthy (University of Washington) Date and location: Friday, November 22, 11 – 12 pm, Wozniak Lounge Abstract: Emerging networking architectures are allowing for flexible and reconfigurable packet processing at line rate both on the switch and the NIC. Despite their promising new functionality, programmable switches and NICs are not all-powerful; they have limited state, support limited types of operations, and limit per-packet computation to operate at line rate. In this talk, I will describe how to mask resource limitations using approximation techniques and new scheduling algorithms and how to build a general framework for exposing in-network computing capability to distributed applications. In addition to presenting case studies of optimizing networked systems,…

RISE Seminar 11/15/19: Shrinking the Attack Surface for Expressive Trusted Hardware, a talk by James Mickens

David Schonenberg November 12, 2019

Title: Shrinking the Attack Surface for Expressive Trusted Hardware Speaker: James Mickens (Harvard) Date and location: Friday, November 15, 11 – 12 pm, Wozniak Lounge Abstract: Trusted hardware attempts to provide software with silicon-guaranteed security, for some definition of “security.” Unfortunately, modern trusted hardware is either too simple to provide rich notions of security (see TPM chips), or is sufficiently complex that the secure hardware itself is vulnerable to microarchitectural exploits (see SGX and TrustZone). In this talk, I will describe some of these troubling aspects of the human condition. I will then describe some of my research into making these problems less problematic. The basic idea is to run application code on a traditional out-of-order, speculative pipeline, while running a…

RISE Seminar 10/25/19: RADE: Resource-Efficient Supervised Anomaly Detection Using Decision Tree Based Ensemble Methods, a talk by Yaniv Ben-Itzhak (VMware)

Kattt Atchley October 23, 2019

Title: RADE: Resource-Efficient Supervised Anomaly Detection Using Decision Tree Based Ensemble Methods Speaker: Yaniv Ben-Itzhak (VMware) Date and location: Friday, October 25, 11 – 12 pm, Wozniak Lounge Abstract: Decision-tree-based ensemble classification methods (DTEMs) are a prevalent tool for supervised anomaly detection. However, due to the continued growth of datasets, DTEMs suffer from increasing drawbacks such as growing memory footprints, longer training times, and slower classification at lower throughput. In this paper, we present, design, and evaluate RADE – a DTEM-based anomaly detection framework that augments standard DTEM classifiers and alleviates these drawbacks by relying on two observations: (1) we find that a small (coarse-grained) DTEM model is sufficient to classify the majority of the classification queries correctly, such that a classification…

RISE Seminar 10/4/19: “ML will change the world – what’s taking it so long?” – a talk by Rajat Monga

David Schonenberg September 30, 2019

Title: ML will change the world – what’s taking it so long? Speaker: Rajat Monga (Google) Date and location: Friday, October 4, 11 – 12 pm, Wozniak Lounge Abstract: Over the last few years we have seen papers claiming image recognition, speech recognition and more recently natural language understanding that is better than humans. With such amazing results, everything around us should be way smarter than it really is. Why on earth is it taking so long to make things better? Do we need better algorithms – are deep learning, reinforcement learning, GANs not good enough yet? More data – maybe ImageNet was not big enough? More compute – bigger Pods with TPUs and GPUs – to exaflops and beyond?…

RISE Seminar 9/27/19: If you want to be rich, get a lot of money: Theory and Systems for Weak Supervision, a talk by Christopher Ré

David Schonenberg September 24, 2019

Title: If you want to be rich, get a lot of money: Theory and Systems for Weak Supervision. Speaker: Christopher Ré (Stanford) Date and location: Friday, September 27, 11 – 12 pm, Wozniak Lounge Abstract: If you want to build a high-quality machine learning product, build a large, high-quality training set. At first glance, this seems as useful as the statement “if you want to be rich, get a lot of money.” However, a key idea driving our work is that new theoretical and systems concepts including weak supervision, automatic data augmentation policies, and more, can enable engineers to build training sets more quickly and cost-effectively. Along with state-of-the-art results on benchmarks, these concepts have allowed our group and collaborators to build…

RISE Seminar 9/20/19: MI6: Secure Enclaves in a Speculative Out-of-Order Processor, a talk by Srini Devadas

David Schonenberg September 16, 2019

Title: MI6: Secure Enclaves in a Speculative Out-of-Order Processor Speaker: Srini Devadas (MIT) Date and location: Friday, September 20, 11 – 12 pm, Wozniak Lounge Abstract: Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees. We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attacker capable of mounting any software attack currently considered practical, including control flow speculation attacks. MI6 is inspired by Sanctum and extends its isolation guarantee to more realistic memory hierarchies. We model the performance impact of enclaves in MI6 through FPGA emulation on AWS F1…

RISE Seminar 9/13/19: Scalable, Efficient, and Productive: Holistic Hardware Optimizations for Machine Learning Acceleration, a talk by Sophia Shao

David Schonenberg September 12, 2019

Title: Scalable, Efficient, and Productive: Holistic Hardware Optimizations for Machine Learning Acceleration Speaker: Sophia Shao (UC Berkeley) Date and location: Friday, September 13, 11 – 12 pm, Wozniak Lounge Abstract: Machine learning systems are being widely deployed across billions of edge devices and datacenters across the world. At the same time, in the absence of Moore’s Law and Dennard scaling, we rely on building vertically integrated systems with domain-specific accelerators to improve system performance and efficiency. In this talk, I will describe our recent work on building scalable and efficient hardware that delivers real-time and robust performance across diverse deployment scenarios through joint hardware-software optimizations. I will conclude my talk by describing ongoing efforts toward building next-generation computing platforms for real-time machine…

Roxana Geambasu talk: Security and Privacy Guarantees in Machine Learning with Differential Privacy

David Schonenberg July 26, 2019

Join us on August 13th from 12:00-1:00 PM in 405 Soda Hall for a talk by Roxana Geambasu, Associate Professor of Computer Science at Columbia University.

SURF Poster Session

David Schonenberg July 26, 2019

Come by RISELab on Friday 8/9/19 to see what our SURFers (Summer Undergraduate Research Fellows) have been up to. Enjoy some light afternoon snacks and boba teas in the lab.

RISE Seminar: Search-based Approaches to Optimize Deep Learning Computation, a talk by Zhihao Jia

David Schonenberg May 13, 2019

Title: Search-based Approaches to Optimize Deep Learning Computation Speaker: Zhihao Jia Affiliation: Stanford University Date and location: Friday, May 17, 12:30-1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: Existing deep learning frameworks deploy DNN architectures on modern hardware devices by applying a sequence of heuristic optimizations. For example, current frameworks use data and model parallelism to parallelize DNN training across distributed clusters, and use rule-based operator fusions to optimize DNN computation graphs. These heuristic approaches achieve improved runtime performance in general but miss subtle optimizations for particular DNN architectures. In this talk, I will present two projects that use search-based approaches to optimize deep learning computation. First, I will introduce SOAP, a comprehensive search space of parallelization strategies for DNN training that…

RISE Seminar: Meeting stringent Internet performance requirements despite uncertainty, a talk by Sanjay Rao

David Schonenberg May 10, 2019

Title: Meeting stringent Internet performance requirements despite uncertainty Speaker: Sanjay Rao Affiliation: Purdue University Date and location: Friday, May 10, 12:30 – 1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: The Internet is the backbone of critical global cyber infrastructure, and its continual phenomenal growth comes with increasingly stringent expectations on performance. For instance, a recent paper from Google indicates that traffic on their wide-area network infrastructure has grown by 100X in the last 5 years, while performance requirements must now be met 99.99% rather than 99% of the time. High application quality (e.g., low latency Web applications, high video quality) is critical to user engagement, and new demanding applications such as 4K and 360-degree video continue to emerge. The performance requirements must be met despite…

RISE Seminar 4/26/19: Designing Systems for Push-Button Verification, a talk by Xi Wang

Kattt Atchley May 1, 2019

Title: Designing Systems for Push-Button Verification Speaker: Xi Wang Affiliation: University of Washington Date and location: Friday, April 26, 12:30 – 1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: I will give an overview of our research projects on designing systems for highly automated formal verification: Hyperkernel (SOSP’17), Nickel (OSDI’18), and more recent work on RISC-V security monitors. Formal verification is effective in eliminating entire classes of bugs, but verifying systems software is a major undertaking and often requires years of expert work. To free developers from such a proof burden, our key observation is that by co-designing systems and verifiers, we can use SMT solvers such as Z3 to achieve fully automated verification of those systems in a push-button style. Our results show that it…

RISE Seminar 5/3/19: Democratizing Video Analytics – The quest for the holy trinity of low latency, low cost, and high accuracy, by Ganesh Ananthanarayanan

David Schonenberg April 29, 2019

Title: Democratizing Video Analytics – The quest for the holy trinity of low latency, low cost, and high accuracy Speaker: Ganesh Ananthanarayanan Affiliation: Microsoft Research Date and location: Friday, May 3, 12:30 – 1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: Video cameras are pervasively deployed for security and smart city scenarios – there is a camera for every eight people in the US! Large-scale video processing is a grand challenge of systems that enable AI. This talk will describe the recent technical advancements of Project Rocket, which takes up this challenge to democratize video analytics: enable anyone with a camera to benefit from video analytics. The first half of the talk will focus on resource management of video processing pipelines. We…

RISE Seminar 4/19/19: From Queues to Earliest Departure Time, a talk by David Wetherall

David Schonenberg April 16, 2019

Title: From Queues to Earliest Departure Time Speaker: David Wetherall Affiliation: Google Date and location: Friday, April 19, 12:30 – 1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: Queues are used heavily as the primitive in host networking stacks and NICs to gate the transmission of packets into the network. They specify time implicitly, draining packets into the network “as fast as possible”. In this talk I will explain why we have come to favor a model for using the network in which packet departure times are set explicitly: it is a fit for the modern networking environment in which hosts cannot rely on deep-buffered switches to absorb large bursts, more efficient when implemented with constructs such as timing wheels, and more flexible for expressing…

RISE Seminar 4/12/19: Efficient Data Processing on Modern Hardware, a talk by Sebastian Breß

David Schonenberg April 10, 2019

Title: Efficient Data Processing on Modern Hardware Speaker: Sebastian Breß Affiliation: TU Berlin Date and location: Friday, April 12, 12:30-1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall in order to deliver improved performance. Currently, database engines have to be manually optimized for each processor, which is a costly and error-prone process. In this talk, we provide an overview of our research on automatic performance tuning in Hawk, a hardware-tailored code generator [1]. Our key idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations. We…

RISE Seminar 4/5/19: AI Ethics for Systems (and other technical) Researchers, a talk by Sarah Bird

David Schonenberg April 1, 2019

Title: AI Ethics for Systems (and other technical) Researchers Speaker: Sarah Bird Affiliation: Microsoft Date and location: Friday, April 5, 12:30 – 1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learning in many current and future real-world applications. Now there are calls from across the industry (academia, government, and industry leaders) for technology creators to ensure that AI is used only in ways that benefit people and “to engineer responsibility into the very fabric of the technology.” This of course has significant philosophical and political challenges and tradeoffs, and much of the conversation has been centered around these issues. However, while we are working on…

RISE Seminar 3/28/19: Towards Learned Algorithms, Data Structures, and Systems by Tim Kraska

David Schonenberg March 19, 2019

Title: Towards Learned Algorithms, Data Structures, and Systems Speaker: Tim Kraska Affiliation: MIT Abstract: All systems and applications are composed from basic data structures and algorithms, such as index structures, priority queues, and sorting algorithms. Most of these primitives have been around since the early beginnings of computer science (CS) and form the basis of every CS intro lecture. Yet, we might soon face an inflection point: recent results show that machine learning has the potential to significantly alter the way those primitives are implemented and the performance they can provide. In this talk, I will outline different ways to build learned algorithms and data structures to achieve “instance-optimality” with a particular focus on techniques used as part of data…

RISE Seminar 3/15/19: The Cloud Model – Enabling Provable Security at Scale by Neha Rungta

David Schonenberg March 11, 2019

Title: The Cloud Model – Enabling Provable Security at Scale Speaker: Neha Rungta Affiliation: Amazon Date and location: Friday, March 15, 12:30 – 1:30pm; Wozniak Lounge (430 Soda Hall) Abstract: Cloud computing is the on-demand delivery of IT resources through a common services platform. Businesses of all sizes are migrating their IT infrastructure to the cloud, both to reduce costs and increase agility for deploying new services and features. The cloud provides a precise model of resource configuration, communication, coordination, and storage. It enables an ease of formalization and the verification technologies built for this model are applicable to millions of customers. The programming model for cloud computing is similar in many ways to approaches seen in traditional model-driven design. The cloud model can be…

RISE Seminar: Three Modeling Vignettes from Search Ads Quality at Google by Sugato Basu

David Schonenberg March 4, 2019

Title: Three Modeling Vignettes from Search Ads Quality at Google Speaker: Sugato Basu Affiliation: Google Date and location: Friday, March 8, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Building and deploying machine-learning (ML) models at Google comes with interesting challenges. For example, some models have to handle massive amounts of training data, while some supervised tasks have an insufficient number of training labels. Or, even when the model quality is good enough for a product requirement, it may not meet other requirements (e.g., serving latency, memory footprint). In this talk we will discuss some of these challenges and share our experiences from deploying ML models for quality improvements in Search Ads products via some case studies. The first case study will discuss transfer…

RISE Seminar: Get Your Data Together! Algorithms for Managing Data Lakes by Erkang Zhu

David Schonenberg February 26, 2019

Title: Get Your Data Together! Algorithms for Managing Data Lakes Speaker: Erkang (Eric) Zhu Affiliation: University of Toronto Date and location: Friday, March 1, 12:30 – 1:30 pm, Wozniak Lounge (430 Soda Hall) Abstract: Data lakes (e.g., enterprise data catalogs and Open Data portals) are data dumps if users cannot find and utilize the data in them. In this talk, I present two problems in massive, dynamic data lakes: 1) searching for joinable tables to discover potential linkages, and 2) joining tables from different sources through auto-generated syntactic transformation on join values. I will also present two algorithmic solutions that can be used for data lakes that are large both in the number of tables (millions) and table sizes. The presented work has been published…

RISE Seminar 2/8/19: Data-Driven Datasets: Deep Active Learning for Autonomous Vehicles and Beyond, a talk by Adam Lesnikowski

Kattt Atchley February 14, 2019

Note: this talk has been recorded; you can watch the video on the RISELab YouTube channel. Title: Data-Driven Datasets: Deep Active Learning for Autonomous Vehicles and Beyond Speaker: Adam Lesnikowski Affiliation: NVIDIA Date and location: Friday, February 8, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Data is the source code of the software 2.0 paradigm. So why has there been a tremendous amount of focus on neural network architectures and relatively little on dataset construction in the development of modern machine learning? The speaker believes that this focus is misplaced, with the largest future gains in data-driven machine learning systems for computer vision and other applications coming from improved dataset building strategies rather than architecture improvements. In particular, employing feedback from…

RISE Seminar: 2/15/19 Building the warehouse scale computer, a talk by John Wilkes

David Schonenberg February 14, 2019

Title: Building the warehouse scale computer Speaker: John Wilkes Affiliation: Google Date and location: Friday, February 15, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Imagine some product team inside Google wants 100,000 CPU cores + RAM + flash + accelerators + disk in a couple of months. We need to decide where to put them, when; whether to deploy new machines, or re-purpose/reconfigure old ones; ensure we have enough power, cooling, networking, physical racks, data centers and (over a longer time frame) wind power; cope with variances in delivery times from supply logistics hiccups; do multi-year cost-optimal placement+decisions in the face of literally thousands of different machine configurations; keep track of parts; schedule repairs, upgrades, and installations; and generally make all this happen…

RISE Seminar: 2/1/19 Machine learning for medical decision support, a talk by Pengtao Xie

Kattt Atchley February 1, 2019

Title: Machine learning for medical decision support Speaker: Pengtao Xie Affiliation: Petuum Date and location: Friday, February 1, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: With the rapid growth of electronic health records and the advancement of machine learning technologies, the need for AI-enabled clinical decision support is emerging. In this talk, I will present recent work toward meeting this need: an integrative system that distills insights from large-scale, heterogeneous patient data, learns and integrates medical knowledge from broader sources such as the literature and domain experts, and empowers medical professionals to make accurate and efficient decisions within the clinical flow. In particular, I will discuss two aspects of practical clinical decision support — automatic generation of…

BEARS Symposium and RISELab Poster Session

David Schonenberg January 29, 2019

Join us for the RISELab Open House Poster Session after the BEARS Symposium on February 14, 2019 at UC Berkeley. The theme of this year’s symposium is The Future of Medicine: An EECS Perspective. The Berkeley EECS Annual Research Symposium (BEARS) will be held at the Sibley Auditorium on the Berkeley campus from 9:00 AM to 12:00 PM, followed by the RISELab Open House Poster Session from 1:00 PM to 3:00 PM in Soda Hall’s 5th Floor Atrium. The poster session will be conducted as an open house in which researchers will display their posters and be on hand to discuss their research. IMPORTANT: Separate registrations are required for the two events. Register for BEARS at the link above. For RISELab’s Poster Session, please RSVP here; this is separate from the symposium registration. We hope you’ll join us! Let us know if you need any…

RISE Seminar: 11/30/18 Data in the Cloud, a talk by Raghu Ramakrishnan

David Schonenberg November 28, 2018

Title: Data in the Cloud Speaker: Raghu Ramakrishnan Affiliation: Microsoft Date and location: Friday, November 30, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: The cloud has forced a rethinking of database architectures. Does this offer an opportunity to address the siloed nature of data management systems? In this talk, I’ll discuss these issues through the lens of the Microsoft data journey, both internal and external. Bio: Raghu Ramakrishnan is CTO for Data and a Technical Fellow at Microsoft, and leads the CISL and GSL applied research teams. He has previously served as Chief Scientist at Yahoo! and Professor at the University of Wisconsin-Madison, in addition to…

CANCELLED: RISE Seminar: 11/16/18 Diversity-promoting and large-scale machine learning for healthcare, a talk by Pengtao Xie

Kattt Atchley November 16, 2018

Due to poor air quality in Berkeley, this event has been cancelled. We will try to reschedule it in the near future. Apologies for the inconvenience! Title: Diversity-promoting and large-scale machine learning for healthcare Speaker: Pengtao Xie Affiliation: Petuum

RISE Seminar: 11/16/18 Diversity-promoting and large-scale machine learning for healthcare, a talk by Pengtao Xie

David Schonenberg November 14, 2018

Title: Diversity-promoting and large-scale machine learning for healthcare Speaker: Pengtao Xie Affiliation: Petuum Date and location: Friday, November 16, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: In healthcare, a tsunami of medical data has emerged, including electronic health records, images, literature, etc. These data can be heterogeneous and noisy, which renders clinical decision-making time-consuming, error-prone and suboptimal. In this thesis, we develop machine learning (ML) models and systems for distilling high-value patterns from unstructured clinical data and making informed and real-time medical predictions and recommendations, to aid physicians in improving the efficiency of workflow and quality of patient care. When developing these models, we encounter several…

RISE Seminar: 11/2/18 Managing Compute at Google Scale, a talk by Steve Hand

David Schonenberg October 29, 2018

Title: Managing Compute at Google Scale Speaker: Steven Hand Affiliation: Google Date and location: Friday, November 2, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Google deploys and operates a huge amount of computing capacity around the globe. In this talk I will provide an overview of Borg, the cluster management system used to coordinate this work, discuss the challenges introduced by new hardware and software systems, and look to future needs and capabilities. Bio: Steve Hand is a Software Engineer at Google where he works on Borg. Prior to joining Google in 2015 he was a Principal Researcher at Microsoft Research Silicon Valley Lab, and prior…

RISE Seminar 10/26/18: Abstractions for the Stateful Control Plane, a talk by Mahesh Balakrishnan

David Schonenberg October 23, 2018

Title: Abstractions for the Stateful Control Plane Speaker: Mahesh Balakrishnan Affiliation: Facebook/Yale University Date and location: Friday, October 26, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: At the heart of many cloud-scale systems is a logically centralized control plane that requires strong consistency and fault tolerance: examples include coordination services, SDN controllers, filesystem namespaces, and big data schedulers. Today, these control plane services are difficult to build, harden and scale, requiring complex protocols like Paxos and 2-phase commit that are inefficient when layered and difficult to combine. The shared log approach simplifies such applications by providing a data-centric abstraction that hides the complexity of the underlying distributed system. First generation shared log systems achieve…

RISE Seminar 10/19/18: Modeling (Human) Bias in Artificial Intelligence, a talk by Margaret Mitchell

David Schonenberg October 16, 2018

Title: Modeling (Human) Bias in Artificial Intelligence Speaker: Margaret Mitchell Affiliation: Google Date and location: Friday, October 19, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Human data brings with it human biases. Algorithms trained on that data can effectively perpetuate and amplify these biases, creating feedback loops that deepen social division. In this talk, I walk through how human bias is at play in the end-to-end machine learning cycle, and the effects this can have within society. Bio: Margaret is a Senior Research Scientist and leads the Ethical AI team within Google Research. Her research is interdisciplinary, combining computer vision, natural language processing, statistical methods, deep learning, and cognitive science; and…

RISE Camp 2018 is happening!

Kattt Atchley October 11, 2018

You can watch it live here. For more information, please go to the event website.

RISE Talk 10/8/18: Fast, Efficient, and Complexity-overcoming: the needs for the next generation of the data analytic platform, a talk by Edmon Begoli, ORNL

David Schonenberg October 8, 2018

Title: Fast, Efficient, and Complexity-overcoming: the needs for the next generation of the data analytic platform Speaker: Edmon Begoli Location: 405 Soda Hall Time: Monday, Oct. 8, 12 pm (noon) Abstract: Over the past ten years, we have focused on the formulation of platforms for large-scale, general-purpose data analytics that can bridge the gap between traditional business enterprise data systems and new ones capable of processing data originating from globally generated sources (web users, devices, etc.). While many sophisticated solutions have emerged and many problems have been solved, a great many challenges related to the inherent complexity, origins, and structure of the data, its privacy, and the efficient derivation of insights from the data still remain. Furthermore, there are…

RISE Seminar 10/5/18: Building a broad knowledge graph for products, a talk by Xin Luna Dong

David Schonenberg October 2, 2018

Title: Building a broad knowledge graph for products Speaker: Xin Luna Dong Affiliation: Amazon Date and location: Friday, October 5, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Knowledge graphs have been used to support a wide range of applications and enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the…

RISE Seminar 9/21/18: Remote Memory for Non-Masochists, a talk by Marcos Aguilera

David Schonenberg September 17, 2018

Title: Remote Memory for Non-Masochists Speaker: Marcos Aguilera Affiliation: VMware Research Date and location: Friday, September 21, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Remote memory is an old idea that has recently re-emerged in the age of fast networks, and it is now increasingly compelling. However, for remote memory to be successful, it needs to provide a better abstraction. The current abstraction based on RDMA is complex, error-prone, and clunky, limiting its adoption to experts and masochists. In this talk, we describe a number of alternatives that we are exploring, which are simpler, more portable, and conceptually richer. These alternatives have deep implications for the design of new…

General registration is now open for RISE Camp 2018!

Kattt Atchley September 13, 2018

We are offering a very limited number of tickets to the general public at a price of $500. Past events have sold out quickly, so please act fast! The registration link can be found here. Prerequisites for attending are listed here. The focus of this event will be on hands-on exercises and tutorials. The ideal attendee is a practitioner and/or engineer who will be using our software, possibly with some project management background as well. Note: this event will also be live streamed and archived on YouTube for free (we will provide a link on the RISE Camp website in the coming weeks). Register now to attend in person at UC Berkeley! Please note: if you are a UC Berkeley student and…

RISE Seminar 9/7/18: Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes, a talk by Sailesh Krishnamurthy

Kattt Atchley September 6, 2018

Title: Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes Speaker: Sailesh Krishnamurthy Affiliation: Amazon Web Services Date and location: Friday, September 7, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this talk, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service,…

RISE Seminar 9/14/18: Actor-Oriented Database Systems, a talk by Phil Bernstein

Kattt Atchley September 6, 2018

Title: Actor-Oriented Database Systems Speaker: Philip A. Bernstein Affiliation: Microsoft Research Date and location: Friday, September 14, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Many of today’s interactive, stateful, server applications are processor-intensive and must be scalable and elastic. Hence, they are usually implemented as middle-tier objects backed by a key-value store in cloud storage, rather than as stored procedures in a database system. This enables the system to scale elastically by adding or removing inexpensive middle-tier servers. Example applications include multi-player games, social networking, mobile computing, telemetry, and Internet of Things. When the objects are single-threaded and do not share memory, they are called actors. There are dozens of programming frameworks…

RISE Seminar 8/31/18: Dr. Dahlia Malkhi, VMware Research: BFT in the lens of Blockchains and Blockchains in the lens of BFT

Kattt Atchley August 28, 2018

Title: BFT in the lens of Blockchains and Blockchains in the lens of BFT Speaker: Dahlia Malkhi Affiliation: VMware Research Date and location: Friday, August 31, 12:30 – 1:30 pm; Wozniak Lounge (430 Soda Hall) Abstract: Blockchain is a Byzantine Fault Tolerant (BFT) replicated state machine, in which each state-update is by itself a Turing machine with bounded resources. The core algorithm for achieving BFT in a Blockchain appears completely different from classical BFT algorithms: Classical solutions like DLS and PBFT solve BFT among a small-to-medium group of known participants. Such algorithms consist of multiple rounds of message exchanges carrying votes and safety-proofs. They are evidently quite intimidating to the non-expert. In contrast, Bitcoin solves BFT among a…

RISELab Talk 8/29/18: Timothy Roscoe, ETH: Enzian: making systems software research relevant in the face of custom hardware

Kattt Atchley August 28, 2018

Date/time: Wednesday, Aug. 29; 1:30 pm Location: Wozniak Lounge, Soda Hall Title: Enzian: making systems software research relevant in the face of custom hardware. Abstract: Academic research in rack-scale and datacenter computing today is hamstrung by lack of hardware. Cloud providers and hardware vendors build custom accelerators, interconnects, and networks for commercially important workloads, but university researchers are stuck with commodity, off-the-shelf parts. Enzian is a research computer being developed at ETH Zurich (in collaboration with Cavium and Xilinx) which addresses this problem. An Enzian board consists of a server-class ARMv8 SoC tightly coupled and coherent with a large FPGA (eliminating PCIe), with about 0.5 TB DDR4 and nearly 500 Gb/s of network I/O either to the CPU (over…

e-mission platform two-day workshop

K. Shankari August 18, 2018

K. Shankari, UC Berkeley. Want to understand how humans travel and why? Want to see if you can influence travel behavior and influence climate change? August 20: 9:00 am – 12:30 pm in 485 Evans Hall; 12:30 – 5:00 pm in 117 Dwinelle Hall (Level D). August 21: 9:00 am – 5:00 pm in 117 Dwinelle Hall (Level D). The Global Metropolitan Studies Program and the Division of Data Science are pleased to sponsor a workshop on a platform for collecting and analyzing such data. The workshop will introduce participants to e-mission, an open-source platform for collecting human travel data. The course is intended for students and researchers across a variety of fields. Attend this workshop to…

Dissertation Talk: Machine Learning for Resource Management in the Datacenter and the Cloud

Neeraja Yadwadkar May 5, 2018

Title: Machine Learning for Resource Management in the Datacenter and the Cloud Speaker: Neeraja J. Yadwadkar Advisors: Randy Katz and Joseph Gonzalez Date: Thursday, May 10th, 2018 Time: 1-2pm Location: 465H Soda Hall Abstract: Traditional resource management techniques that rely on simple heuristics often fail to achieve predictable performance in contemporary complex systems that span physical servers, virtual servers, private and/or public clouds. My research aims to bring the benefits of data-driven models to resource management of such complex systems. In my dissertation, I argue that the advancements in machine learning can be leveraged to manage and optimize today’s systems by deriving actionable insights from the performance and utilization data these systems generate. To realize this vision of model-based resource management, we need to deal with…

RISE Seminar 5/10/18: David Ku (Microsoft): What is Microsoft up to in AI?

Kattt Atchley May 4, 2018

Title: What is Microsoft up to in AI? Date and time: 5/10/18, 12 pm Location: Wozniak Lounge, Soda Hall Abstract: Learn about Microsoft’s vision for transforming how we work and live through the intelligent cloud and intelligent edge. David will share thoughts on the impact of artificial intelligence, technical breakthroughs, and the rising importance of innovating AI against bias. About David: David Ku is corporate vice president for the AI Core group at Microsoft, responsible for horizontal AI capabilities that power products such as Bing, Cortana, Office and Azure. The group includes Bing knowledge graph and Satori platform, Substrate and Office intelligence, Bing Ads marketplace and ad platform, Shared data and analytics platform, Analysis and experimentation, AI Tools and infrastructure,…

RISE Seminar 5/3/18: Mehul Shah (Amazon Web Services): AWS Glue: Serverless Data Integration and Beyond

Kattt Atchley April 30, 2018

Title: AWS Glue: Serverless Data Integration and Beyond Date: Thursday, May 3rd, 12-1pm, Wozniak Lounge (430 Soda Hall) Speaker: Mehul Shah Affiliation: Amazon Web Services Abstract: Organizations want to analyze and gain insight from a growing number of new data sources, such as Internet of Things (IoT) streams, APIs, ad impressions, and log data. However, they are often limited by legacy ETL systems that were designed for transactional data. AWS Glue is a serverless data integration service for these modern data types. In this talk, we present cloud trends that motivate AWS Glue and the popular use-cases that drive its adoption. We show how simple it is to go from raw data to production data cleaning and transformation jobs with AWS Glue. It automatically crawls…

RISE Seminar 4/26/18: Kathy Yelick (UC Berkeley): Machine learning for Science

Kattt Atchley April 24, 2018

Speaker: Professor Kathy Yelick Title: Machine learning for Science Affiliation: UC Berkeley Date: Thursday, April 26, 12pm-1pm Location: Wozniak Lounge Abstract: TBA Bio: Katherine (Kathy) Yelick is a Professor of Electrical Engineering and Computer Sciences at UC Berkeley and the Associate Laboratory Director (ALD) for Computing Sciences at Lawrence Berkeley National Laboratory. Her research is in high performance computing, programming languages, compilers, parallel algorithms, and automatic performance tuning. She currently leads the Berkeley UPC project and co-leads the Berkeley Benchmarking and Optimization (Bebop) group. As ALD for Computing Sciences at LBNL, she oversees the National Energy Research Scientific Computing Center (NERSC), the Energy Sciences Network (ESnet) and the Computational Research Division (CRD), which covers applied math, computer science, data science and…

RISE Seminar 4/12/18: C. Mohan (IBM Fellow and Distinguished Visiting Professor, Tsinghua University): Landscape of Practical Blockchain Systems and their Applications

Kattt Atchley April 6, 2018

Title: Landscape of Practical Blockchain Systems and their Applications Date: Thursday, April 12th, 12-1pm Location: Wozniak Lounge (430 Soda Hall) Speaker: C. Mohan Affiliation: IBM Almaden Research Center Abstract: The concept of a distributed ledger was invented as the underlying technology of the public or permissionless Bitcoin cryptocurrency network. But the adoption and further adaptation of it for use in private or permissioned environments is what I consider to be of practical consequence, and hence only such private blockchain systems will be the focus of this talk. Computer companies like IBM, Intel, Oracle, Baidu and Microsoft, and many key players in different vertical industry segments have recognized the applicability of blockchains in environments other than cryptocurrencies. IBM did some…

RISE Seminar 4/5/18: Matt Johnson, Roy Frostig, and Chris Leary (Google Brain): Compiling machine learning programs via high-level tracing

Kattt Atchley April 2, 2018

Title: Compiling machine learning programs via high-level tracing Date: Thursday, April 5th, 12-1pm, Wozniak Lounge (430 Soda Hall) Speaker: Matt Johnson, Roy Frostig, and Chris Leary Affiliation: Google Brain Abstract: We’ll describe JAX, a domain-specific tracing JIT compiler for generating high-performance accelerator code from pure Python and NumPy machine learning programs. JAX uses the XLA compiler infrastructure to generate optimized code for the program subroutines that are most favorable for acceleration, and these optimized subroutines can be called and orchestrated by arbitrary Python. Because the system is fully compatible with Autograd, it allows forward- and reverse-mode automatic differentiation of Python functions to arbitrary order. We show that by combining JAX with Autograd and NumPy we get an easily programmable and…

RISE Seminar 3/22/18: Noah Goodman (Stanford): Probabilistic Programming and models of Cognition

Kattt Atchley March 21, 2018

Title: Probabilistic Programming and models of Cognition Date: Thursday, March 22nd, 12-1pm, Wozniak Lounge (430 Soda Hall) Speaker: Noah Goodman Affiliation: Stanford University Abstract: Noah will be talking about probabilistic programming systems, including Pyro, a recent system he has been building at Uber. Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling. Bio: Noah is an Associate Professor of Psychology and Computer Science, and Linguistics (by courtesy), at Stanford University. His research interests lie in computational models of cognition, probabilistic programming languages, natural language semantics and pragmatics, and social cognition.

RISE Seminar 3/15/18: Alexandra Meliou (UMass Amherst): Diagnoses and Explanations: Creating a Higher-Quality Data World

Kattt Atchley March 9, 2018

Title: Diagnoses and Explanations: Creating a Higher-Quality Data World Abstract: The correctness and proper function of data-driven systems and applications rely heavily on the correctness of their data. Low-quality data can be costly and disruptive, leading to revenue loss, incorrect conclusions, and misguided policy decisions. Improving data quality is far more than purging datasets of errors; it is critical to improve the processes that produce the data, to collect good data sources for generating the data, and to address the root causes of problems. Our work is grounded in an important insight: while existing data cleaning techniques can be effective at purging datasets of errors, they disregard the fact that many errors are systemic, inherent to the…

RISE Seminar 3/8/18: Peter Gao: Practical Computer Vision For Self Driving Cars

Kattt Atchley March 6, 2018

Title: Practical Computer Vision For Self Driving Cars Date: Thursday, March 8th, 12-1pm, Wozniak Lounge (430 Soda Hall) Speaker: Peter Gao Affiliation: Cruise Automation Abstract: Cruise has built a fleet of self-driving cars around San Francisco. Getting these cars to drive is a hard engineering and science problem. This talk explains roughly how self-driving cars work and how computer vision, from camera hardware to deep learning, helps make a self-driving car go. Bio: Peter works on computer vision at Cruise. Previously, he worked on Caffe at Berkeley and did research on using deep learning for object detection. Before working on self-driving cars, Peter worked on machine learning for spam detection at Pinterest and student learning models…

RISE Seminar 2/15/18: Mihai Budiu (VMware) – Hillview: a spreadsheet for big data

Kattt Atchley February 13, 2018

Title: Hillview: a spreadsheet for big data By: Mihai Budiu Affiliation: VMware Where/When: Thursday Feb 15 noon-1pm Wozniak Lounge (430 Soda Hall) Abstract: Hillview is a cloud service for providing interactive browsing of large data sets. The code is available under an Apache 2 license at https://github.com/vmware/hillview. In this presentation we describe and demonstrate the design and implementation of the system. Hillview offers sub-second renderings of billion-row datasets. Hillview is built on top of a platform for executing distributed sketching algorithms. A core design principle is to compute only what can be displayed and nothing more. This principle sometimes enables the system to use constant-work algorithms, independent of the actual data size. Bio: Mihai Budiu is a senior researcher at…

RISE Seminar 2/1/18: Yangqing Jia – Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

Kattt Atchley January 31, 2018

Title: Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective By: Yangqing Jia Affiliation: Facebook AI Where/When: Thursday Feb 1 noon-1pm Wozniak Lounge (430 Soda Hall) Abstract: Machine learning sits at the core of many essential products and services at Facebook. This talk describes the hardware and software infrastructure that supports machine learning at global scale. Facebook’s machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU…

BEARS Symposium followed by RISELab Open House and poster session on Thursday, February 8, 2018

Kattt Atchley January 25, 2018

Registration is now open for the Berkeley EECS Annual Research Symposium (BEARS). The symposium will be held at Sibley Auditorium on the UC Berkeley campus on Thursday, February 8, 2018 from 9:30AM to 12:00PM. Following the BEARS Symposium, RISELab will hold an open house and poster session from 1:00PM to 3:00PM inside the lab, which is located at 465 Soda Hall. The poster session will be conducted as an open house in which researchers will display their posters throughout the lab and be on hand to discuss their research. Come hang out with us!

RISE Seminar 1/25/18: Tiny functions for codecs, compilation, and (maybe) soon everything, talk by Keith Winstein

Kattt Atchley January 23, 2018

Title: Tiny functions for codecs, compilation, and (maybe) soon everything By: Keith Winstein Affiliation: Professor, Stanford Computer Science Where/When: Thursday Jan 25 noon-1pm Wozniak Lounge (430 Soda Hall) Abstract: Networks, applications, and media codecs frequently treat one another as strangers. By expressing large systems as compositions of small, pure functions, we’ve found it’s possible to achieve tighter couplings between these components, improving performance without giving up modularity or the ability to debug. I’ll discuss our experience with systems that demonstrate this basic idea: ExCamera (NSDI 2017) parallelizes video encoding into thousands of tiny tasks, each handling a fraction of a second of video, much shorter than the interval between key frames, and executing in parallel on AWS Lambda. This was…

RISE Seminar 1/18/18: Nearest Neighbor Search: Theory and Practice, by Ludwig Schmidt

Kattt Atchley January 18, 2018

Title: Nearest Neighbor Search: Theory and Practice By: Ludwig Schmidt Affiliation: Postdoc, UC Berkeley EECS Where/When: Thursday Jan 18 noon-1pm Wozniak Lounge (430 Soda Hall) Abstract: Nearest neighbor search is a fundamental algorithmic primitive when dealing with large datasets: given a new query, the goal is to find the closest point in the dataset. For real-time applications, it is especially important to process nearest neighbor queries with low latency, even for high-dimensional data such as images and videos. In this talk, I will give an overview of the theory and practice of nearest neighbor search, both of which have seen significant progress over the past few years. The focus will be on the Locality-Sensitive Hashing (LSH) framework and how it…

RISE Seminar: Jon Tamir (UC Berkeley): Computational Magnetic Resonance Imaging and the Berkeley Advanced Reconstruction Toolbox (BART)

Kattt Atchley November 2, 2017

Title: Computational Magnetic Resonance Imaging and the Berkeley Advanced Reconstruction Toolbox (BART) When/Where: Thursday, 12:30-1:30, Wozniak Lounge Speaker: Jon Tamir (UC Berkeley) Description: Magnetic resonance imaging (MRI) is a powerful medical imaging modality that is non-invasive and uses no ionizing radiation. In computational MRI, we use prior knowledge to inform the design of new acquisition schemes, as well as solve large-scale image reconstruction problems that may consist of millions of unknowns. To efficiently solve these problems, we have developed the Berkeley Advanced Reconstruction Toolbox (BART). BART is a free and open source image reconstruction framework, available at http://mrirecon.github.io/bart/. The toolbox contains a multi-dimensional array processing library and implements several reconstruction algorithms for parallel imaging and compressed sensing. In this talk I will give an overview of MRI, discuss some high-dimensional…

RISE Seminar: Jason Gauci: Reinforcement Learning in Production

Kattt Atchley October 26, 2017

This week’s RISE Seminar speaker will be Jason Gauci from Facebook. Lunch will be served. When/Where: Thursday 26th, 12:30-1:30, Wozniak Lounge, Soda Hall Title: Reinforcement Learning in Production Abstract: All recommender systems follow some policy. Historically, the policy has been engineered by hand: most of the machine learning focused on supervised learning techniques to provide as much signal to the policy engineers as possible. However, the goals of a recommender system (conversions, customer satisfaction, engagement) are integrated across some time horizon. At Facebook we have found that reinforcement learning (RL) gives us the ability to optimize directly for our key goals, and this talk will focus on our ongoing effort to scale RL and use RL models to make recommendations for billions…

RISE Seminar: Hammad Mazhar: Training an army of golfing robots through simulation

Kattt Atchley October 17, 2017

For this week’s RISE Seminar, we will have Hammad Mazhar from NVIDIA speaking. Lunch is served. Abstract: There are many ways to leverage simulation for machine learning problems. From articulated mechanisms to cloth, fluids, and soft bodies, simulation lets us experiment with problems that would otherwise be intractable. Additionally, techniques like synthetic data generation and virtual sensors allow us to use domain randomization for improved transfer of learning. This talk will cover some of the recent work at NVIDIA that combines simulation, reinforcement learning, and headless rendering through containers to scale up the number of agents we can throw at a task. For 1-on-1 meetings sign up here: https://doodle.com/poll/twx65ke9yzrmxsw8

RISE Seminar: Matthias Boehm: Declarative Machine Learning for Low-Latency to Large-Scale Deployments

Melissa Mecca October 15, 2017

Matthias Boehm from IBM Research – Almaden will be giving a talk on SystemML on Thursday, October 5, 12:30-1:30 in the Wozniak Lounge. Lunch is served. Title: Apache SystemML: Declarative Machine Learning for Low-Latency to Large-Scale Deployments Abstract: Declarative machine learning (ML) aims to simplify the development and usage of large-scale ML algorithms. In SystemML, data scientists specify ML algorithms in a high-level language with R-like syntax and the system automatically generates hybrid execution plans that combine single-node, in-memory operations and distributed operations on Spark. In the first part, we motivate declarative ML and provide an up-to-date overview of SystemML, including its APIs for different deployments. Since it has rarely been discussed before, we specifically discuss a programmatic API for low-latency scoring…

RISE Seminar: Strategies for integrating people and machine learning in online systems

Melissa Mecca October 15, 2017

When/Where: Thursday 9/21 12:30-1:30 Wozniak Lounge, Soda Hall, UC Berkeley Abstract: Clara Labs is an email-based scheduling service for busy people. Simply CC Clara on an email to a person you want to meet with, and we’ll handle the back and forth game of email-tag for you in accordance with your preferences. To build a robust and accurate system that gracefully handles nuanced requests, we’ve combined machine learning (ML) with a distributed human labor force. This system enables a single person to schedule consistently for an unbounded number of customers, regardless of worker location or lack of a priori customer context. A partially-automated system has clear benefits, such as increased accuracy and decreased cost (i.e., increased scalability). Further, human input to the…

RISE Seminar: Anant Bhardwaj: Instabase: Design Challenges in Building End User Data Systems

Melissa Mecca October 15, 2017

For this week’s RISE Seminar, we will have Anant Bhardwaj, who is the founder & CEO of Instabase. Abstract: In this talk, I’ll highlight some of the design decisions we made in building Instabase, a software platform with a suite of applications to automate complex data operations. The software platform includes [1] a flexible data store (for managing data), and [2] an app store (end user tools/applications for various use-cases). The talk will touch upon some lessons learned in building the system, and highlight some interesting areas for future work. Instabase is a software platform that enables organizations to automate complex business operations by combining various Instabase tools and applications together. The suite of services and applications include [1] tools…