Joe Hellerstein, Author at RISE Lab

Joe Hellerstein

Co-Director of the RISELab. Jim Gray Professor of Computer Science at Berkeley. Founder and CSO, Trifacta.

Blog Posts

A History of Postgres

Joe Hellerstein | January 9, 2019 | blog, Database Systems, Open Source, Projects, Systems, Uncategorized

(crossposted from databeta.wordpress.com) The ACM began commissioning a series of reminiscence books on Turing Award winners. Thanks to hard work by editor Michael Brodie, the first one is Mike Stonebraker’s book, which just came out. I was asked to write the chapter on Postgres. I was one of the large and distinguished crew of grad students on the Postgres project, so this was fun. ACM in its wisdom decided that these books would be published in a relatively traditional fashion—i.e. you have to pay for them. The publisher, Morgan-Claypool, has this tip for students and ACM members: Please note that the Bitly link goes to a landing page where Students, ACM Members, and Institutions who have access to the ACM…

An Overview of the CALM Theorem

Joe Hellerstein | January 8, 2019 | blog, Database Systems, Distributed Systems, Theoretical Computer Science

For folks who care about what’s possible in distributed computing: Peter Alvaro and I wrote an introduction to the CALM Theorem and subsequent work that is now up on arXiv. The CALM Theorem formally characterizes the class of programs that can achieve distributed consistency without the use of coordination. — Joe Hellerstein (Cross-posted from databeta.wordpress.com.) I spent a good fraction of my academic life in the last decade working on a deeper understanding of how to program the cloud and other large-scale distributed systems. I was enormously lucky to collaborate with and learn from amazing friends over this period in the BOOM project, and see our work picked up and extended by new friends and colleagues. Our research was motivated by…

Anna: A Crazy Fast, Super-Scalable, Flexibly Consistent KVS 🗺

Joe Hellerstein | March 12, 2018 | blog, Database Systems, Distributed Systems, Real-Time, Systems, Uncategorized

This article cross-posted from the DataBeta blog. There’s fast and there’s fast. This post is about Anna, a key/value database design from our team at Berkeley that’s got phenomenal speed and buttery smooth scaling, with an unprecedented range of consistency guarantees. Details are in our upcoming ICDE18 paper on Anna. Conventional wisdom (or at least Jeff Dean wisdom) says that you have to redesign your system every time you scale by 10x. As researchers, we asked the counter-cultural question: what would it take to build a key-value store that would excel across many orders of magnitude of scale, from a single multicore box to the global cloud? Turns out this kind of curiosity can lead to a system with pretty interesting practical…

RISELab Announces 3 Open Source Releases

Joe Hellerstein | May 30, 2017 | blog, Clipper, Ground, Open Source, Projects, Ray, Systems

Part of the Berkeley tradition, and the RISELab mission, is to release open source software as part of our research agenda. Six months after launching the lab, we’re excited to announce initial v0.1 releases of three RISELab open-source systems: Clipper, Ground and Ray. Clipper is an open-source prediction-serving system. Clipper simplifies deploying models from a wide range of machine learning frameworks by exposing a common REST interface and automatically ensuring low-latency and high-throughput predictions. In the 0.1 release, we focused on reliable support for serving models trained in Spark and Scikit-Learn. In the next release we will be introducing support for TensorFlow and Caffe2 as well as online personalization and multi-armed bandits. We are providing active support for early users and will be following GitHub issues…

Metadata Megafail: Messing up Your Data Strategy in 3 Easy Steps

Joe Hellerstein | February 27, 2017 | blog, Ground

A key aspect of the RISELab agenda is to aggressively harness data—lots of it, both historical and live. Of course bits in computers don’t provide value on their own. We need a broader context for data: where it came from, what it represents, and how it gets used. Traditionally, people called this metadata: the data about our data. Requirements for metadata have changed drastically in recent years in response to technology trends. There’s an emerging groundswell to address these new requirements and explore new opportunities. This includes our work on the broader notion of data context in the Ground system. How should data-driven organizations respond to these changing requirements? In the tradition of Berkeley advice like how to build a bad research center and…