Ray’s actor model and its simplicity of implementation in Python-based frameworks are a primary motivation for using it in Realm (to be published), a higher-level research framework that my colleagues and I are developing for real-time analytic monitoring of critical events (e.g., suicide risk, infectious diseases). The purpose of Realm is to provide real-time alerts in response to, for example, continuously occurring, high-frequency clinical events; support for online learning; distribution of model updates through the hierarchy of models (local/embedded, regional, and central); and targeted, selective, and context-specific updates to these models. While other frameworks (Akka, Spark, Kafka, etc.) provide most of these functions, we are exploring and using Ray because of its simplicity of implementation (using actors…
Confluo: Millisecond-level Queries on Large-scale Live Data
Confluo is a system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high-throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad-hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams and an end-to-end optimized system design. We are excited to release Confluo as an open-source C++ project comprising Confluo’s data structure library, which supports high-throughput ingestion of logs along with a wide range of online (live aggregates, conditional trigger executions, etc.) and offline (ad-hoc filters, aggregates, etc.) queries; and a Confluo server implementation, which encapsulates the data structures and exposes their operations via an RPC interface, along with client libraries in C++, Java, and Python.…
Anna: A Crazy Fast, Super-Scalable, Flexibly Consistent KVS 🗺
This article is cross-posted from the DataBeta blog. There’s fast and there’s fast. This post is about Anna, a key/value database design from our team at Berkeley that’s got phenomenal speed and buttery smooth scaling, with an unprecedented range of consistency guarantees. Details are in our upcoming ICDE18 paper on Anna. Conventional wisdom (or at least Jeff Dean wisdom) says that you have to redesign your system every time you scale by 10x. As researchers, we asked the counter-cultural question: what would it take to build a key-value store that would excel across many orders of magnitude of scale, from a single multicore box to the global cloud? Turns out this kind of curiosity can lead to a system with pretty interesting practical…