Login    Contact Us    Search

RISE Lab

REAL-TIME INTELLIGENT SECURE EXECUTION Navigation
  • Home
  • People
  • Projects
  • Publications
  • Sponsors
  • DARE
  • Academics
  • News
  • Events
  • RISE Camp
  • Blogs
  • Jenkins
  • Search
  • Home
  • People
  • Projects
  • Publications
  • Sponsors
  • DARE
  • Academics
  • News
  • Events
  • RISE Camp
  • Blogs
  • Jenkins
  • Search

IndexedRDD

Immutability dramatically simplifies fault tolerance, straggler mitigation, and data consistency and is an essential part of widely-used distributed batch analytics systems including MapReduce, Dryad, and Spark. However, these systems are increasingly being used for new applications like stream processing and incremental analytics, which often demand fine-grained updates and are seemingly at odds with the essential assumption of immutability. We introduce the persistent adaptive radix tree (PART), a map data structure that supports efficient fine-grained updates without compromising immutability. In addition, PART (1) allows applications to trade off latency for throughput using batching, (2) supports efficient scans using an optimized memory layout and periodic compaction, and (3) achieves efficient fault recovery using incremental checkpoints. PART achieves update performance comparable to a mutable hash table, scan performance within 2x of a B-tree due to compaction, and memory usage within 2x of optimal. Across a variety of distributed applications, PART improves upon the performance of state-of-the-art immutable and even mutable systems by anywhere from 50x to 4 orders of magnitude.

Ankur Dave

ankurd@eecs.berkeley.edu

Ion Stoica

istoica@cs.berkeley.edu

Joey Gonzalez

jegonzal@cs.berkeley.edu

Share
Tweet
Share


Accessibility · Nondiscrimination · Privacy

 
  • Home
  • People
  • Projects
  • Publications
  • Sponsors
  • DARE
  • Academics
  • News
  • Events
  • RISE Camp
  • Blogs
  • Jenkins


The UCBerkeley RISELab is an NSF Expedition Project.