Shivaram Venkataraman


Blink: Fast and Generic Collectives for Distributed ML

Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure

ASAP: Fast, Approximate Graph Pattern Mining at Scale

Towards Fast and Scalable Graph Pattern Mining

Bridging the GAP: Towards Approximate Graph Analytics

Blink: A fast NVLink-based collective communication library

Drizzle: Fast and Adaptable Stream Processing at Scale

Occupy the Cloud: Distributed Computing for the 99%

Breaking Locality Accelerates Block Gauss-Seidel

SparkR: Scaling R Programs with Spark

Blog Posts

Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark


This work was done in collaboration with Ding Ding and Sergey Ermolin from Intel. In recent years, the scale of datasets and models used in deep learning has increased dramatically. Although larger datasets and models can improve accuracy in many AI applications, they often take much longer to train on a single machine. Yet distributing training across large clusters remains far less common with today's popular deep learning frameworks than it has long been in the Big Data world, partly because large GPU clusters are hard to gain access to, and partly because popular DL frameworks lack convenient facilities for distributed training. By leveraging the cluster distribution capabilities in Apache Spark, BigDL successfully performs very large-scale distributed…