RISELab postdoc Hao Zhang, who works with Prof. Ion Stoica, has won the Jay Lepreau Best Paper Award at OSDI ’21 for the paper “Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning”.
The paper proposes goodput, a new measure for designing and implementing resource schedulers for deep learning. Goodput combines system throughput (how quickly training examples are processed) with statistical efficiency (how much training progress each example contributes).
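To show why the two factors must be considered together, here is a minimal sketch. It assumes, as a simplification, that goodput is throughput multiplied by statistical efficiency. The batch sizes, throughputs, and efficiencies are made-up illustrative numbers, not results from the paper. The helper names `goodput` and `best_batch_size` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical measurements per batch size:
# (throughput in examples/s, statistical efficiency in (0, 1]).
# Illustrative numbers only; not taken from the Pollux paper.
CANDIDATES: Dict[int, Tuple[float, float]] = {
    64: (1000.0, 1.00),
    256: (3200.0, 0.80),
    1024: (6000.0, 0.35),
}


def goodput(throughput: float, efficiency: float) -> float:
    """Sketch of goodput as throughput scaled by statistical efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return throughput * efficiency


def best_batch_size(candidates: Dict[int, Tuple[float, float]],
                    use_goodput: bool = True) -> int:
    """Return the batch size that maximizes goodput (or raw throughput)."""
    def score(batch_size: int) -> float:
        throughput, efficiency = candidates[batch_size]
        return goodput(throughput, efficiency) if use_goodput else throughput
    return max(candidates, key=score)


if __name__ == "__main__":
    for bs, (tput, eff) in sorted(CANDIDATES.items()):
        print(f"batch={bs:5d} throughput={tput:7.1f} goodput={goodput(tput, eff):7.1f}")
    print("best by throughput:", best_batch_size(CANDIDATES, use_goodput=False))
    print("best by goodput:   ", best_batch_size(CANDIDATES))
```

With these hypothetical numbers, maximizing raw throughput picks the largest batch (1024). Maximizing goodput instead picks 256, because the larger batch's throughput gain is outweighed by its drop in statistical efficiency.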
It also presents Pollux, a system for optimizing elastic deep learning in GPU clusters. Pollux will soon be integrated into the Ray ecosystem.
More details about the paper and code can be found here:
Integration with Ray: https://github.com/ray-