Dissertation Talk with Melih Elibol: NumS: Scalable Array Programming for the Cloud; 10 AM, Tuesday, May 10

May 10, 2022

Title: NumS: Scalable Array Programming for the Cloud
Speaker: Melih Elibol
Advisors: Michael I. Jordan and Ion Stoica

Date: Tuesday, May 10th, 2022
Time: 10:00am – 11:00am PT

Location: 380 Soda Hall
Zoom Link: https://berkeley.zoom.us/j/7173513117?pwd=aXVjZmVwM2NTbmx4RFVxMUszUHJIZz09
Meeting ID: 717 351 3117
Passcode: 1337

Abstract:
Scientists increasingly rely on Python tools to perform scalable, distributed-memory array operations using rich, NumPy-like expressions. Existing solutions that scale arrays and machine learning models cover only a limited subset of the scikit-learn library, and achieve suboptimal performance on numerical operations because they rely on the dynamic scheduling provided by task-based distributed systems. This can lead to performance problems that are difficult to address without in-depth knowledge of the underlying distributed system.

In this thesis, I break down what is required to seamlessly scale the NumPy and scikit-learn APIs on task-based distributed systems into three primary components: (1) an API providing automatic loop parallelism for Python programs written using NumPy operations and scikit-learn models; (2) a distributed array data structure, optimized for task-based distributed systems, which enables scalable execution of element-wise, linear algebra, and tensor algebra operations; and (3) a scheduler optimized for our representation of distributed array operations. I summarize each component, along with the functionality it enables, within the scope of the open-source project NumS. In particular, our solution to scheduling, called Load Simulated Hierarchical Scheduling (LSHS), significantly enhances NumS performance on the distributed system Ray, and is robust to a variety of block partitioning sizes across a range of microbenchmarks. I explain the performance of our scheduler by presenting communication lower bounds and showing that LSHS attains some of these lower bounds. As a real-world application, NumS is being used by the National Observatory of Athens research group to scale their deep learning methods globally for daily wildfire danger forecasting.
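For readers unfamiliar with block-partitioned arrays, the ideas behind components (2) and (3) can be illustrated with a minimal single-machine sketch in plain NumPy. This is not NumS code and not the LSHS algorithm: the helper names `split_blocks`, `block_matmul`, and `place_greedy` are hypothetical, and the greedy placement is only a stand-in showing what it means to choose a node by simulating load.

```python
import numpy as np


def split_blocks(a, block_shape):
    """Partition a 2-D array into a grid (nested list) of blocks."""
    br, bc = block_shape
    return [[a[i:i + br, j:j + bc] for j in range(0, a.shape[1], bc)]
            for i in range(0, a.shape[0], br)]


def block_matmul(a_blocks, b_blocks):
    """Multiply two block grids. Each block product is an independent task
    that a distributed system could run on a different node."""
    rows, inner, cols = len(a_blocks), len(b_blocks), len(b_blocks[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Sum of block products along the shared (inner) dimension.
            acc = sum(a_blocks[i][k] @ b_blocks[k][j] for k in range(inner))
            out_row.append(acc)
        out.append(out_row)
    return out


def place_greedy(task_costs, num_nodes):
    """Hypothetical stand-in for a load-aware scheduler: assign each task
    to the node whose simulated load is currently lowest."""
    loads = [0.0] * num_nodes
    placement = []
    for cost in task_costs:
        node = min(range(num_nodes), key=loads.__getitem__)
        loads[node] += cost
        placement.append(node)
    return placement, loads


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((4, 6))

# Block shapes are chosen so A's block columns line up with B's block rows.
C_blocks = block_matmul(split_blocks(A, (2, 2)), split_blocks(B, (2, 3)))
C = np.block(C_blocks)

placement, loads = place_greedy([4, 3, 2, 1], num_nodes=2)
```

Here `C` matches the result of `A @ B`, and the four example tasks end up balanced across the two simulated nodes. In a real task-based system each block product would be a remote task, and the placement decision would also need to account for where the input blocks already live.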