RISE Seminar: Matthias Boehm: Declarative Machine Learning for Low-Latency to Large-Scale Deployments

October 5, 2017

Matthias Boehm from IBM Research – Almaden will give a talk on SystemML on Thursday, October 5, from 12:30 to 1:30 in the Wozniak Lounge. Lunch will be served.

Title: Apache SystemML: Declarative Machine Learning for Low-Latency to Large-Scale Deployments

Abstract:

Declarative machine learning (ML) aims to simplify the development and usage of large-scale ML algorithms. In SystemML, data scientists specify ML algorithms in a high-level language with R-like syntax, and the system automatically generates hybrid execution plans that combine single-node, in-memory operations and distributed operations on Spark. In the first part, we motivate declarative ML and provide an up-to-date overview of SystemML, including its APIs for different deployments. Because it has rarely been covered before, we specifically discuss a programmatic API for low-latency scoring and its usage in containerized and data-parallel environments. In the second part, we discuss selected research results for large-scale ML, specifically compressed linear algebra (CLA) and automatic operator fusion. CLA aims to fit larger datasets into available memory by applying lightweight database compression schemes to matrices and executing linear algebra operations directly on the compressed representations. In contrast, automatic operator fusion aims to avoid materialized intermediates and unnecessary scans and to exploit sparsity, by optimizing fusion plans and generating code for the resulting custom fused operators. Together, CLA and automatic operator fusion achieve significant end-to-end improvements because they address orthogonal bottlenecks of large-scale ML algorithms.
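
To make the CLA idea in the abstract concrete, here is a minimal Python sketch of a matrix-vector product computed directly on a compressed matrix. It is an illustration only and not SystemML's implementation: the per-column "distinct value to row list" encoding and the names `compress_column`, `compress`, and `matvec` are assumptions chosen for this example.

```python
from collections import defaultdict


def compress_column(values):
    """Map each distinct nonzero value to the list of rows where it occurs."""
    groups = defaultdict(list)
    for row, val in enumerate(values):
        if val != 0:
            groups[val].append(row)
    return dict(groups)


def compress(matrix):
    """Compress a dense row-major matrix (list of lists) column by column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    cols = [
        compress_column([matrix[r][c] for r in range(n_rows)])
        for c in range(n_cols)
    ]
    return n_rows, cols


def matvec(compressed, v):
    """Compute X @ v without decompressing X."""
    n_rows, cols = compressed
    out = [0.0] * n_rows
    for j, groups in enumerate(cols):
        for val, rows in groups.items():
            contrib = val * v[j]  # one multiplication per distinct value
            for r in rows:
                out[r] += contrib
    return out


# Hypothetical example data: two columns with few distinct values each.
X = [[7.0, 0.0], [7.0, 3.0], [0.0, 3.0], [7.0, 3.0]]
C = compress(X)
print(matvec(C, [1.0, 2.0]))  # prints [7.0, 13.0, 6.0, 13.0]
```

Columns with few distinct values need only one multiplication per distinct value rather than one per cell, and zeros are never stored or visited, which is the kind of saving that operating on compressed data can offer.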

Bio:

Matthias Boehm is a Research Staff Member at IBM Research – Almaden, where he has worked since 2012 on optimization and runtime techniques for declarative, large-scale machine learning in SystemML. Since Apache SystemML’s open source release in 2015, he has also served as a PMC member. He received his Ph.D. from Technische Universitaet Dresden in 2011 with a dissertation on cost-based optimization of integration flows under the supervision of Prof. Wolfgang Lehner. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award and a 2016 SIGMOD Research Highlight Award.