Dissertation Talk: Usable and Efficient Systems for Machine Learning by Doris Xin; Thursday, April 22nd, 10 AM

April 22, 2021

Title: Usable and Efficient Systems for Machine Learning

Speaker: Doris Xin

Advisor: Aditya Parameswaran

Date: Thursday, April 22nd, 2021

Time: 10:00am – 11:00am PDT

Location: Zoom https://zoom.us/j/97106237545?pwd=NGdtY1d6WGVJYU1PM2R5T3hDZkt6Zz09

Meeting ID: 971 0623 7545

Passcode: 405841

Abstract:

Machine learning has become a key driver of technological advancement over the last decade, thanks to major progress in programming interfaces and scalable systems. Libraries such as Scikit-learn and Keras have made it easier to implement machine learning algorithms and applications, while innovations in distributed systems have enabled model training at an unprecedented scale. However, machine learning tooling is far from perfect today; practitioners still face many challenges in developing applications powered by machine learning.

This dissertation aims to improve the usability and resource efficiency of systems for developing and productionizing machine learning applications by investigating several directions identified through extensive empirical evidence gathering and analysis. First, we study the applied machine learning literature and execution traces of workflows to understand common practices adopted by practitioners and shed light on the highly iterative process of model development; we present two solutions to accelerate the iterative model development process. Next, we analyze the provenance graph of thousands of production pipelines to uncover latent inefficiencies in the system serving these pipelines; we propose a solution to reduce wasted computation in this system significantly. Finally, we synthesize findings from interviews with current users of automated machine learning tools to examine the role of automation in model development as we look ahead to the future of machine learning developer tools.