Dissertation Talk: Usable and Efficient Systems for Machine Learning by Doris Xin; Thursday, April 22nd, 10 AM

April 22, 2021

Title: Usable and Efficient Systems for Machine Learning
Speaker: Doris Xin
Advisor: Aditya Parameswaran
 
Date: Thursday, April 22nd, 2021
Time: 10:00am – 11:00am PDT
Meeting ID: 971 0623 7545
Passcode: 405841
 
Abstract: 
Machine learning became a key driver for technological advancement in the last decade thanks to major progress in programming interfaces and scalable systems. Libraries such as Scikit-learn and Keras have made it easier to implement machine learning algorithms and applications, while innovations in distributed systems have enabled model training at an unprecedented scale. However, machine learning tooling is far from perfect today; practitioners still face many challenges developing applications powered by machine learning.

This dissertation aims to improve the usability and resource efficiency of systems for developing and productionizing machine learning applications by investigating several directions identified through extensive empirical evidence gathering and analysis. First, we study the applied machine learning literature and execution traces of workflows to understand common practices adopted by practitioners and shed light on the highly iterative process of model development; we present two solutions to accelerate the iterative model development process. Next, we analyze the provenance graph of thousands of production pipelines to uncover latent inefficiencies in the system serving these pipelines; we propose a solution to reduce wasted computation in this system significantly. Finally, we synthesize findings from interviews with current users of automated machine learning tools to examine the role of automation in model development as we look ahead to the future of machine learning developer tools.