Context: The Missing Piece in the Machine Learning Lifecycle

Rolando Garcia

Machine learning models have become ubiquitous in modern applications. The ML Lifecycle describes a three-phase process that data scientists and data engineers use to develop, train, and serve models. Unfortunately, the context surrounding the data, code, people, and systems involved in these pipelines is not captured today. In this paper, we first discuss common pitfalls that missing context creates. For example, context is missing when tracking the relationships between code and data and when capturing experimental processes over time. We then discuss techniques to address these challenges and briefly outline future work on designing and implementing systems in this space.

Published On: August 19, 2018

Presented At/In: Workshop on Common Model Infrastructure, KDD 2018

Download Paper: https://rise.cs.berkeley.edu/wp-content/uploads/2019/02/Flor_CMI_18_CameraReady.pdf

Authors: Rolando Garcia, Vikram Sreekanti, Neeraja Yadwadkar, Dan Crankshaw, Joseph Gonzalez, Joe Hellerstein