Data scientists try many ideas quickly, using the development environments and toolchains they already know. This open, free-form creative process can be at odds with the formal and systematic practice of tracking experiments. As a result, it is easy for data scientists to neglect the steps needed to ensure experiment reproducibility, and so to compromise sound scientific method. At the other extreme, an overly bureaucratic and mentally strenuous process can hamper innovation and harm the creation of the ML applications of tomorrow.
Flor is a system with a declarative DSL embedded in Python for managing the workflow development phase of the machine learning lifecycle. Flor enables data scientists to describe ML workflows as directed acyclic graphs (DAGs) of Actions, Artifacts, and Literals, and to experiment with different configurations quickly by running multi-trial experiments. To date, Flor serves two roles: it is a build system for producing a desired artifact, and a versioning system that tracks how artifacts evolve across multiple runs, in support of reproducibility. Flor stores and manages in Ground all the data context generated by the activities that make up pipeline development and training. Flor will also include tools for analyzing past experiments and answering questions about experiment results or behavior.
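To make the DAG and multi-trial ideas concrete, here is a minimal sketch in plain Python. It is not Flor's API: the classes `Literal`, `Action`, and `Artifact` and the helpers `collect_literals`, `build`, and `run_trials` are hypothetical stand-ins written for illustration, and the "training" step is a toy function. The sketch assumes that a Literal carries a list of candidate values and that one trial is run for each combination of those values.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Sequence, Union

# Hypothetical stand-ins for illustration; NOT Flor's actual classes or signatures.

@dataclass
class Literal:
    name: str
    values: Sequence[Any]  # candidate values; each combination yields one trial

@dataclass
class Action:
    func: Callable[..., Any]
    inputs: Sequence[Union["Artifact", Literal]]  # upstream nodes in the DAG

@dataclass
class Artifact:
    name: str
    parent: Action  # the action that produces this artifact

def collect_literals(node, found=None):
    """Walk upstream from `node` and gather every Literal, keyed by name."""
    found = {} if found is None else found
    if isinstance(node, Literal):
        found[node.name] = node
    else:  # an Artifact: recurse into the inputs of its producing action
        for dep in node.parent.inputs:
            collect_literals(dep, found)
    return found

def build(node, binding, cache):
    """Produce the value of `node` for one trial, building dependencies first."""
    if isinstance(node, Literal):
        return binding[node.name]
    if node.name not in cache:  # build each artifact at most once per trial
        args = [build(dep, binding, cache) for dep in node.parent.inputs]
        cache[node.name] = node.parent.func(*args)
    return cache[node.name]

def run_trials(target):
    """Run one trial per combination of literal values upstream of `target`."""
    literals = collect_literals(target)
    names = sorted(literals)
    results = []
    for combo in product(*(literals[n].values for n in names)):
        binding = dict(zip(names, combo))
        results.append((binding, build(target, binding, {})))
    return results

# A two-step workflow: (lr, epochs) -> train -> model -> evaluate -> score
lr = Literal("lr", [0.1, 0.5])
epochs = Literal("epochs", [1, 2])
train = Action(lambda lr, epochs: {"weight": lr * epochs}, [lr, epochs])
model = Artifact("model", train)
evaluate = Action(lambda m: round(m["weight"], 3), [model])
score = Artifact("score", evaluate)

trials = run_trials(score)
for binding, result in trials:
    print(binding, "->", result)
```

Requesting the `score` artifact pulls in only the actions upstream of it, which is the build-system behavior described above, and the two literals with two values each expand into four trials. The sketch leaves out versioning entirely: recording each trial's configuration and outputs across runs would be a separate concern.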