We’re excited to be releasing v0.1 of the Ground project!
Ground is a data context service. It is a central repository for all the information surrounding the use of data in an organization. Ground concerns itself with what data an organization has, where that data is, who (both human beings and software systems) is touching that data, and how that data is being modified and described.
Above all, Ground aims to be an open-source, vendor-neutral system that gives users an unopinionated metamodel and a set of APIs for reasoning about and interacting with the data context generated in their organization.
Ground has many use cases, but we’re focused on two specific ones at present:
- Data Inventory: large organizations typically have a huge diversity of datasets, stored across many repositories and used in numerous ways. We’re working with Capital One, one of the RISELab sponsors, to use Ground to track their global data inventory.
- AI Lifecycle Management: AI-centric applications depend on rich pipelines of ever-changing code, models, training sets, test sets, and user feedback across development and deployment servers. We’re working with the Clipper team on helping to track and manage this lifecycle.
In future blog posts we plan to have more detail about both of these efforts.
Current State
The Ground v0.1 release is a Play application built on top of PostgreSQL. The current system aims to be very simple and narrowly focused; that is to say, “we haven’t built everything we want to yet!”
In previous (unreleased) incarnations, Ground has supported multiple data stores in addition to Postgres, namely Apache Cassandra and Neo4j. We plan to reinstate support for these databases in future releases. We also plan to add support for Elasticsearch as a text index for efficient tag retrieval. Perhaps most importantly, we believe that there’s a large open space for query API design. If you’re interested in Ground and have thoughts about what kind of queries Ground might want to support, we’d love to hear from you.
Research Avenues
There are also a number of exciting research avenues that we’re interested in pursuing. Chief among these is the management of AI application lifecycles discussed above. We are also interested in versioned storage. Versioned storage has had a resurgence of late with industrial products like Pachyderm and Datomic as well as research projects like OrpheusDB. Our graph-based metamodel doesn’t fit neatly into any of these boxes, and we imagine that a system like Ground will have different requirements. We’re excited to further investigate the challenges of versioned graph systems.
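To make "versioned graph" a bit more concrete, here is a minimal Python sketch of the general idea: each node keeps an append-only history of versions, and each version carries its own tags and outgoing edges. This is purely illustrative and is not Ground's metamodel or API; the names `NodeVersion`, `VersionedGraph`, `commit`, `latest`, and `lineage` are hypothetical, and the sketch assumes a linear history per node, whereas a real system would likely need branching and merging.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class NodeVersion:
    """One immutable snapshot of a node (hypothetical type, for illustration)."""
    version_id: int
    parent_id: Optional[int]   # previous version of the same node, if any
    tags: Dict[str, str]       # free-form metadata attached to this version
    edges: frozenset           # names of the nodes this version points to


@dataclass
class VersionedGraph:
    """Toy in-memory store: node name -> append-only list of versions."""
    history: Dict[str, List[NodeVersion]] = field(default_factory=dict)
    next_id: int = 0

    def commit(self, name: str, tags: Dict[str, str],
               edges: Iterable[str] = ()) -> NodeVersion:
        """Record a new version of `name`; older versions are never modified."""
        versions = self.history.setdefault(name, [])
        parent = versions[-1].version_id if versions else None
        version = NodeVersion(self.next_id, parent, dict(tags), frozenset(edges))
        self.next_id += 1
        versions.append(version)
        return version

    def latest(self, name: str) -> NodeVersion:
        """Return the most recent version of `name`."""
        return self.history[name][-1]

    def lineage(self, name: str) -> List[int]:
        """Return the version ids of `name`, oldest first."""
        return [v.version_id for v in self.history[name]]


# Usage: a model is retrained, so it gets a second version; the first is kept.
graph = VersionedGraph()
graph.commit("training_set", {"rows": "1000"})
graph.commit("model", {"algo": "logreg"}, edges=["training_set"])
graph.commit("model", {"algo": "gbt"}, edges=["training_set"])
```

Even in this toy form, a question like "which version of the training set did this model version point to?" requires versioning edges as well as nodes, which is one reason a graph-based metamodel may not map cleanly onto existing versioned stores.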
We’ll continue posting both here and at the Ground website as things progress.