RISE Seminar 3/15/18: Alexandra Meliou (UMass Amherst): Diagnoses and Explanations: Creating a Higher-Quality Data World

March 15, 2018

Title: Diagnoses and Explanations: Creating a Higher-Quality Data World

The correctness and proper function of data-driven systems and applications rely heavily on the correctness of their data.  Low-quality data can be costly and disruptive, leading to revenue loss, incorrect conclusions, and misguided policy decisions.  Improving data quality is far more than purging datasets of errors; it is critical to improve the processes that produce the data, to collect good data sources for generating the data, and to address the root causes of problems.

Our work is grounded in an important insight:  While existing data cleaning techniques can be effective at purging datasets of errors, they disregard the fact that many errors are systemic, inherent to the process that produces the data, and thus will keep occurring unless the problem is corrected at its source.  In contrast to traditional data cleaning, we focus on data diagnosis: explaining where and how the errors happen in a data-generating process.  I will describe our work on Data X-Ray and QFix, two diagnostic frameworks for large-scale extraction systems and relational data systems.  I will also discuss our work on MIDAS, a recommendation system that improves the quality of datasets by identifying and filling information gaps.  Finally, I will discuss a vision for explanation frameworks to assist the exploration of information in a varied, diverse, highly non-integrated data world.

Alexandra Meliou is an Assistant Professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst.  Prior to that, she was a Post-Doctoral Research Associate at the University of Washington, working with Dan Suciu.  Alexandra received her PhD degree from the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley.  She has received recognitions for research and teaching, including a CACM Research Highlight, an ACM SIGMOD Research Highlight Award, an ACM SIGSOFT Distinguished Paper Award, an NSF CAREER Award, a Google Faculty Research Award, and a Lilly Fellowship for Teaching Excellence.  Her research focuses on data provenance, causality, explanations, data quality, and algorithmic fairness.