RISE Seminar: Get Your Data Together! Algorithms for Managing Data Lakes by Erkang Zhu

March 1, 2019

Title: Get Your Data Together! Algorithms for Managing Data Lakes
Speaker: Erkang (Eric) Zhu
Affiliation: University of Toronto
Date and location: Friday, March 1, 12:30 – 1:30 pm, Wozniak Lounge (430 Soda Hall)
Abstract: Data lakes (e.g., enterprise data catalogs and Open Data portals) are data dumps if users cannot find and utilize the data in them. In this talk, I present two problems in massive, dynamic data lakes: 1) searching for joinable tables to discover potential linkages, and 2) joining tables from different sources through automatically generated syntactic transformations of join values. I will also present two algorithmic solutions that scale to data lakes that are large in both the number of tables (millions) and the size of each table. The presented work has been published in SIGMOD and VLDB.
Bio: Erkang (Eric) Zhu is a fifth-year computer science PhD candidate at the University of Toronto, supervised by Prof. Renée J. Miller. His research focuses on data discovery, large-scale similarity search, and randomized algorithms (data sketches).