The increasing interest in the Internet-of-Things (IoT) suggests that a new source of big data is imminent: data produced by machines and sensors in the IoT ecosystem. The fundamental characteristic of the data produced by these sources is that it is inherently geospatial in nature. In addition, it exhibits unprecedented and unpredictable skew. Thus, big data systems designed for IoT applications must be able to efficiently ingest, index, and query spatial data with heavy and unpredictable skew. Spatial indexing is a well-explored area of research in the literature, but little or no attention has been given to the topic of efficient distributed spatial indexing.
In this paper, we propose SIFT, a distributed spatial index, and its implementation. Unlike systems that depend on load-balancing mechanisms that kick in after ingestion, SIFT tries to distribute the incoming data across the distributed structure at indexing time and thus incurs minimal rebalancing overhead. SIFT depends only on an underlying key-value store and is hence implementable in many existing big data stores. We have implemented SIFT in a popular open source data store. Our evaluations show promising results: in a distributed environment, SIFT achieves up to an 8× reduction in indexing overhead while simultaneously reducing query latency and index size by over 2× and 3×, respectively, compared to the state of the art.
Published On: September 25, 2017
Presented At/In: ACM Symposium on Cloud Computing 2017 (SoCC '17)