(This article was authored by Sanjay Krishnan, Zongheng Yang, Joe Hellerstein, and Ion Stoica.)
What is the role of machine learning in the design and implementation of a modern database system? This question has sparked considerable recent introspection in the data management community, and the epicenter of this debate is the core database problem of query optimization, where the database system finds the best physical execution path for an SQL query.
The au courant research direction, inspired by trends in Computer Vision, Natural Language Processing, and Robotics, is to apply deep learning: let the database learn the value of each execution strategy by executing different query plans repeatedly (an homage to Google’s robot “arm farm”) rather than relying on a pre-programmed analytical cost model. This proposal is not as radical as it seems: relational database management systems have always used statistical estimation machinery in query optimization, such as histograms, sampling methods for cardinality estimation, and randomized query planning algorithms. Similarly, learning from prior planning instances is not new either.
These techniques may not “feel” like modern AI, but are, in fact, statistical inference mechanisms that carefully balance generality, ease of update, and separation of modeling concerns. In recognition of this, we argue that a first step towards a learned optimizer is to understand the classical components, such as plan space parametrization, search heuristics, and cost modeling, as statistical learning processes. Query optimization is a problem with a 40-year research history, and to give the problem its well-deserved respect, we attempt to contextualize the techniques that worked in the past in a modern AI light.
In our newly updated paper “Learning to Optimize Join Queries With Deep Reinforcement Learning”, we show that classical Selinger-style join enumeration has profound connections with Markovian sequential decision processes. Join optimization is the problem of optimally selecting a nesting of 2-way join operations to answer a k-way join in a SQL query. Traditionally, the Selinger optimizer constructs a table memoizing the optimal subplans (best 2-way, best 3-way, and so on) and their associated costs. This table grows combinatorially with the number of relations (namely, k), and the costs in the table are sensitive to the particular SQL query (e.g., whether there are any filters on individual attributes). Therefore, it is infeasible to persist all of that information indefinitely for re-use in future plans.
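To make the memoization concrete, here is a minimal Python sketch of Selinger-style bottom-up enumeration. The `join_cost` function and the relation names are placeholders standing in for the system’s actual cost model and catalog, not part of any particular implementation:

```python
from itertools import combinations

def selinger_join_order(relations, join_cost):
    """Bottom-up Selinger-style enumeration: memoize the cheapest plan
    for every subset of relations. `join_cost(left_plan, right_plan)`
    is a stand-in for the optimizer's cost model (hypothetical here).
    Returns (best_plan, best_cost) for the full k-way join."""
    # Base case: single relations have no join cost of their own.
    best = {frozenset([r]): (r, 0.0) for r in relations}

    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            best_plan, best_cost = None, float("inf")
            # Try every way of splitting this subset into two smaller joins.
            for left in _proper_subsets(subset):
                right = subset - left
                left_plan, left_cost = best[left]
                right_plan, right_cost = best[right]
                cost = left_cost + right_cost + join_cost(left_plan, right_plan)
                if cost < best_cost:
                    best_plan, best_cost = (left_plan, right_plan), cost
            best[subset] = (best_plan, best_cost)

    return best[frozenset(relations)]

def _proper_subsets(s):
    # Non-empty proper subsets of s; the memo table keyed on these
    # subsets is exactly what grows combinatorially with k.
    items = list(s)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)
```

The table `best` is the memoization structure described above: one entry per subset of relations, which is why it cannot be cheaply persisted and reused across queries.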
Reinforcement learning (RL) gives us new insight into this conundrum. RL reduces sequential planning to statistical estimation. Rather than an exact memoization table, we can treat the subplans enumerated by past planning instances as training data to build a model. The estimates from this model can focus the enumeration in future planning instances (in fact, reducing the complexity of enumeration to cubic time, at parity with a greedy scheme). This approach is a form of Deep Q-Learning, inspired by algorithms used to play Atari games and train robots.
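The following sketch illustrates the idea under stated assumptions: a trained Q-network scores candidate 2-way joins, and the planner greedily picks the join with the lowest predicted long-term cost. The `q_model.predict` and `featurize` calls are hypothetical stand-ins for the learned model and the (query, join) feature encoding; this is not the paper’s exact interface:

```python
def greedy_join_with_q(relations, q_model, featurize):
    """Plan a k-way join by repeatedly applying a learned Q-function:
    at each step, score every candidate 2-way join of the current
    subplans and take the one with the lowest predicted long-term cost.
    Roughly k steps with O(k^2) candidates per step gives the cubic
    planning time mentioned above."""
    subplans = list(relations)
    while len(subplans) > 1:
        best_pair, best_q = None, float("inf")
        for i in range(len(subplans)):
            for j in range(len(subplans)):
                if i == j:
                    continue
                # Predicted cumulative cost of joining subplan i with subplan j,
                # given the current state of the partial plan.
                q = q_model.predict(featurize(subplans, (subplans[i], subplans[j])))
                if q < best_q:
                    best_pair, best_q = (i, j), q
        i, j = best_pair
        joined = ("join", subplans[i], subplans[j])
        # Replace the two chosen inputs with the new joined subplan.
        subplans = [p for k, p in enumerate(subplans) if k not in (i, j)]
        subplans.append(joined)
    return subplans[0]
```

The training loop is the mirror image of this: costs observed while enumerating (or executing) past plans become the labels for fitting the Q-model.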
Our updated paper shows that we can integrate this approach into full-featured query optimizers, PostgreSQL, Apache Calcite, and Apache Spark, with minimal modification. Using only a moderate amount of training data (fewer than 100 training queries), our deep RL-based optimizer, DQ, achieves plan costs within 2x of the optimal solution on all cost models that we considered, and it improves on the next best heuristic by up to 3x, all at a planning latency that is up to 10x faster than dynamic programming and 10,000x faster than exhaustive enumeration. Compared to similar learning proposals on the same benchmarks, DQ requires at least three orders of magnitude less training data, primarily because it exploits the inherent structure of the planning problem.
DQ addresses the problem of learning a search heuristic from data in a way that is independent of the cost model or plan space, which makes it very extensible. We are currently extending the DQ optimizer to produce plans that persist intermediate results for use in future queries. These materialization operations are simply additional join types that DQ can select. The cost model is augmented to estimate the marginal benefit of storing, using, and maintaining the materialized view that is created. This estimate is itself another online learning process, since the benefit of materializing a view may only be observed well into the future. The general idea draws from prior work in “opportunistic materialization”, but is tightly coupled with the query optimizer: a plan may be instantaneously suboptimal yet create valuable intermediate artifacts for future use. The magic of this abstraction is that DQ itself does not need to know what the cost model represents, or that one of its components accounts for effects that occur after query execution.
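As a purely illustrative sketch of that abstraction boundary (the operator representation, `cost_model`, and `benefit_model` below are hypothetical, not DQ’s actual interfaces), the augmented cost model might look like this, while DQ continues to see only a single scalar cost:

```python
def augmented_cost(op, cost_model, benefit_model):
    """Illustrative only: fold materialization into the cost signal that
    the learned planner consumes. `cost_model` estimates immediate
    execution cost; `benefit_model` is a separately (online) learned
    estimate of future savings from reusing a stored intermediate,
    net of storage and maintenance overhead."""
    base = cost_model(op)
    if op.kind == "materialize":
        # A plan containing this operator may look locally suboptimal,
        # but the subtracted future benefit lets the planner prefer it
        # when the intermediate result is likely to be reused.
        return base - benefit_model(op)
    return base
```

Because the planner only ever sees the combined scalar, nothing in DQ’s search or learning machinery has to change when the cost model starts accounting for post-execution effects.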