Inductive Matrix Completion with Embedding Memory and Bandit Exploration for Safe Hint Recommendation
Abstract
Learned query optimizers often generalize poorly, causing performance regressions on a subset of queries. To address this, this paper introduces DataSwift, a hint-recommendation framework that integrates LLM-derived SQL embeddings, GNN-encoded plan representations, a similarity-threshold memory cache, and Thompson-sampling bandit exploration. Each incoming query is embedded and matched against the cache to recall proven hints, while a low-rank inductive matrix completion model predicts expected latency. Validated hints are added to the cache, and the bandit down-weights any hint that induces a slowdown. On the combined JOB benchmarks, DataSwift incurs a regression rate of only 0.7% with zero catastrophic regressions, and delivers a 1.4x improvement on the slowest 5% of queries. DataSwift thus provides performance gains without sacrificing safety.
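
The interaction between the similarity-threshold memory and the bandit described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from DataSwift itself: the class names `HintMemory` and `ThompsonHintBandit`, the cosine-similarity measure, the 0.9 threshold, the Beta-Bernoulli reward model (a hint either sped the query up or slowed it down), and the hint labels. The LLM embeddings, GNN plan encoder, and inductive matrix completion model are omitted; embeddings are plain lists of floats.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HintMemory:
    """Hypothetical cache mapping query embeddings to validated hints."""

    def __init__(self, threshold=0.9):  # threshold value is an assumption
        self.threshold = threshold
        self.entries = []  # list of (embedding, hint) pairs

    def add(self, embedding, hint):
        self.entries.append((list(embedding), hint))

    def recall(self, embedding):
        """Return the hint of the most similar entry at or above the threshold."""
        best_hint, best_sim = None, self.threshold
        for stored, hint in self.entries:
            sim = cosine(embedding, stored)
            if sim >= best_sim:
                best_hint, best_sim = hint, sim
        return best_hint


class ThompsonHintBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of hints (assumed model)."""

    def __init__(self, hints, seed=None):
        self.rng = random.Random(seed)
        self.alpha = {h: 1.0 for h in hints}  # 1 + observed speedups
        self.beta = {h: 1.0 for h in hints}   # 1 + observed slowdowns

    def choose(self):
        # Sample a plausible success rate per hint and pick the largest.
        samples = {
            h: self.rng.betavariate(self.alpha[h], self.beta[h])
            for h in self.alpha
        }
        return max(samples, key=samples.get)

    def update(self, hint, speedup):
        # A slowdown raises beta, which down-weights the hint in later draws.
        if speedup:
            self.alpha[hint] += 1.0
        else:
            self.beta[hint] += 1.0


def recommend(embedding, memory, bandit):
    """Prefer a recalled hint; otherwise fall back to bandit exploration."""
    hint = memory.recall(embedding)
    return hint if hint is not None else bandit.choose()
```

In this sketch a cache hit bypasses exploration entirely, so a hint that has already been validated on a sufficiently similar query is reused, and the bandit is consulted only for queries with no close match. Whether DataSwift orders these steps the same way is not stated in the abstract.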