Timezone: »

High-Dimensional Sparse Linear Bandits
Botao Hao · Tor Lattimore · Mengdi Wang

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1452

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, such as personalized medicine and online advertising. We derive a novel O(n^{2/3}) dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime where the horizon is larger than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution. This is complemented by a nearly matching upper bound for an explore-then-commit algorithm showing that that O(n^{2/3}) is the optimal rate in the data-poor regime. The results complement existing bounds for the data-rich regime and also provide another example where carefully balancing the trade-off between information and regret is necessary. Finally, we prove a dimension-free O(\sqrt{n}) regret upper bound under an additional assumption on the magnitude of the signal for relevant features.

Author Information

Botao Hao (Deepmind)
Tor Lattimore (DeepMind)
Mengdi Wang (Princeton University)

Mengdi Wang is interested in data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years).

More from the Same Authors