Poster
Sequential Experimental Design for Transductive Linear Bandits
Lalit Jain · Kevin Jamieson · Tanner Fiez · Lillian Ratliff
Wed Dec 11th 05:00  07:00 PM @ East Exhibition Hall B + C #10
In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. The transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to known bestsellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instancedependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we present the first nonasymptotic algorithm for linear bandits that nearly achieves the informationtheoretic lower bound.
Author Information
Lalit Jain (University of Washington)
Kevin Jamieson (U Washington)
Tanner Fiez (University of Washington)
Lillian Ratliff (University of Washington)
More from the Same Authors

2019 Poster: A New Perspective on PoolBased Active Classification and FalseDiscovery Control »
Lalit Jain · Kevin Jamieson 
2019 Poster: NonAsymptotic GapDependent Regret Bounds for Tabular MDPs »
Max Simchowitz · Kevin Jamieson 
2018 Poster: A Bandit Approach to Sequential Experimental Design with False Discovery Control »
Kevin Jamieson · Lalit Jain