Timezone: »

Beyond the Best: Distribution Functional Estimation in Infinite-Armed Bandits
Yifei Wang · Tavor Baharav · Yanjun Han · Jiantao Jiao · David Tse

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #724

In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution, and each arm can be sampled further to obtain noisy estimates of the average reward of that arm. Prior work focuses on the best arm, i.e. estimating the maximum of the average reward distribution. We consider a general class of distribution functionals beyond the maximum and obtain optimal sample complexities in both offline and online settings. We show that online estimation, where the learner can sequentially choose whether to sample a new or existing arm, offers no advantage over the offline setting for estimating the mean functional, but significantly reduces the sample complexity for other functionals such as the median, maximum, and trimmed mean. We propose unified meta algorithms for the online and offline settings and derive matching lower bounds using different Wasserstein distances. For the special case of median estimation, we identify a curious thresholding phenomenon on the indistinguishability between Gaussian convolutions with respect to the noise level, which may be of independent interest.

Author Information

Yifei Wang (Stanford University)
Tavor Baharav (Stanford University)

I am a second year PhD student in Electrical Engineering at Stanford University working with Professor David Tse, recently on developing fast (near linear time) randomized algorithms using techniques from multi-armed bandits. I am grateful to be supported by the NSF Graduate Research Fellowship and the Stanford Graduate Fellowship (SGF). I graduated from UC Berkeley in May 2018 where I studied Electrical Engineering and Computer Science. In my time there, I was fortunate to get the chance to work with Professor Kannan Ramchandran on coding theory and its applications to distributed computing. My current research focus is on constructing algorithms that adapt to problem instance difficulty, and more broadly in randomized algorithms, machine learning, multi-armed bandits, and their applications in engineering and computational biology problems.

Yanjun Han (Massachusetts Institute of Technology)
Jiantao Jiao (University of California, Berkeley)
David Tse (Stanford University)

More from the Same Authors