Timezone: »
Poster
Combinatorial Multi-Armed Bandit with General Reward Functions
Wei Chen · Wei Hu · Fu Li · Jian Li · Yu Liu · Pinyan Lu
In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions such as the $\max()$ function and nonlinear utility functions. Existing techniques relying on accurate estimations of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve $O(\log T)$ distribution-dependent regret and $\tilde{O}(\sqrt{T})$ distribution-independent regret, where $T$ is the time horizon. We apply our results to the $K$-MAX problem and expected utility maximization problems. In particular, for $K$-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first $\tilde{O}(\sqrt T)$ bound on the $(1-\epsilon)$-approximation regret of its online problem, for any $\epsilon>0$.
Author Information
Wei Chen (Microsoft Research)
Wei Hu (Princeton University)
Fu Li (The University of Texas at Austin)
Jian Li (Tsinghua University)
Yu Liu (Tsinghua University)
Pinyan Lu (Shanghai University of Finance and Economics)
More from the Same Authors
-
2022 Poster: Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability »
Zixuan Wang · Zhouzi Li · Jian Li -
2022 Poster: Generalization Bounds for Gradient Methods via Discrete and Continuous Prior »
Xuanyuan Luo · Bei Luo · Jian Li -
2022 Poster: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2022 : Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations »
Yongyi Yang · Jacob Steinhardt · Wei Hu -
2022 Spotlight: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2022 Spotlight: Lightning Talks 4A-1 »
Jiawei Huang · Su Jia · Abdurakhmon Sadiev · Ruomin Huang · Yuanyu Wan · Denizalp Goktas · Jiechao Guan · Andrew Li · Wei-Wei Tu · Li Zhao · Amy Greenwald · Jiawei Huang · Dmitry Kovalev · Yong Liu · Wenjie Liu · Peter Richtarik · Lijun Zhang · Zhiwu Lu · R Ravi · Tao Qin · Wei Chen · Hu Ding · Nan Jiang · Tie-Yan Liu -
2022 Poster: Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms »
Xutong Liu · Jinhang Zuo · Siwei Wang · Carlee Joe-Wong · John C.S. Lui · Wei Chen -
2022 Poster: Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation »
Jin Xu · Xiaojiang Liu · Jianhao Yan · Deng Cai · Huayang Li · Jian Li -
2021 Poster: Combinatorial Pure Exploration with Bottleneck Reward Function »
Yihan Du · Yuko Kuroki · Wei Chen -
2021 Poster: The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle »
Fang Kong · Yueran Yang · Wei Chen · Shuai Li -
2020 Poster: Online Influence Maximization under Linear Threshold Model »
Shuai Li · Fang Kong · Kejie Tang · Qizhi Li · Wei Chen -
2019 Poster: Adaptive Influence Maximization with Myopic Feedback »
Binghui Peng · Wei Chen -
2018 Poster: Community Exploration: From Offline Optimization to Online Learning »
Xiaowei Chen · Weiran Huang · Wei Chen · John C. S. Lui -
2018 Poster: BRITS: Bidirectional Recurrent Imputation for Time Series »
Wei Cao · Dong Wang · Jian Li · Hao Zhou · Lei Li · Yitan Li -
2017 Poster: Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications »
Qinshi Wang · Wei Chen -
2017 Poster: Influence Maximization with $\varepsilon$-Almost Submodular Threshold Functions »
Qiang Li · Wei Chen · Institute of Computing Xiaoming Sun · Institute of Computing Jialin Zhang -
2015 Poster: Stochastic Online Greedy Learning with Semi-bandit Feedbacks »
Tian Lin · Jian Li · Wei Chen -
2014 Poster: Combinatorial Pure Exploration of Multi-Armed Bandits »
Shouyuan Chen · Tian Lin · Irwin King · Michael R Lyu · Wei Chen -
2014 Oral: Combinatorial Pure Exploration of Multi-Armed Bandits »
Shouyuan Chen · Tian Lin · Irwin King · Michael R Lyu · Wei Chen