Timezone: »
A latent bandit is a bandit problem where the learning agent knows reward distributions of arms conditioned on an unknown discrete latent state. The goal of the agent is to identify the latent state, after which it can act optimally. This setting is a natural midpoint between online and offline learning, where complex models can be learned offline and the agent identifies the latent state online. This is of high practical relevance, for instance in recommender systems. In this work, we propose general algorithms for latent bandits, based on both upper confidence bounds and Thompson sampling. The algorithms are contextual, and aware of model uncertainty and misspecification. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions. A comprehensive empirical study showcases the advantages of our approach.
Author Information
Joey Hong (Google Research)
Branislav Kveton (Google Research)
Manzil Zaheer (Google)
Yinlam Chow (Google Research)
Amr Ahmed (Google Research)
Amr Ahmed is a Senior Staff Research Scientist at Google. He received his M.Sc and PhD degrees from the School of Computer Science, Carnegie Mellon University in 2009 and 2011, respectively. He received the best paper award at KDD 2014 , the best Paper Award at WSDM 2014, the 2012 ACM SIGKDD Doctoral Dissertation Award, and a best paper award (runner-up) at WSDM 2012. He co-chaired the WWW'18 track on Web Content Analysis and served as an Area Chair for IJCAI 2019, SIGIR 2019, SIGIR 2018, ICML 2018, ICML 2017, KDD 2016, WSDM 2015, ICML 2014, and ICDM 2014. His research interests include large-scale machine learning, data/web mining, user modeling, personalization, social networks and content analysis.
Craig Boutilier (Google)
More from the Same Authors
-
2021 Spotlight: Safe Reinforcement Learning with Natural Language Constraints »
Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan -
2021 : Should I Run Offline Reinforcement Learning or Behavioral Cloning? »
Aviral Kumar · Joey Hong · Anikait Singh · Sergey Levine -
2022 : Differentially Private Adaptive Optimization with Delayed Preconditioners »
Tian Li · Manzil Zaheer · Ken Liu · Sashank Reddi · H. Brendan McMahan · Virginia Smith -
2022 : Differentially Private Adaptive Optimization with Delayed Preconditioners »
Tian Li · Manzil Zaheer · Ken Liu · Sashank Reddi · H. Brendan McMahan · Virginia Smith -
2022 : A Mixture-of-Expert Approach to RL-based Dialogue Management »
Yinlam Chow · Azamat Tulepbergenov · Ofir Nachum · Dhawal Gupta · Moonkyung Ryu · Mohammad Ghavamzadeh · Craig Boutilier -
2022 Poster: A Fourier Approach to Mixture Learning »
Mingda Qiao · Guru Guruganesh · Ankit Rawat · Kumar Avinava Dubey · Manzil Zaheer -
2022 Poster: Learning to Navigate Wikipedia by Taking Random Walks »
Manzil Zaheer · Kenneth Marino · Will Grathwohl · John Schultz · Wendy Shang · Sheila Babayan · Arun Ahuja · Ishita Dasgupta · Christine Kaeser-Chen · Rob Fergus -
2022 Poster: Efficient Risk-Averse Reinforcement Learning »
Ido Greenberg · Yinlam Chow · Mohammad Ghavamzadeh · Shie Mannor -
2021 Poster: No Regrets for Learning the Prior in Bandits »
Soumya Basu · Branislav Kveton · Manzil Zaheer · Csaba Szepesvari -
2021 Poster: Safe Reinforcement Learning with Natural Language Constraints »
Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan -
2020 Poster: PLLay: Efficient Topological Layer based on Persistent Landscapes »
Kwangho Kim · Jisu Kim · Manzil Zaheer · Joon Kim · Frederic Chazal · Larry Wasserman -
2020 Poster: Differentiable Meta-Learning of Bandit Policies »
Craig Boutilier · Chih-wei Hsu · Branislav Kveton · Martin Mladenov · Csaba Szepesvari · Manzil Zaheer -
2020 Poster: Robust large-margin learning in hyperbolic space »
Melanie Weber · Manzil Zaheer · Ankit Singh Rawat · Aditya Menon · Sanjiv Kumar -
2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Big Bird: Transformers for Longer Sequences »
Manzil Zaheer · Guru Guruganesh · Kumar Avinava Dubey · Joshua Ainslie · Chris Alberti · Santiago Ontanon · Philip Pham · Anirudh Ravula · Qifan Wang · Li Yang · Amr Ahmed -
2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Coffee Break & Poster Session 1 »
Yan Zhang · Jonathon Hare · Adam Prugel-Bennett · Po Leung · Patrick Flaherty · Pitchaya Wiratchotisatian · Alessandro Epasto · Silvio Lattanzi · Sergei Vassilvitskii · Morteza Zadimoghaddam · Theja Tulabandhula · Fabian Fuchs · Adam Kosiorek · Ingmar Posner · William Hang · Anna Goldie · Sujith Ravi · Azalia Mirhoseini · Yuwen Xiong · Mengye Ren · Renjie Liao · Raquel Urtasun · Haici Zhang · Michele Borassi · Shengda Luo · Andrew Trapp · Geoffroy Dubourg-Felonneau · Yasmeen Kussad · Christopher Bender · Manzil Zaheer · Junier Oliva · Michał Stypułkowski · Maciej Zieba · Austin Dill · Chun-Liang Li · Songwei Ge · Eunsu Kang · Oiwi Parker Jones · Kelvin Ka Wing Wong · Joshua Payne · Yang Li · Azade Nazi · Erkut Erdem · Aykut Erdem · Kevin O'Connor · Juan J Garcia · Maciej Zamorski · Jan Chorowski · Deeksha Sinha · Harry Clifford · John W Cassidy -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 : Opening Remarks »
Manzil Zaheer · Nicholas Monath · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov · Andrew McCallum -
2019 Workshop: Sets and Partitions »
Nicholas Monath · Manzil Zaheer · Andrew McCallum · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov -
2019 Workshop: Safety and Robustness in Decision-making »
Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow -
2019 Poster: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Spotlight: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2018 Poster: TopRank: A practical algorithm for online stochastic ranking »
Tor Lattimore · Branislav Kveton · Shuai Li · Csaba Szepesvari -
2018 Poster: Non-delusional Q-learning and value-iteration »
Tyler Lu · Dale Schuurmans · Craig Boutilier -
2018 Poster: Nonparametric Density Estimation under Adversarial Losses »
Shashank Singh · Ananya Uppal · Boyue Li · Chun-Liang Li · Manzil Zaheer · Barnabas Poczos -
2018 Oral: Non-delusional Q-learning and value-iteration »
Tyler Lu · Dale Schuurmans · Craig Boutilier -
2018 Poster: A Lyapunov-based Approach to Safe Reinforcement Learning »
Yinlam Chow · Ofir Nachum · Edgar Duenez-Guzman · Mohammad Ghavamzadeh -
2018 Poster: Adaptive Methods for Nonconvex Optimization »
Manzil Zaheer · Sashank Reddi · Devendra S Sachan · Satyen Kale · Sanjiv Kumar -
2018 Poster: A Block Coordinate Ascent Algorithm for Mean-Variance Optimization »
Tengyang Xie · Bo Liu · Yangyang Xu · Mohammad Ghavamzadeh · Yinlam Chow · Daoming Lyu · Daesub Yoon -
2018 Poster: Data center cooling using model-predictive control »
Nevena Lazic · Craig Boutilier · Tyler Lu · Eehern Wong · Binz Roy · Moonkyung Ryu · Greg Imwalle -
2017 Oral: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback »
Zheng Wen · Branislav Kveton · Michal Valko · Sharan Vaswani -
2017 Poster: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2015 Poster: Efficient Thompson Sampling for Online Matrix-Factorization Recommendation »
Jaya Kawale · Hung H Bui · Branislav Kveton · Long Tran-Thanh · Sanjay Chawla -
2015 Poster: Combinatorial Cascading Bandits »
Branislav Kveton · Zheng Wen · Azin Ashkan · Csaba Szepesvari