This paper identifies a severe problem with the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback as in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms. It requires a counterfactual risk estimator, and virtually all existing work on BLBF has focused on a particular unbiased estimator. We show that this conventional estimator suffers from a propensity overfitting problem when used for learning over complex hypothesis spaces. We propose to replace the risk estimator with a self-normalized estimator, showing that it neatly avoids this problem. This naturally gives rise to a new learning algorithm, Normalized Policy Optimizer for Exponential Models (Norm-POEM), for structured output prediction using linear rules. We evaluate the empirical effectiveness of Norm-POEM on several multi-label classification problems, finding that it consistently outperforms the conventional estimator.
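To make the contrast concrete, here is a minimal sketch (in Python with NumPy; not the authors' released code, and all variable names are illustrative) of the conventional inverse-propensity-scoring (IPS) estimate next to the self-normalized estimate on logged bandit data:

```python
import numpy as np

def ips_estimate(losses, logging_probs, target_probs):
    """Conventional IPS estimate of the risk: mean of delta_i * w_i,
    where w_i = h(y_i | x_i) / pi_0(y_i | x_i). Unbiased given correct
    propensities with full support."""
    w = target_probs / logging_probs  # importance weights
    return np.mean(losses * w)

def self_normalized_estimate(losses, logging_probs, target_probs):
    """Self-normalized estimate: sum(delta_i * w_i) / sum(w_i).
    Dividing by the realized weight mass removes the incentive to
    drive the estimate down by piling probability mass onto logged
    actions that happen to have small propensities."""
    w = target_probs / logging_probs
    return np.sum(losses * w) / np.sum(w)

# Toy logged data: losses delta_i in [0, 1], propensities pi_0 from the
# logs, and the probabilities the candidate policy h assigns to the
# logged actions.
rng = np.random.default_rng(0)
losses = rng.uniform(size=1000)
logging_probs = rng.uniform(0.1, 1.0, size=1000)
target_probs = rng.uniform(0.1, 1.0, size=1000)
print(ips_estimate(losses, logging_probs, target_probs))
print(self_normalized_estimate(losses, logging_probs, target_probs))
```

Because the denominator grows with the importance weights, a hypothesis cannot lower the self-normalized estimate merely by overweighting rarely-logged actions, which is exactly the propensity overfitting behavior the conventional estimate rewards.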
Author Information
Adith Swaminathan (Cornell University)
Thorsten Joachims (Cornell University)
More from the Same Authors
- 2021 Poster: Heuristic-Guided Reinforcement Learning
  Ching-An Cheng · Andrey Kolobov · Adith Swaminathan
- 2021 Poster: Fairness in Ranking under Uncertainty
  Ashudeep Singh · David Kempe · Thorsten Joachims
- 2020 Poster: MOReL: Model-Based Offline Reinforcement Learning
  Rahul Kidambi · Aravind Rajeswaran · Praneeth Netrapalli · Thorsten Joachims
- 2019: Opening Remarks
  Thorsten Joachims · Nathan Kallus · Michele Santacatterina · Adith Swaminathan · David Sontag · Angela Zhou
- 2019 Workshop: Machine Learning with Guarantees
  Ben London · Gintare Karolina Dziugaite · Daniel Roy · Thorsten Joachims · Aleksander Madry · John Shawe-Taylor
- 2019 Workshop: “Do the right thing”: machine learning and causal inference for improved decision making
  Michele Santacatterina · Thorsten Joachims · Nathan Kallus · Adith Swaminathan · David Sontag · Angela Zhou
- 2019: Thorsten Joachims: Fair Ranking with Biased Data
  Thorsten Joachims
- 2019 Poster: Policy Learning for Fairness in Ranking
  Ashudeep Singh · Thorsten Joachims
- 2017: Equality of Opportunity in Rankings
  Thorsten Joachims · Ashudeep Singh
- 2017 Workshop: From 'What If?' To 'What Next?': Causal Inference and Machine Learning for Intelligent Decision Making
  Ricardo Silva · Panagiotis Toulis · John Shawe-Taylor · Alexander Volfovsky · Thorsten Joachims · Lihong Li · Nathan Kallus · Adith Swaminathan
- 2016: Panel Discussion
  Gisbert Schneider · Ross E Goodwin · Simon Colton · Russ Salakhutdinov · Thorsten Joachims · Florian Pinel
- 2016: Structured Prediction with Logged Bandit Feedback
  Thorsten Joachims
- 2016 Workshop: "What If?" Inference and Learning of Hypothetical and Counterfactual Interventions in Complex Systems
  Ricardo Silva · John Shawe-Taylor · Adith Swaminathan · Thorsten Joachims
- 2015 Spotlight: The Self-Normalized Estimator for Counterfactual Learning
  Adith Swaminathan · Thorsten Joachims
- 2013 Poster: Learning Trajectory Preferences for Manipulators via Iterative Improvement
  Ashesh Jain · Brian Wojcik · Thorsten Joachims · Ashutosh Saxena
- 2011 Poster: Semantic Labeling of 3D Point Clouds for Indoor Scenes
  Hema Koppula · Abhishek Anand · Thorsten Joachims · Ashutosh Saxena
- 2007 Workshop: Machine Learning for Web Search
  Denny Zhou · Olivier Chapelle · Thorsten Joachims · Thomas Hofmann