Timezone: »

 
Poster
Online Algorithm for Unsupervised Sequential Selection with Contextual Information
Arun Verma · Manjesh Kumar Hanawal · Csaba Szepesvari · Venkatesh Saligrama

Tue Dec 08 09:00 AM -- 11:00 AM (PST) @ Poster Session 1 #243

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, a context is presented, and the learner selects the arms sequentially till some depth. The total cost incurred by stopping at an arm is the sum of fixed costs of arms selected and the stochastic loss associated with the arm. The learner's goal is to learn a decision rule that maps contexts to arms with the goal of minimizing the total expected loss. The problem is challenging as we are faced with an unsupervised setting as the total loss cannot be estimated. Clearly, learning is feasible only if the optimal arm can be inferred (explicitly or implicitly) from the problem structure. We observe that learning is still possible when the problem instance satisfies the so-called 'Contextual Weak Dominance' (CWD) property. Under CWD, we propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret. Experiments on synthetic and real datasets validate our algorithm.

Author Information

Arun Verma (Indian Institute of Technology Bombay)

PhD Student at IIT Bombay

Manjesh Kumar Hanawal (IIT Bombay)

Manjesh K. Hanawal received the M.S. degree in ECE from the Indian Institute of Science, Bengaluru, India, in 2009, and the Ph.D. degree from INRIA, Sophia Antipolis, France, and the University of Avignon, Avignon, France, in 2013. He was a Scientist-B with the Center for Artificial Intelligence and Robotics, DRDO, Bengaluru, India. He was a Post-Doctoral Fellow with Boston University for two years. He is currently an Assistant Professor in industrial engineering and operations research with the Indian Institute of Technology Bombay, Mumbai, India. His research interests include performance evaluation, machine learning, and network economics. He is a recipient of the Inspire Faculty Award from DST and the Early Career Research Award from SERB, Govt. of India.

Csaba Szepesvari (DeepMind / University of Alberta)
Venkatesh Saligrama (Boston University)

More from the Same Authors