
Learning from Logged Implicit Exploration Data
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade

Wed Dec 08 12:00 AM -- 12:00 AM (PST)

We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings, where only the value of the chosen action is learned.

The primary challenge in a variety of settings is that the exploration policy under which "offline" data is logged is not explicitly known. Prior solutions require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing a policy that chooses actions given features to be learned from historical data in which no randomization occurred or was logged.
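
As an illustration only, the following Python sketch shows one common way to evaluate a fixed policy from logs whose logging policy is unknown. It first estimates the logging probabilities pi_hat(a | x) from the logged data itself, then reweights each observed reward by the inverse of that estimate, clipped from below by a threshold tau. The count-based estimator, the function names `estimate_propensities` and `ips_value`, the discrete contexts, the value tau = 0.05, and the toy log are hypothetical assumptions, not details given on this page.

```python
from collections import Counter


def estimate_propensities(logs):
    """Hypothetical count-based estimate of pi_hat(a | x) for discrete contexts.

    `logs` is a list of (context, action, reward) triples; only the reward of
    the logged action is observed (the partially labeled setting).
    """
    ctx_counts = Counter(x for x, _, _ in logs)
    pair_counts = Counter((x, a) for x, a, _ in logs)

    def pi_hat(x, a):
        # Fraction of times action `a` was logged in context `x`.
        if ctx_counts[x] == 0:
            return 0.0
        return pair_counts[(x, a)] / ctx_counts[x]

    return pi_hat


def ips_value(logs, policy, pi_hat, tau=0.05):
    """Inverse-propensity estimate of the average reward of `policy`.

    Each logged reward counts only when `policy` picks the logged action, and
    is divided by max(pi_hat, tau) so rarely logged actions cannot blow up
    the estimate (tau is an assumed, illustrative threshold).
    """
    total = 0.0
    for x, a, r in logs:
        if policy(x) == a:
            total += r / max(pi_hat(x, a), tau)
    return total / len(logs)


# Toy log (made up for illustration): (context, logged action, observed reward).
logs = [
    ("sports", "ad1", 1.0),
    ("sports", "ad2", 0.0),
    ("news", "ad1", 0.0),
    ("news", "ad1", 1.0),
]
pi_hat = estimate_propensities(logs)
print(ips_value(logs, lambda x: "ad1", pi_hat))  # 0.75
```

Clipping with max(pi_hat, tau) trades a small bias for bounded variance: an action the logger rarely took in a given context contributes at most r / tau to the sum.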

We empirically verify our solution on two reasonably sized sets of real-world data obtained from an Internet advertising company.

Author Information

Alexander L Strehl (Facebook)
John Langford
Lihong Li (Amazon)
Sham M Kakade (Harvard University & Amazon)
