Oral Poster

Enhancing Preference-based Linear Bandits via Human Response Time

Shen Li · Yuyang Zhang · Claire Liang · Zhaolin Ren · Na Li · Julie A Shah

East Exhibit Hall A-C #4901
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST
 
Oral presentation: Oral Session 2A: Agents
Wed 11 Dec 3:30 p.m. PST — 4:30 p.m. PST

Abstract:

Binary human choice feedback is widely used in interactive preference learning for its simplicity but offers limited insights into preference strength. To address this, we leverage human response time, which inversely correlates with preference strength, as complementary information. Our work integrates the Drift-Diffusion model, which jointly models human choices and response times, into preference-based linear bandits. We introduce a computationally efficient utility estimator that transforms the utility estimation problem using choices and response times into a linear regression problem. Theoretical and empirical comparisons with traditional choice-only estimators reveal that for queries with strong preferences ("easy" queries), choices alone provide limited information, while response times offer valuable additional insights about preference strength. Thus, incorporating response times makes easy queries more useful. We demonstrate this benefit in the fixed-budget best-arm identification problem. Simulations based on datasets of human choices and response times for snacks, fashion item clicks, and movie ratings consistently show accelerated learning when response times are incorporated.
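The transformation described above can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not the paper's exact estimator: it simulates a symmetric drift-diffusion model (DDM) where a query's drift equals the utility difference `x^T theta`, and exploits the standard DDM moment identity `E[choice] / E[response time] = drift / barrier` to turn choice and response-time data into an ordinary-least-squares problem. The names `simulate_ddm`, `estimate_utility`, the query matrix `X`, and the weights `theta_true` are all hypothetical.

```python
import numpy as np

def simulate_ddm(drift, barrier=1.0, dt=0.01, rng=None):
    """One DDM trial with symmetric barriers at +/-barrier (unit diffusion).

    Euler-Maruyama simulation; returns (choice in {-1, +1}, response time).
    """
    rng = np.random.default_rng() if rng is None else rng
    evidence, t = 0.0, 0.0
    while abs(evidence) < barrier:
        evidence += drift * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1.0 if evidence > 0 else -1.0), t

def estimate_utility(X, choices, times, barrier=1.0):
    """Moment-based linear-regression estimator (illustrative).

    For the symmetric DDM, E[choice] / E[time] = drift / barrier, so each
    query's ratio of sample means estimates x^T theta / barrier; ordinary
    least squares then recovers theta from all queries jointly.
    """
    y = barrier * np.array([c.mean() / t.mean() for c, t in zip(choices, times)])
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta_hat

rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.5])  # hypothetical utility weights
# Five repeated queries, each a feature-difference vector between two arms.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]])

choices, times = [], []
for x in X:
    trials = [simulate_ddm(float(x @ theta_true), rng=rng) for _ in range(200)]
    c, t = map(np.array, zip(*trials))
    choices.append(c)
    times.append(t)

theta_hat = estimate_utility(X, choices, times)
```

Note how response times enter the target `y` directly: an easy query (large `|x^T theta|`) yields near-deterministic choices, which alone pin down only the sign of the drift, but its short response times still carry magnitude information through the ratio.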
