

Bandit Social Learning under Myopic Behavior

Kiarash Banihashem · MohammadTaghi Hajiaghayi · Suho Shin · Aleksandrs Slivkins

Great Hall & Hall B1+B2 (level 1) #1827
Paper · Poster · OpenReview
Wed 13 Dec 3 p.m. PST — 5 p.m. PST


We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms' expected rewards. We derive stark exploration failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on the failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.
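The exploration failure described above can be illustrated with a small simulation. The sketch below is not from the paper: the arm means, agent count, and epsilon-greedy baseline are illustrative assumptions. Each arriving agent acts myopically, pulling the arm with the higher empirical mean; greedy agents (epsilon = 0) can herd on the worse arm after unlucky initial draws, while even a little exploration lets later agents recover.

```python
import random

def simulate(explore_eps, n_agents=2000, seed=0):
    """Fraction of agents who pull the better arm in a 2-armed
    Bernoulli bandit where each agent acts myopically.
    Arm means and parameters are illustrative, not from the paper."""
    rng = random.Random(seed)
    means = [0.7, 0.5]            # arm 0 is the better arm
    counts = [0, 0]
    sums = [0.0, 0.0]
    # Seed with one pull per arm so empirical means are defined.
    for a in (0, 1):
        counts[a] = 1
        sums[a] = 1.0 if rng.random() < means[a] else 0.0
    pulls_best = 0
    for _ in range(n_agents):
        if rng.random() < explore_eps:
            a = rng.randrange(2)  # occasional forced exploration
        else:
            # Myopic choice: arm with the higher empirical mean.
            a = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        reward = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
        pulls_best += (a == 0)
    return pulls_best / n_agents
```

With `explore_eps = 0`, a run where arm 0's first draw is 0 and arm 1's is 1 locks all subsequent agents onto the worse arm forever, since arm 0's empirical mean never gets a chance to recover. Averaged over seeds, a small positive epsilon yields a reliably high best-arm share.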
