
Unlocking Slot Attention by Changing Optimal Transport Costs
Yan Zhang · David Zhang · Simon Lacoste-Julien · Gertjan Burghouts · Cees Snoek

Slot attention is a successful method for object-centric modeling of images and videos in tasks such as unsupervised object discovery. However, its set-equivariance limits its ability to break ties, which makes it difficult to distinguish similar structures, a capability crucial for vision problems. To fix this, we cast the cross-attention in slot attention as an optimal transport (OT) problem whose solutions have the desired tiebreaking properties. We then propose an entropy minimization module that combines the tiebreaking properties of unregularized OT with the speed of regularized OT. We evaluate our method on CLEVR object detection and observe a significant improvement, from 53% to 91%, on a strict average precision metric.
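
To make the tiebreaking issue concrete, the following is a minimal sketch of entropy-regularized OT between slots and inputs, solved with standard Sinkhorn iterations. It is not the authors' implementation: the negative dot-product cost, the uniform marginals, the value of `eps`, and all names (`sinkhorn`, `slots`, `inputs`, `plan`) are assumptions made for illustration. Two of the slots are deliberately identical, and in this sketch they receive identical attention rows, which is the kind of tie that regularized transport alone does not break.

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def sinkhorn(cost, a, b, eps=1.0, n_iters=200):
    # Entropy-regularized OT in the log domain.
    # cost: (n_slots, n_inputs); a, b: marginals that each sum to 1.
    log_k = -cost / eps
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    for _ in range(n_iters):
        f = np.log(a) - logsumexp(log_k + g[None, :], axis=1)
        g = np.log(b) - logsumexp(log_k + f[:, None], axis=0)
    # Transport plan; rows sum to a, columns sum to b (approximately).
    return np.exp(f[:, None] + log_k + g[None, :])

rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 4))   # hypothetical input features
slots = rng.normal(size=(3, 4))    # hypothetical slot vectors
slots[1] = slots[0]                # create a tie: two identical slots

cost = -slots @ inputs.T           # assumed cost: negative dot product
a = np.full(3, 1.0 / 3)            # assumed uniform mass per slot
b = np.full(5, 1.0 / 5)            # assumed uniform mass per input

plan = sinkhorn(cost, a, b)
tied = np.allclose(plan[0], plan[1])  # identical slots, identical rows
```

Because the update for each slot depends only on that slot's cost row and marginal, identical slots stay identical after every iteration; an unregularized OT solution is not bound in this way, which is the property the abstract refers to.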

Author Information

Yan Zhang (Samsung - SAIT AI Lab Montreal)
David Zhang (University of Amsterdam)
Simon Lacoste-Julien (Mila, Université de Montréal & SAIL Montreal)

Simon Lacoste-Julien is an associate professor at Mila and DIRO at Université de Montréal, and a Canada CIFAR AI Chair holder. He also heads, part time, Samsung's SAIT AI Lab Montreal. His research interests are machine learning and applied math, with applications in related fields like computer vision and natural language processing. He obtained a B.Sc. in mathematics, physics and computer science from McGill and a PhD in computer science from UC Berkeley, and completed a postdoc at the University of Cambridge. He spent a few years as research faculty at INRIA and École normale supérieure in Paris before coming back to his roots in Montreal in 2016 to answer Yoshua Bengio's call to help grow the Montreal AI ecosystem.

Gertjan Burghouts (TNO - Intelligent Imaging)
Cees Snoek (University of Amsterdam)

More from the Same Authors