Contextual Visual Feature Learning for Zero-Shot Recognition of Human-Object Interactions
Tsung-Wei Ke · Dong-Jin Kim · Stella Yu · Liang Gou · Liu Ren

Real-world visual recognition of an object involves not only its own semantics but also those surrounding it. Supervised learning of contextual relationships is restrictive and impractical with the combinatorial explosion of possible relationships among a group of objects. Our key insight is to formulate visual context not as a relationship classification problem, but as a representation learning problem, where objects located close in the feature space have similar visual contexts. Such a model is infinitely scalable with respect to the number of objects or their relationships. We develop a contextual visual feature learning model without any supervision on relationships. We characterize visual context in terms of spatial configuration of semantics between objects and their surrounds, and derive pixel-to-segment learning losses that capture visual similarity, semantic co-occurrences, and structural correlation. Visual context emerges in a completely data-driven fashion, with objects in similar contexts mapped to close points in the feature space. Most strikingly, when benchmarked on HICO for recognizing human-object interactions, our unsupervised model trained only on MSCOCO significantly outperforms the supervised baseline and approaches the supervised state-of-the-art, both trained specifically on HICO with annotated relationships!
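
To make the idea of a pixel-to-segment loss more concrete, the following is a minimal sketch of a generic contrastive formulation, not the authors' actual losses. It assumes each pixel has an embedding and a segment label, pulls every pixel toward the prototype of its own segment, and pushes it away from the other prototypes. The function name `pixel_to_segment_loss`, the `temperature` value, and the use of mean-direction prototypes are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np


def pixel_to_segment_loss(pixel_feats, segment_ids, temperature=0.1):
    """Generic pixel-to-segment contrastive loss (illustrative assumption,
    not the loss defined in the paper).

    pixel_feats: (N, D) array of pixel embeddings.
    segment_ids: (N,) array of integer segment labels, one per pixel.
    """
    pixel_feats = np.asarray(pixel_feats, dtype=float)
    segment_ids = np.asarray(segment_ids).ravel()

    # Project pixel embeddings onto the unit sphere.
    feats = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)

    # Map each pixel to the index of its segment among the unique segments.
    segments, inverse = np.unique(segment_ids, return_inverse=True)

    # Segment prototypes: sum the normalized features per segment, then
    # renormalize (this gives the same direction as the per-segment mean).
    protos = np.zeros((len(segments), feats.shape[1]))
    np.add.at(protos, inverse, feats)
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)

    # Cosine similarity of every pixel to every prototype, scaled by temperature.
    logits = feats @ protos.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability

    # Softmax cross-entropy with the pixel's own segment as the target.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(feats)), inverse].mean())
```

In this sketch the loss is small when pixels of the same segment cluster together in the feature space and grows when segment labels are unrelated to the embeddings. The semantic co-occurrence and structural correlation terms mentioned in the abstract would need different definitions of which segments count as positives, which this sketch does not attempt.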

Author Information

Tsung-Wei Ke (UC Berkeley)
Dong-Jin Kim (Hanyang University)
Stella Yu (UC Berkeley / ICSI)
Liang Gou (Bosch Research)
Liu Ren (Bosch Research North America)

More from the Same Authors

  • 2022 : Modeling Semantic Correlation and Hierarchy for Real-world Wildlife Recognition
    Dong-Jin Kim · Zhongqi Miao · Yunhui Guo · Stella Yu · Kyle Landolt · Mark Koneff · Travis Harrison
  • 2022 : Multi-band Image Classification with Ultra-Lean Complex-Valued Models
    Utkarsh Singhal · Stella Yu · Zackery Steck · Scott Kangas
  • 2021 Poster: The Emergence of Objectness: Learning Zero-shot Segmentation from Videos
    Runtao Liu · Zhirong Wu · Stella Yu · Stephen Lin
  • 2019 : Poster Session
    Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie