We propose a new weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of this approach, we develop a generalization of the Max-Path search algorithm. The generalization lets us efficiently search over a structured space of multiple spatio-temporal paths, and it also lets us incorporate context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we use human gaze data as a weak supervisory signal. We do this by incorporating gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate in classification and achieves state-of-the-art results in localization. In addition, we show how our model can produce top-down saliency maps conditioned on the classification label and the localized latent paths.
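The search step is easier to follow in code, so here is a minimal sketch of the basic *single-path* Max-Path dynamic program. It is not the multi-path, context-aware generalization described above. The sketch rests on several assumptions. A per-frame score grid `scores[t][y][x]` is taken as given, for example classifier responses for candidate windows. Between consecutive frames, a path may stay in place or move to any of the 8 neighboring cells. A path may start and end at any frame. The function name `max_path` and these neighborhood choices are illustrative and do not come from the paper.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # scores[y][x] for one frame (assumed precomputed)
Cell = Tuple[int, int]     # (y, x) position within a frame


def max_path(scores: List[Frame]) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Illustrative single-path Max-Path search over a T x H x W score volume.

    Returns the best total score and the path as (t, y, x) triples.
    """
    T, H, W = len(scores), len(scores[0]), len(scores[0][0])
    prev: Optional[Frame] = None                  # accumulated scores at frame t-1
    back: List[List[List[Optional[Cell]]]] = []   # predecessor of each cell, per frame
    best_val, best_end = float("-inf"), (0, 0, 0)

    for t in range(T):
        acc = [[0.0] * W for _ in range(H)]
        ptr: List[List[Optional[Cell]]] = [[None] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                # Extend a path only if its accumulated score is positive;
                # otherwise a new path starts at this cell (carry = 0).
                carry, arg = 0.0, None
                if prev is not None:
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            py, px = y + dy, x + dx
                            if 0 <= py < H and 0 <= px < W and prev[py][px] > carry:
                                carry, arg = prev[py][px], (py, px)
                acc[y][x] = scores[t][y][x] + carry
                ptr[y][x] = arg
                if acc[y][x] > best_val:
                    best_val, best_end = acc[y][x], (t, y, x)
        back.append(ptr)
        prev = acc

    # Backtrack from the best end cell until reaching a path start (no predecessor).
    t, y, x = best_end
    path = [(t, y, x)]
    while back[t][y][x] is not None:
        y, x = back[t][y][x]
        t -= 1
        path.append((t, y, x))
    path.reverse()
    return best_val, path
```

For example, `max_path([[[1, -5, 0]], [[-1, 2, -3]], [[0, 0, 4]]])` returns a total of 7 along the path `[(0, 0, 0), (1, 0, 1), (2, 0, 2)]`. Each cell inspects at most 9 predecessors, so the sketch runs in O(9·T·H·W) time. Dropping non-positive prefixes, as in Kadane's maximum-subarray algorithm, is what allows the optimal path to begin and end at arbitrary frames.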
Author Information
Nataliya Shapovalova (Simon Fraser University)
Michalis Raptis (Comcast Labs)
Leonid Sigal (University of British Columbia)
Greg Mori (Borealis AI)
More from the Same Authors
2021: [O2] Not too close and not too far: enforcing monotonicity requires penalizing the right points
Joao Monteiro · · Hossein Hajimirsadeghi · Greg Mori
2021 Poster: Continuous Latent Process Flows
Ruizhi Deng · Marcus Brubaker · Greg Mori · Andreas Lehrmann
2020 Poster: Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
Ruizhi Deng · Bo Chang · Marcus Brubaker · Greg Mori · Andreas Lehrmann
2018 Poster: Probabilistic Neural Programmed Networks for Scene Generation
Zhiwei Deng · Jiacheng Chen · YIFANG FU · Greg Mori
2018 Spotlight: Probabilistic Neural Programmed Networks for Scene Generation
Zhiwei Deng · Jiacheng Chen · YIFANG FU · Greg Mori
2017 Poster: Non-parametric Structured Output Networks
Andreas Lehrmann · Leonid Sigal
2017 Poster: Visual Reference Resolution using Attention Memory for Visual Dialog
Paul Hongsuck Seo · Andreas Lehrmann · Bohyung Han · Leonid Sigal
2014 Poster: A Unified Semantic Embedding: Relating Taxonomies and Attributes
Sung Ju Hwang · Leonid Sigal
2013 Poster: Latent Maximum Margin Clustering
Guang-Tong Zhou · Tian Lan · Arash Vahdat · Greg Mori
2012 Poster: Kernel Latent SVM for Visual Recognition
Weilong Yang · Yang Wang · Arash Vahdat · Greg Mori
2011 Poster: Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
Matthew D Zeiler · Graham Taylor · Leonid Sigal · Iain Matthews · Rob Fergus
2010 Poster: A Discriminative Latent Model of Image Region and Object Tag Correspondence
Yang Wang · Greg Mori
2010 Poster: Beyond Actions: Discriminative Models for Contextual Group Activities
Tian Lan · Yang Wang · Weilong Yang · Greg Mori
2009 Poster: A Rate Distortion Approach for Semi-Supervised Conditional Random Fields
Yang Wang · Gholamreza Haffari · Shaojun Wang · Greg Mori
2008 Poster: Learning a discriminative hidden part model for human action recognition
Yang Wang · Greg Mori
2008 Spotlight: Learning a discriminative hidden part model for human action recognition
Yang Wang · Greg Mori