The goal of zero-shot action recognition (ZSAR) is to classify action classes that were not seen during training. Traditionally, this is achieved by training a network to map, or regress, visual inputs to a semantic space, where a nearest neighbor classifier selects the closest target class. We argue that this approach is sub-optimal because it applies nearest neighbor search to a static semantic space, and that it is ineffective for multi-label videos, where two semantically distinct co-occurring action categories cannot both be predicted with high confidence. To overcome these limitations, we propose a ZSAR framework that does not rely on nearest neighbor classification but instead uses a pairwise scoring function. Given a video and a set of action classes, our method predicts a confidence score for each class independently. This allows several semantically distinct classes to be predicted for a single video input. Our evaluations show that our method not only achieves strong performance on three single-label action classification datasets (UCF-101, HMDB, and RareAct), but also outperforms previous ZSAR approaches on a challenging multi-label dataset (AVA) and a real-world surprise activity detection dataset (MEVA).
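The contrast between nearest neighbor classification and independent pairwise scoring can be illustrated with a small toy sketch. This is not the authors' implementation: the bilinear scoring form, the names `pairwise_scores`, `W`, `b`, and `threshold`, and the two-dimensional embeddings are all assumptions chosen only to show why per-class scores can return several labels while nearest neighbor returns one.

```python
import numpy as np

def nearest_neighbor_predict(video_emb, class_embs):
    """Baseline: return the index of the single class closest in cosine similarity."""
    v = video_emb / np.linalg.norm(video_emb)
    c = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    return int(np.argmax(c @ v))

def pairwise_scores(video_emb, class_embs, W, b):
    """Hypothetical pairwise scorer: one independent sigmoid score per (video, class) pair.

    The bilinear form class^T W video + b is an assumption for illustration;
    the abstract does not specify the scoring function's architecture.
    """
    logits = class_embs @ W @ video_emb + b
    return 1.0 / (1.0 + np.exp(-logits))

def predict_multilabel(scores, threshold=0.5):
    """Keep every class whose independent confidence reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

# Toy data: the video embedding lies between two semantically distinct classes.
video_emb = np.array([1.0, 1.0])
class_embs = np.array([[1.0, 0.0],    # class 0
                       [0.0, 1.0],    # class 1
                       [-1.0, -1.0]]) # class 2 (unrelated)
W = 4.0 * np.eye(2)  # assumed (untrained) scoring weights
b = -2.0             # assumed bias

single_label = nearest_neighbor_predict(video_emb, class_embs)
scores = pairwise_scores(video_emb, class_embs, W, b)
multi_label = predict_multilabel(scores)
```

In this toy setup the nearest neighbor baseline must commit to one of the two equally close classes, whereas the independent scores let both co-occurring classes pass the threshold.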
Author Information
Alec Kerrigan (University of Central Florida)
Kevin Duarte (University of Central Florida)
Yogesh Rawat (University of Central Florida)
Mubarak Shah (University of Central Florida)
More from the Same Authors
- 2022: Contrastive Learning on Synthetic Videos for GAN Latent Disentangling
  Kevin Duarte · Wei-An Lin · Ratheesh Kalarot · Jingwan (Cynthia) Lu · Eli Shechtman · Shabnam Ghadar · Mubarak Shah
- 2022 Poster: Are all Frames Equal? Active Sparse Labeling for Video Action Detection
  Aayush Rana · Yogesh Rawat
- 2022 Poster: Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
  Madeline Chantry · Shruti Vyas · Hamid Palangi · Yogesh Rawat · Vibhav Vineet
- 2022 Poster: Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation
  Ziwei Xu · Yogesh Rawat · Yongkang Wong · Mohan Kankanhalli · Mubarak Shah
- 2019 Poster: Unsupervised Meta-Learning for Few-Shot Image Classification
  Siavash Khodadadeh · Ladislau Boloni · Mubarak Shah
- 2018 Poster: VideoCapsuleNet: A Simplified Network for Action Detection
  Kevin Duarte · Yogesh Rawat · Mubarak Shah