Reformulating Zero-shot Action Recognition for Multi-label Actions
Alec Kerrigan · Kevin Duarte · Yogesh Rawat · Mubarak Shah

Tue Dec 07 08:30 AM -- 10:00 AM (PST)

The goal of zero-shot action recognition (ZSAR) is to classify action classes that were not seen during training. Traditionally, this is achieved by training a network to map, or regress, visual inputs to a semantic space, where a nearest neighbor classifier selects the closest target class. We argue that this approach is sub-optimal because it applies nearest neighbor search to a static semantic space, and that it is ineffective on multi-label videos, where two semantically distinct co-occurring action categories cannot both be predicted with high confidence. To overcome these limitations, we propose a ZSAR framework that does not rely on nearest neighbor classification but instead consists of a pairwise scoring function. Given a video and a set of action classes, our method predicts a confidence score for each class independently. This allows several semantically distinct classes to be predicted for a single video input. Our evaluations show that our method not only achieves strong performance on three single-label action classification datasets (UCF-101, HMDB, and RareAct), but also outperforms previous ZSAR approaches on a challenging multi-label dataset (AVA) and a real-world surprise activity detection dataset (MEVA).
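
To make the contrast between nearest neighbor classification and independent pairwise scoring concrete, here is a minimal NumPy sketch. It is an illustration only, not the authors' model: the abstract does not specify the form of the scoring function, so the bilinear score with a `weight` matrix, the sigmoid, the 0.5 threshold, the function names, and the toy vectors are all assumptions made for this example.

```python
import numpy as np


def nearest_neighbor_predict(video_embedding, class_embeddings):
    """Baseline: return the index of the single closest class (cosine similarity)."""
    v = video_embedding / np.linalg.norm(video_embedding)
    c = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    return int(np.argmax(c @ v))


def pairwise_scores(video_feature, class_embeddings, weight):
    """Hypothetical pairwise scorer: one confidence per (video, class) pair.

    Each class is scored independently with a bilinear form followed by a
    sigmoid, so the scores do not compete with each other.
    """
    logits = class_embeddings @ (weight @ video_feature)
    return 1.0 / (1.0 + np.exp(-logits))


def predict_multilabel(scores, threshold=0.5):
    """Return every class index whose confidence reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data (assumed): three semantically distinct classes as orthogonal vectors.
class_embeddings = np.eye(3)
# A video that contains classes 0 and 1 but not class 2.
video_feature = np.array([4.0, 3.0, -4.0])
weight = np.eye(3)  # identity stands in for a learned matrix

nn_choice = nearest_neighbor_predict(video_feature, class_embeddings)
scores = pairwise_scores(video_feature, class_embeddings, weight)
multi_choice = predict_multilabel(scores)

print("nearest neighbor:", nn_choice)
print("pairwise scores:", np.round(scores, 3))
print("multi-label prediction:", multi_choice)
```

In this toy setup the nearest neighbor baseline can return only one class, while the independent scores allow both co-occurring classes to pass the threshold at once.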

Author Information

Alec Kerrigan (University of Central Florida)
Kevin Duarte (University of Central Florida)
Yogesh Rawat (University of Central Florida)
Mubarak Shah (University of Central Florida)
