Poster
MOMA: Multi-Object Multi-Actor Activity Parsing
Zelun Luo · Wanze Xie · Siddharth Kapoor · Yiyun Liang · Michael Cooper · Juan Carlos Niebles · Ehsan Adeli · Fei-Fei Li

Tue Dec 07 04:30 PM -- 06:00 PM (PST)

Complex activities often involve multiple humans using different objects to complete actions (e.g., in healthcare settings, physicians, nurses, and patients interact with each other and with various medical devices). Recognizing such activities requires a detailed understanding of actors' roles, objects' affordances, and the relationships among them. Furthermore, these purposeful activities are composed of multiple achievable steps, including sub-activities and atomic actions, which jointly define a hierarchy of action parts. This paper introduces Activity Parsing as the overarching task of temporally segmenting and classifying activities, sub-activities, and atomic actions, along with an instance-level understanding of actors, objects, and their relationships in videos. Because these activities involve multiple entities (actors and objects), we argue that the traditional pair-wise relationships often used in scene or action graphs do not appropriately represent the dynamics between them. Hence, we introduce the Action Hypergraph, a spatial-temporal graph containing hyperedges (i.e., edges with higher-order relationships), as a new representation. In addition, we introduce Multi-Object Multi-Actor (MOMA), the first benchmark and dataset dedicated to activity parsing. Lastly, to parse a video, we propose the HyperGraph Activity Parsing (HGAP) network, which outperforms several baselines, including those based on regular graphs and raw video data.
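
To make the contrast between hyperedges and pair-wise edges concrete, here is a minimal Python sketch of a per-frame hypergraph. It is an illustration only, not the paper's Action Hypergraph or the MOMA annotation format: the class names (`Entity`, `Hyperedge`, `FrameHypergraph`), the helper `build_example_frame`, the relation name `examining`, and the entity labels are all assumptions invented for this example.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    """A hypothetical actor or object instance in one video frame."""
    id: str
    kind: str   # "actor" or "object"
    label: str  # illustrative label, e.g. "physician"


@dataclass(frozen=True)
class Hyperedge:
    """A single relationship that joins any number of entities."""
    relation: str
    members: frozenset  # ids of all participating entities


@dataclass
class FrameHypergraph:
    """Entities and hyperedges observed at one timestamp."""
    timestamp: float
    entities: dict = field(default_factory=dict)
    hyperedges: list = field(default_factory=list)

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def add_hyperedge(self, relation, member_ids):
        # Reject edges that mention entities not present in this frame.
        missing = [m for m in member_ids if m not in self.entities]
        if missing:
            raise KeyError(f"unknown entity ids: {missing}")
        edge = Hyperedge(relation, frozenset(member_ids))
        self.hyperedges.append(edge)
        return edge

    def pairwise_expansion(self):
        """Flatten each hyperedge into ordinary two-node edges.

        The result no longer records that the pairs came from one
        joint interaction, which is the information a hyperedge keeps.
        """
        pairs = set()
        for edge in self.hyperedges:
            for a, b in combinations(sorted(edge.members), 2):
                pairs.add((a, b, edge.relation))
        return pairs


def build_example_frame():
    """Build one illustrative frame with a three-way interaction."""
    frame = FrameHypergraph(timestamp=0.0)
    frame.add_entity(Entity("a1", "actor", "physician"))
    frame.add_entity(Entity("a2", "actor", "patient"))
    frame.add_entity(Entity("o1", "object", "medical device"))
    frame.add_hyperedge("examining", ["a1", "a2", "o1"])
    return frame


# A spatial-temporal structure can then be a time-ordered list of frames.
video = [build_example_frame()]
```

In this sketch, the single `examining` hyperedge ties the physician, the patient, and the device together as one interaction, while `pairwise_expansion()` splits it into three separate two-node edges. That loss of grouping is one way to read the abstract's argument that pair-wise relationships under-represent multi-entity dynamics.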

Author Information

Zelun Luo (Stanford University)
Wanze Xie (Stanford University)
Siddharth Kapoor (Stanford University)
Yiyun Liang
Michael Cooper (University of Toronto)
Juan Carlos Niebles (Stanford University)
Ehsan Adeli (Stanford University)
Fei-Fei Li (Princeton University)
