Time-Evolving Conditional Character-centric Graphs for Movie Understanding
Long Dang · Thao Le · Vuong Le · Tu Minh Phuong · Truyen Tran
Event URL: https://openreview.net/forum?id=NXnSr_uXgh

Temporal graph structure learning for long-term human-centric video understanding is promising but remains challenging due to the scarcity of dense graph annotations for long videos. It is therefore desirable to learn the dynamic spatio-temporal interactions of human actors and other objects implicitly from the visual information itself. Toward this goal, we present a novel Time-Evolving Conditional cHaracter-centric graph (TECH) for long-term human-centric video understanding, with application to Movie QA. TECH is inherently a recurrent system built on a query-conditioned dynamic graph that evolves over time, following the story throughout the course of a movie clip. Because it targets human-centric video understanding, TECH uses a two-stage feature refinement process to draw attention to human characters and their interactions while treating interactions with non-human objects as contextual information. Tested on the large-scale TVQA dataset, TECH shows clear advantages over recent state-of-the-art models.

Author Information

Long Dang (Deakin University)
Thao Le (Deakin University)
Vuong Le (Deakin University)
Tu Minh Phuong (Posts and Telecommunications Institute of Technology, Ha Noi)

Tu Minh Phuong is a Professor of Computer Science at Posts and Telecommunications Institute of Technology, Ha Noi, Vietnam. His current research interests are in machine learning, especially deep learning, with applications in recommender systems, NLP, and computer vision.

Truyen Tran (Deakin University)
