Poster in Workshop: Gaze Meets ML

Temporal Understanding of Gaze Communication with GazeTransformer

Ryan Anthony de Belen · Gelareh Mohammadi · Arcot Sowmya

Keywords: [ Gaze estimation and prediction ] [ gaze communication behaviour prediction ]

Sat 16 Dec 9:45 a.m. PST — 11:30 a.m. PST

Abstract:

Gaze plays a crucial role in daily social interactions as it allows humans to communicate intentions effectively. We address the problem of temporal understanding of gaze communication in social videos in two stages. First, we develop GazeTransformer, an end-to-end module that infers atomic-level behaviours in a given frame. Second, we develop a temporal module that predicts event-level behaviours in a video using the inferred atomic-level behaviours. Compared to existing methods, GazeTransformer does not require human head and object locations as input. Instead, it identifies these locations in a parallel and end-to-end manner. In addition, it can predict the attended targets of all predicted humans and infer more atomic-level behaviours that cannot be handled simultaneously by previous approaches. We achieve promising performance on both atomic- and event-level prediction on the (M)VACATION dataset. Code will be available at https://github.com/gazetransformer/gazetransformer.
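
The two-stage data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label strings, the toy per-frame model, and the label mapping are all hypothetical, and a simple majority vote stands in for the learned temporal module. The sketch only shows the shape of the pipeline, in which per-frame atomic-level predictions are produced first and then aggregated over time into one event-level prediction.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def infer_atomic_labels(frames: Sequence[object],
                        frame_model: Callable[[object], str]) -> List[str]:
    """Stage 1 (sketch): run a per-frame model on every frame independently."""
    return [frame_model(frame) for frame in frames]


def predict_event_label(atomic_labels: Sequence[str],
                        atomic_to_event: Dict[str, str]) -> str:
    """Stage 2 (stand-in): majority vote over the atomic labels of a clip.

    A learned temporal module would replace this vote; it is used here only
    to show that stage 2 consumes the output of stage 1.
    """
    if not atomic_labels:
        raise ValueError("need at least one atomic label")
    most_common_atomic, _ = Counter(atomic_labels).most_common(1)[0]
    return atomic_to_event[most_common_atomic]


# Hypothetical placeholder labels, chosen for illustration only.
TOY_ATOMIC_TO_EVENT = {"single": "non_communicative", "mutual": "mutual_gaze"}


def toy_frame_model(frame: int) -> str:
    """Hypothetical stub: pretends frames with index >= 2 show mutual gaze."""
    return "mutual" if frame >= 2 else "single"


if __name__ == "__main__":
    frames = [0, 1, 2, 3, 4]  # stand-ins for video frames
    atomic = infer_atomic_labels(frames, toy_frame_model)
    event = predict_event_label(atomic, TOY_ATOMIC_TO_EVENT)
    print(atomic, event)
```

In this sketch the temporal stage sees only the atomic-level labels, mirroring the abstract's statement that event-level behaviours are predicted from the inferred atomic-level behaviours.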
