We propose a learning-based, end-to-end motion-capture model for monocular videos in the wild. Current state-of-the-art solutions for motion capture from a single camera are optimization-driven: they optimize the parameters of a 3D human model so that its re-projection matches measurements in the video (e.g., person segmentation, optical flow, keypoint detections). Such optimization is susceptible to local minima; this bottleneck has forced previous work to rely on clean, green-screen-like backgrounds at capture time, manual initialization, or multiple cameras as input. Instead of optimizing mesh and skeleton parameters directly, our model optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video. The model is trained end-to-end with a combination of strong supervision from synthetic data and self-supervision from differentiable rendering of (a) skeletal keypoints, (b) dense 3D mesh motion, and (c) human-background segmentation. Empirically, we show that our model combines the best of both supervised learning and test-time optimization: supervised pretraining places the model parameters in the right regime, ensuring good pose and surface initialization at test time without manual effort, while self-supervision by back-propagating through differentiable rendering allows (unsupervised) adaptation of the model to the test data and yields a much tighter fit than a fixed pretrained model. We show that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.
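To make the test-time adaptation step concrete, the sketch below shows how such a self-supervised loop might look in PyTorch. This is a minimal illustration under assumed interfaces: `model`, the three differentiable renderers, and the detected-evidence tensors are hypothetical placeholders for the paper's components, not the authors' actual API.

```python
# Minimal sketch of self-supervised test-time adaptation (assumed PyTorch setup).
# The caller supplies `model` (pretrained on synthetic data), three differentiable
# renderers, and per-frame evidence detected in the video -- all hypothetical here.
import torch

def test_time_adapt(model, video, evidence, renderers,
                    steps=100, lr=1e-4, weights=(1.0, 1.0, 1.0)):
    """Fine-tune pretrained network weights on a single test video by
    back-propagating re-projection error through differentiable rendering.

    evidence:  (keypoints_2d, optical_flow, person_mask) measured in the video
    renderers: (render_kp, render_flow, render_mask), each differentiable,
               mapping predicted 3D shape/skeleton to image-space measurements
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        pred = model(video)  # predicted 3D shape + skeleton configuration
        # Weighted sum of re-projection losses over the three cues:
        loss = sum(w * (render(pred) - target).pow(2).mean()
                   for w, render, target in zip(weights, renderers, evidence))
        optimizer.zero_grad()
        loss.backward()      # gradients flow through the renderers into the weights
        optimizer.step()
    return model(video)      # adapted prediction for this video
```

Note the design choice this illustrates: the optimization variables are the network weights, not the per-frame mesh and skeleton parameters, so supervised pretraining provides the initialization that classic per-video optimization lacks.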
Author Information
Hsiao-Yu Tung (Carnegie Mellon University)
Hsiao-Wei Tung (University of Pittsburgh)
Ersin Yumer (Uber ATG)
Katerina Fragkiadaki (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Poster: Self-supervised Learning of Motion Capture
  Thu. Dec 7th, 02:30 -- 06:30 AM, Pacific Ballroom #40
More from the Same Authors
- 2022 : Test-time adaptation with slot-centric models
  Mihir Prabhudesai · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Anirudh Goyal · Deepak Pathak · Katerina Fragkiadaki · Gaurav Aggarwal · Thomas Kipf
- 2023 Poster: Brain Dissection: fMRI-trained Networks Reveal Spatial Selectivity in the Processing of Natural Images
  Gabriel Sarch · Michael Tarr · Leila Wehbe · Katerina Fragkiadaki
- 2023 Poster: Test-time Adaptation with Diffusion Models
  Mihir Prabhudesai · Tsung-Wei Ke · Alex Li · Deepak Pathak · Katerina Fragkiadaki
- 2021 : 3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators
  Hsiao-Yu Tung · Zhou Xian · Mihir Prabhudesai · Katerina Fragkiadaki
- 2021 Workshop: Physical Reasoning and Inductive Biases for the Real World
  Krishna Murthy Jatavallabhula · Rika Antonova · Kevin Smith · Hsiao-Yu Tung · Florian Shkurti · Jeannette Bohg · Josh Tenenbaum
- 2020 : QA: Katerina Fragkiadaki
  Katerina Fragkiadaki
- 2020 : Invited Talk: Katerina Fragkiadaki
  Katerina Fragkiadaki
- 2020 Session: Orals & Spotlights Track 05: Clustering/Ranking
  Silvio Lattanzi · Katerina Fragkiadaki
- 2018 Poster: Geometry-Aware Recurrent Neural Networks for Active Visual Recognition
  Ricson Cheng · Ziyan Wang · Katerina Fragkiadaki
- 2014 Poster: Spectral Methods for Indian Buffet Process Inference
  Hsiao-Yu Tung · Alexander Smola