Convolutional Neural Networks (CNNs) have been the dominant model for video action recognition. Due to their huge memory and compute demands, popular action recognition networks need to be trained with small batch sizes, which makes learning discriminative spatial-temporal representations for videos challenging. In this paper, we present Dynamic Normalization and Relay (DNR), an improved normalization design, to augment the spatial-temporal representation learning of any deep action recognition model under small-batch training settings. We observe that state-of-the-art action recognition networks usually apply the same normalization parameters to all video data, ignoring the dependencies of the estimated normalization parameters between neighboring frames (at the same layer) and between neighboring layers (across all frames of a video clip). Motivated by this observation, DNR introduces two dynamic normalization relay modules that exploit cross-temporal and cross-layer feature distribution dependencies to estimate accurate layer-wise normalization parameters. Both modules are instantiated as a lightweight recurrent structure conditioned on the current input features together with the normalization parameters estimated either from neighboring-frame features at the same layer or from whole-clip features at the preceding layers. We first plug DNR into prevailing 2D CNN backbones and test its performance on public action recognition datasets including Kinetics and Something-Something. Experimental results show that DNR brings large performance improvements to the baselines, achieving absolute top-1 accuracy margins of over 4.4% without training bells and whistles. Further experiments on 3D backbones and several recent 2D spatial-temporal networks validate its effectiveness. Code will be available at https://github.com/caidonkey/dnr.
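To make the cross-temporal relay idea concrete, below is a minimal PyTorch sketch of a normalization module whose affine parameters are estimated by a lightweight recurrent unit carried across frames, so statistics from neighboring frames influence the current frame. The class name TemporalNormRelay, the GRU-based estimator, and the hidden size are illustrative assumptions based on the abstract's description, not the authors' released implementation (see the repository linked above for the official code).

```python
# A hypothetical sketch of a cross-temporal normalization relay.
# Assumption: per-frame instance statistics are standardized first, then a
# GRU cell relayed over time predicts the dynamic affine parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalNormRelay(nn.Module):
    """Estimates per-frame affine normalization parameters with a GRU cell,
    relaying state across neighboring frames at the same layer."""
    def __init__(self, channels, hidden_dim=64):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)       # per-frame channel descriptor
        self.rnn = nn.GRUCell(channels, hidden_dim)  # relays state across frames
        self.to_gamma = nn.Linear(hidden_dim, channels)
        self.to_beta = nn.Linear(hidden_dim, channels)

    def forward(self, x):
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        state = x.new_zeros(b, self.rnn.hidden_size)
        outs = []
        for i in range(t):
            frame = F.instance_norm(x[:, i])                  # standardize frame
            descriptor = self.squeeze(frame).flatten(1)       # (b, c)
            state = self.rnn(descriptor, state)               # temporal relay
            gamma = self.to_gamma(state).view(b, c, 1, 1)
            beta = self.to_beta(state).view(b, c, 1, 1)
            outs.append((1 + gamma) * frame + beta)           # dynamic affine
        return torch.stack(outs, dim=1)

# usage: normalize a 2-clip batch of 8 frames with 64 channels
clip = torch.randn(2, 8, 64, 56, 56)
out = TemporalNormRelay(64)(clip)
print(out.shape)  # torch.Size([2, 8, 64, 56, 56])
```

A cross-layer relay could follow the same pattern, with the recurrent state passed between successive layers over whole-clip features instead of between frames.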
Author Information
Dongqi Cai (Intel Labs China)
Anbang Yao (Intel Labs China)
Yurong Chen (Intel Labs China)
Dr. Yurong Chen is a Principal Research Scientist and Senior Research Director at Intel Corporation, and Director of the Cognitive Computing Lab at Intel Labs China. He is currently responsible for leading cutting-edge visual cognition and machine learning research for Intel smart computing, and for driving research innovation in smart visual data processing technologies on Intel platforms across Intel Labs. He drove the research and development of Deep Learning (DL) based Visual Understanding (VU) and leading face analysis technologies that impacted Intel architectures and platforms, delivering core technologies that help differentiate Intel products including the Intel Movidius VPU, RealSense SDK, CV SDK, OpenVINO, Unite, IoT video E2E analytics solutions, and client apps. His team also delivered core AI technologies such as 3D face technology and tiger Re-ID for "Chris Lee World's First AI Music Video", "The Great Wall Restoration" and "Saving Amur Tigers" to promote Intel's AI leadership. Meanwhile, his team won or achieved top rankings in many international visual challenge tasks, including image matching and multi-view reconstruction (CVPR 2019), adversarial vision (NeurIPS 2018/17), multimodal emotion recognition (ACM ICMI EmotiW 2017/16/15), object detection (MS COCO 2017), visual question answering (VQA 2017), and video description (MSR-VTT 2016). He led the team to win the Intel China Award (top team award of Intel China) in 2016 and the Intel Labs Academic Award (top award of Intel Labs), the Gordy Award, in 2016, 2015 and 2014 for outstanding research achievements on DL-based VU, multimodal emotion recognition, and advanced visual analytics. He has published over 60 technical papers (in CVPR, ICCV, ECCV, TPAMI, IJCV, NeurIPS, ICLR, IJCAI, IEEE Micro, etc.) and holds 50+ issued/pending US/PCT patents. Dr. Chen joined Intel in 2004 after finishing his postdoctoral research at the Institute of Software, Chinese Academy of Sciences. He received his Ph.D. degree from Tsinghua University in 2002.
More from the Same Authors
- 2018: Deep neural network compression and acceleration »
  Anbang Yao
- 2018 Workshop: NIPS 2018 workshop on Compact Deep Neural Networks with industrial applications »
  Lixin Fan · Zhouchen Lin · Max Welling · Yurong Chen · Werner Bailer
- 2018 Poster: Sparse DNNs with Improved Adversarial Robustness »
  Yiwen Guo · Chao Zhang · Changshui Zhang · Yurong Chen
- 2016 Poster: Dynamic Network Surgery for Efficient DNNs »
  Yiwen Guo · Anbang Yao · Yurong Chen