Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers, originally introduced in natural language processing, have been increasingly adopted in computer vision. While early adopters continued to employ CNN backbones, the latest networks are end-to-end CNN-free Transformer solutions. A recent, surprising finding shows that a simple MLP-based solution without any traditional convolutional or Transformer components can produce effective visual representations. While CNNs, Transformers and MLP-Mixers may be considered completely disparate architectures, we provide a unified view showing that they are in fact special cases of a more general method to aggregate spatial context in a neural network stack. We present CONTAINER (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation that can exploit long-range interactions à la Transformers while retaining the inductive bias of the local convolution operation, which leads to the faster convergence often seen in CNNs. Our CONTAINER architecture achieves 82.7% Top-1 accuracy on ImageNet using 22M parameters, a +2.8 point improvement over DeiT-Small, and can converge to 79.9% Top-1 accuracy in just 200 epochs. In contrast to Transformer-based methods, which do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named CONTAINER-LIGHT, can be employed in object detection and instance segmentation networks such as DETR, RetinaNet and Mask R-CNN, obtaining detection mAPs of 38.9, 43.8 and 45.1 and a mask mAP of 41.3. These are improvements of 6.6, 7.3, 6.9 and 6.6 points respectively over a ResNet-50 backbone with comparable compute and parameter size. Our method also achieves promising results on self-supervised learning compared to DeiT within the DINO framework.
Code is released at https://github.com/allenai/container.
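The unified view described in the abstract can be illustrated with a minimal sketch. It assumes the formulation in which a layer computes Y = A V, where A is an N×N affinity matrix over N tokens and V holds the token values: a Transformer uses an input-dependent A (softmax attention), a depthwise convolution uses a fixed, banded A, and an MLP-Mixer token-mixing layer uses a fixed, dense A. The mixed affinity alpha·A(X) + beta·A_static below reflects our reading of how CONTAINER combines the dynamic and static cases. Everything here is a deliberate simplification and an assumption: a single head, Q = K = V = X (no learned projections), a uniform averaging window standing in for a learned convolution kernel, and fixed scalars `alpha` and `beta` instead of learned ones. All function names are hypothetical and are not taken from the released code.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: entry (i, j) is row i of a dotted with column j of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(scores: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def dynamic_affinity(x: Matrix) -> Matrix:
    """Transformer-style, input-dependent affinity softmax(X X^T / sqrt(C)).

    Simplification (assumption): queries and keys are X itself, with no projections.
    """
    c = len(x[0])
    xt = [list(col) for col in zip(*x)]  # C x N transpose of X
    scores = matmul(x, xt)  # N x N dot products between tokens
    return softmax_rows([[s / math.sqrt(c) for s in row] for row in scores])


def static_local_affinity(n: int, radius: int = 1) -> Matrix:
    """Input-independent, banded affinity: token i averages the tokens within
    `radius` of it (clipped at the sequence ends), like a fixed 1-D conv kernel."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        for j in range(lo, hi):
            a[i][j] = 1.0 / (hi - lo)
    return a


def aggregate(x: Matrix, alpha: float, beta: float, radius: int = 1) -> Matrix:
    """Hypothetical mixed aggregation Y = (alpha * A_dyn(X) + beta * A_static) @ V, with V = X."""
    n = len(x)
    a_dyn = dynamic_affinity(x)
    a_sta = static_local_affinity(n, radius)
    mixed = [
        [alpha * d + beta * s for d, s in zip(row_d, row_s)]
        for row_d, row_s in zip(a_dyn, a_sta)
    ]
    return matmul(mixed, x)
```

Setting `alpha=1, beta=0` reduces this to plain self-attention, and `alpha=0, beta=1` reduces it to a fixed local, convolution-like aggregation. Replacing `static_local_affinity` with a dense learned N×N matrix would give the MLP-Mixer case under the same assumptions.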
Author Information
Peng Gao (Allen Institute for Artificial Intelligence (AI2))
Jiasen Lu (Allen Institute for Artificial Intelligence)
Hongsheng Li (CUHK)
Roozbeh Mottaghi (Allen Institute for Artificial Intelligence)
Aniruddha Kembhavi (Allen Institute for Artificial Intelligence (AI2))
More from the Same Authors
2022 Spotlight: Lightning Talks 4B-3
Zicheng Zhang · Mancheng Meng · Antoine Guedon · Yue Wu · Wei Mao · Zaiyu Huang · Peihao Chen · Shizhe Chen · Yongwei Chen · Keqiang Sun · Yi Zhu · chen rui · Hanhui Li · Dongyu Ji · Ziyan Wu · miaomiao Liu · Pascal Monasse · Yu Deng · Shangzhe Wu · Pierre-Louis Guhur · Jiaolong Yang · Kunyang Lin · Makarand Tapaswi · Zhaoyang Huang · Terrence Chen · Jiabao Lei · Jianzhuang Liu · Vincent Lepetit · Zhenyu Xie · Richard I Hartley · Dinggang Shen · Xiaodan Liang · Runhao Zeng · Cordelia Schmid · Michael Kampffmeyer · Mathieu Salzmann · Ning Zhang · Fangyun Wei · Yabin Zhang · Fan Yang · Qifeng Chen · Wei Ke · Quan Wang · Thomas Li · qingling Cai · Kui Jia · Ivan Laptev · Mingkui Tan · Xin Tong · Hongsheng Li · Xiaodan Liang · Chuang Gan
2022 Spotlight: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li
2022 Spotlight: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
Keqiang Sun · Shangzhe Wu · Zhaoyang Huang · Ning Zhang · Quan Wang · Hongsheng Li
2022 Spotlight: Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu · Xizhou Zhu · Wenhai Wang · Xiaohua Wang · Hongsheng Li · Xiaogang Wang · Jifeng Dai
2022 Spotlight: MCMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao · Teli Ma · Hongsheng Li · Ziyi Lin · Jifeng Dai · Yu Qiao
2022 Spotlight: Lightning Talks 2B-1
Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li
2022 Poster: Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Renrui Zhang · Ziyu Guo · Peng Gao · Rongyao Fang · Bin Zhao · Dong Wang · Yu Qiao · Hongsheng Li
2022 Poster: Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu · Xizhou Zhu · Wenhai Wang · Xiaohua Wang · Hongsheng Li · Xiaogang Wang · Jifeng Dai
2022 Poster: 🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke · Eli VanderBilt · Alvaro Herrasti · Luca Weihs · Kiana Ehsani · Jordi Salvador · Winson Han · Eric Kolve · Aniruddha Kembhavi · Roozbeh Mottaghi
2022 Poster: MCMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao · Teli Ma · Hongsheng Li · Ziyi Lin · Jifeng Dai · Yu Qiao
2022 Poster: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li
2022 Poster: Ask4Help: Learning to Leverage an Expert for Embodied Tasks
Kunal Pratap Singh · Luca Weihs · Alvaro Herrasti · Jonghyun Choi · Aniruddha Kembhavi · Roozbeh Mottaghi
2022 Poster: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
Keqiang Sun · Shangzhe Wu · Zhaoyang Huang · Ning Zhang · Quan Wang · Hongsheng Li
2021 Poster: Bridging the Imitation Gap by Adaptive Insubordination
Luca Weihs · Unnat Jain · Iou-Jen Liu · Jordi Salvador · Svetlana Lazebnik · Aniruddha Kembhavi · Alex Schwing
2021 Poster: DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks
Wei Sun · Aojun Zhou · Sander Stuijk · Rob Wijnhoven · Andrew Nelson · Hongsheng Li · Henk Corporaal
2021 Poster: Dual-stream Network for Visual Recognition
Mingyuan Mao · Peng Gao · Renrui Zhang · Honghui Zheng · Teli Ma · Yan Peng · Errui Ding · Baochang Zhang · Shumin Han
2020 Poster: Supermasks in Superposition
Mitchell Wortsman · Vivek Ramanujan · Rosanne Liu · Aniruddha Kembhavi · Mohammad Rastegari · Jason Yosinski · Ali Farhadi
2020 Poster: Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
Yixiao Ge · Feng Zhu · Dapeng Chen · Rui Zhao · Hongsheng Li
2020 Poster: Learning About Objects by Learning to Interact with Them
Martin Lohmann · Jordi Salvador · Aniruddha Kembhavi · Roozbeh Mottaghi
2020 Poster: Balanced Meta-Softmax for Long-Tailed Visual Recognition
Jiawei Ren · Cunjun Yu · shunan sheng · Xiao Ma · Haiyu Zhao · Shuai Yi · Hongsheng Li
2020 Poster: Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Michael Cogswell · Jiasen Lu · Rishabh Jain · Stefan Lee · Devi Parikh · Dhruv Batra
2019 Poster: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu · Dhruv Batra · Devi Parikh · Stefan Lee
2019 Poster: Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
Xihui Liu · Guojun Yin · Jing Shao · Xiaogang Wang · Hongsheng Li
2018: Panel Discussion
Antonio Torralba · Douwe Kiela · Barbara Landau · Angeliki Lazaridou · Joyce Chai · Christopher Manning · Stevan Harnad · Roozbeh Mottaghi
2018: Roozbeh Mottaghi - Interactive Scene Understanding
Roozbeh Mottaghi
2018 Poster: FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
Yixiao Ge · Zhuowan Li · Haiyu Zhao · Guojun Yin · Shuai Yi · Xiaogang Wang · Hongsheng Li
2017: Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
2017 Poster: Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu · Anitha Kannan · Jianwei Yang · Devi Parikh · Dhruv Batra
2016 Poster: CRF-CNN: Modeling Structured Information in Human Pose Estimation
Xiao Chu · Wanli Ouyang · Hongsheng Li · Xiaogang Wang