We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes, which leads to decreased performance when the testing resolution differs from the training resolution. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, thus combining both local attention and global attention to render powerful representations. We show that this simple and lightweight design is the key to efficient segmentation on Transformers. We scale our approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, being 5x smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on the Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C.
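The decoder idea described in the abstract, aggregating multiscale encoder features with MLP layers only, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the channel widths, feature-map sizes, embedding width `D`, nearest-neighbour upsampling, the ReLU after fusion, the class count, and the helper names (`linear`, `upsample_nearest`, `mlp_decode`, `make_layer`) are all assumptions chosen for illustration, and the weights are random.

```python
import numpy as np

def linear(x, w, b):
    """Apply a per-pixel linear layer to an (H, W, C_in) feature map."""
    return x @ w + b

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) map by an integer factor."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def mlp_decode(features, proj, fuse, classify):
    """Fuse multiscale features using only per-pixel linear layers.

    features: list of (H_i, W_i, C_i) arrays, finest scale first.
    proj:     list of (w, b) pairs mapping C_i -> D, one per scale.
    fuse:     (w, b) mapping len(features) * D -> D.
    classify: (w, b) mapping D -> num_classes.
    """
    target_h = features[0].shape[0]
    unified = []
    for feat, (w, b) in zip(features, proj):
        x = linear(feat, w, b)                               # unify channel width
        x = upsample_nearest(x, target_h // feat.shape[0])   # unify resolution
        unified.append(x)
    x = np.concatenate(unified, axis=-1)                     # stack all scales
    x = np.maximum(linear(x, *fuse), 0.0)                    # fuse scales, then ReLU
    return linear(x, *classify)                              # per-pixel class logits

def make_layer(rng, c_in, c_out):
    """Random weights and zero bias (illustrative, untrained)."""
    return rng.standard_normal((c_in, c_out)) * 0.02, np.zeros(c_out)

rng = np.random.default_rng(0)
channels = [32, 64, 160, 256]   # assumed per-stage channel widths
sizes = [16, 8, 4, 2]           # assumed strides 4, 8, 16, 32 on a 64x64 input
D, num_classes = 64, 19         # assumed embedding width and class count

features = [rng.standard_normal((s, s, c)) for s, c in zip(sizes, channels)]
proj = [make_layer(rng, c, D) for c in channels]
fuse = make_layer(rng, D * len(channels), D)
classify = make_layer(rng, D, num_classes)

logits = mlp_decode(features, proj, fuse, classify)
print(logits.shape)  # (16, 16, 19)
```

Because every learned operation here is a per-pixel linear map, the decoder adds few parameters; any spatial context in the output has to come from the encoder features themselves.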
Author Information
Enze Xie (The University of Hong Kong)
I have been a PhD student in the Department of Computer Science, The University of Hong Kong (HKU), since 2019, supervised by Prof. Ping Luo and co-supervised by Prof. Wenping Wang. I obtained a B.S. from Nanjing University of Aeronautics and Astronautics (2016) and an M.S. from Tongji University (2019). Since 2018, I have collaborated with several researchers in industry, e.g., at Face++ (Megvii), SenseTime, Facebook, Huawei and NVIDIA. My research interest is computer vision in 2D and 3D. I have worked on instance-level detection and self-, semi-, and weakly supervised learning. I developed a few well-known computer vision algorithms including PolarMask, which was selected as one of the CVPR 2020 Top-10 Influential Papers. I co-developed OpenSelfSup (1k+ stars), a popular self-supervised learning framework. I am looking for a full-time research position. Please contact me!
Wenhai Wang (Nanjing University)
Zhiding Yu (NVIDIA)
Anima Anandkumar (NVIDIA/Caltech)
Jose M. Alvarez (NVIDIA)
Ping Luo (The University of Hong Kong)
More from the Same Authors
-
2021 : An Empirical Investigation of Representation Learning for Imitation »
Cynthia Chen · Sam Toyer · Cody Wild · Scott Emmons · Ian Fischer · Kuang-Huei Lee · Neel Alex · Steven Wang · Ping Luo · Stuart Russell · Pieter Abbeel · Rohin Shah -
2022 Poster: Structural Pruning via Latency-Saliency Knapsack »
Maying Shen · Hongxu Yin · Pavlo Molchanov · Lei Mao · Jianna Liu · Jose M. Alvarez -
2022 Poster: Optimizing Data Collection for Machine Learning »
Rafid Mahmood · James Lucas · Jose M. Alvarez · Sanja Fidler · Marc Law -
2022 : Calibration of Large Neural Weather Models »
Andre Graubner · Kamyar Azizzadenesheli · Jaideep Pathak · Morteza Mardani · Mike Pritchard · Karthik Kashinath · Anima Anandkumar -
2022 : FourCastNet: A practical introduction to a state-of-the-art deep learning global weather emulator »
Jaideep Pathak · Shashank Subramanian · Peter Harrington · Thorsten Kurth · Andre Graubner · Morteza Mardani · David Hall · Karthik Kashinath · Anima Anandkumar -
2022 : Robust Trajectory Prediction against Adversarial Attacks »
Yulong Cao · Danfei Xu · Xinshuo Weng · Zhuoqing Morley Mao · Anima Anandkumar · Chaowei Xiao · Marco Pavone -
2022 : AdvDO: Realistic Adversarial Attacks for Trajectory Prediction »
Yulong Cao · Chaowei Xiao · Anima Anandkumar · Danfei Xu · Marco Pavone -
2023 Poster: DiffComplete: Diffusion-based Generative 3D Shape Completion »
Ruihang Chu · Enze Xie · Shentong Mo · Zhenguo Li · Matthias Niessner · Chi-Wing Fu · Jiaya Jia -
2023 Poster: Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection »
Haibao Yu · Yingjuan Tang · Enze Xie · Jilei Mao · Ping Luo · Zaiqing Nie -
2023 Poster: DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation »
Shentong Mo · Enze Xie · Ruihang Chu · Lanqing Hong · Matthias Niessner · Zhenguo Li -
2023 Poster: T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation »
Kaiyi Huang · Kaiyue Sun · Enze Xie · Zhenguo Li · Xihui Liu -
2022 Spotlight: Lightning Talks 6B-2 »
Alexander Korotin · Jinyuan Jia · Weijian Deng · Shi Feng · Maying Shen · Denizalp Goktas · Fang-Yi Yu · Alexander Kolesov · Sadie Zhao · Stephen Gould · Hongxu Yin · Wenjie Qu · Liang Zheng · Evgeny Burnaev · Amy Greenwald · Neil Gong · Pavlo Molchanov · Yiling Chen · Lei Mao · Jianna Liu · Jose M. Alvarez -
2022 Spotlight: Structural Pruning via Latency-Saliency Knapsack »
Maying Shen · Hongxu Yin · Pavlo Molchanov · Lei Mao · Jianna Liu · Jose M. Alvarez -
2021 Poster: Rethinking the Pruning Criteria for Convolutional Neural Network »
Zhongzhan Huang · Wenqi Shao · Xinjiang Wang · Liang Lin · Ping Luo -
2021 Poster: Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language »
Mingyu Ding · Zhenfang Chen · Tao Du · Ping Luo · Josh Tenenbaum · Chuang Gan -
2021 Poster: Controllable and Compositional Generation with Latent-Space Energy-Based Models »
Weili Nie · Arash Vahdat · Anima Anandkumar -
2021 Poster: Model-Based Reinforcement Learning via Imagination with Derived Memory »
Yao Mu · Yuzheng Zhuang · Bin Wang · Guangxiang Zhu · Wulong Liu · Jianyu Chen · Ping Luo · Shengbo Li · Chongjie Zhang · Jianye Hao -
2021 Poster: Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning »
Chongjian GE · Youwei Liang · YIBING SONG · Jianbo Jiao · Jue Wang · Ping Luo -
2021 Poster: Distilling Image Classifiers in Object Detectors »
Shuxuan Guo · Jose M. Alvarez · Mathieu Salzmann -
2021 Poster: AugMax: Adversarial Composition of Random Augmentations for Robust Training »
Haotao Wang · Chaowei Xiao · Jean Kossaifi · Zhiding Yu · Anima Anandkumar · Zhangyang Wang -
2021 Poster: Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds »
Yujia Huang · Huan Zhang · Yuanyuan Shi · J. Zico Kolter · Anima Anandkumar -
2021 Poster: Compressed Video Contrastive Learning »
Yuqi Huo · Mingyu Ding · Haoyu Lu · Nanyi Fei · Zhiwu Lu · Ji-Rong Wen · Ping Luo -
2021 Poster: Coupled Segmentation and Edge Learning via Dynamic Graph Propagation »
Zhiding Yu · Rui Huang · Wonmin Byeon · Sifei Liu · Guilin Liu · Thomas Breuel · Anima Anandkumar · Jan Kautz -
2021 Poster: Long-Short Transformer: Efficient Transformers for Language and Vision »
Chen Zhu · Wei Ping · Chaowei Xiao · Mohammad Shoeybi · Tom Goldstein · Anima Anandkumar · Bryan Catanzaro -
2021 Poster: Adversarially Robust 3D Point Cloud Recognition Using Self-Supervisions »
Jiachen Sun · Yulong Cao · Christopher B Choy · Zhiding Yu · Anima Anandkumar · Zhuoqing Morley Mao · Chaowei Xiao -
2017 Poster: Compression-aware Training of Deep Networks »
Jose Alvarez · Mathieu Salzmann -
2017 Poster: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2017 Spotlight: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2016 Poster: Learning the Number of Neurons in Deep Networks »
Jose M. Alvarez · Mathieu Salzmann