Timezone: »
Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images. In this paper, we present a generic Dual-stream Network (DS-Net) to fully explore the representation capacity of local and global pattern features for image classification. Our DS-Net can simultaneously calculate fine-grained and integrated features and efficiently fuse them. Specifically, we propose an Intra-scale Propagation module to process two different resolutions in each block and an Inter-Scale Alignment module to perform information interaction across features at dual scales. Besides, we also design a Dual-stream FPN (DS-FPN) to further enhance contextual information for downstream dense predictions. Without bells and whistles, the proposed DS-Net outperforms DeiT-Small by 2.4\% in terms of top-1 accuracy on ImageNet-1k and achieves state-of-the-art performance over other Vision Transformers and ResNets. For object detection and instance segmentation, DS-Net-Small respectively outperforms ResNet-50 by 6.4\% and 5.5 \% in terms of mAP on MSCOCO 2017, and surpasses the previous state-of-the-art scheme, which significantly demonstrates its potential to be a general backbone in vision tasks. The code will be released soon.
Author Information
Mingyuan Mao (Beihang University)
peng gao (allen institute for artificial intelligence ai2)
Renrui Zhang (Peking University)
Honghui Zheng (Beijing University of Post and Telecommunication)
Teli Ma
Yan Peng (National University of Defense Technology)
Errui Ding (Baidu Inc.)
Baochang Zhang (Beihang University)
Shumin Han (Baidu)
More from the Same Authors
-
2022 Poster: FNeVR: Neural Volume Rendering for Face Animation »
Bohan Zeng · Boyu Liu · Hong Li · Xuhui Liu · Jianzhuang Liu · Dapeng Chen · Wei Peng · Baochang Zhang -
2022 Poster: Delving into Sequential Patches for Deepfake Detection »
Jiazhi Guan · Hang Zhou · Zhibin Hong · Errui Ding · Jingdong Wang · Chengbin Quan · Youjian Zhao -
2022 Spotlight: Delving into Sequential Patches for Deepfake Detection »
Jiazhi Guan · Hang Zhou · Zhibin Hong · Errui Ding · Jingdong Wang · Chengbin Quan · Youjian Zhao -
2022 Spotlight: RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer »
Jian Wang · Chenhui Gou · Qiman Wu · Haocheng Feng · Junyu Han · Errui Ding · Jingdong Wang -
2022 Spotlight: Lightning Talks 2B-1 »
Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li -
2022 Poster: Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training »
Renrui Zhang · Ziyu Guo · Peng Gao · Rongyao Fang · Bin Zhao · Dong Wang · Yu Qiao · Hongsheng Li -
2022 Poster: RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer »
Jian Wang · Chenhui Gou · Qiman Wu · Haocheng Feng · Junyu Han · Errui Ding · Jingdong Wang -
2022 Poster: Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning »
Yanpeng Sun · Qiang Chen · Xiangyu He · Jian Wang · Haocheng Feng · Junyu Han · Errui Ding · Jian Cheng · Zechao Li · Jingdong Wang -
2022 Poster: Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer »
Yanjing Li · Sheng Xu · Baochang Zhang · Xianbin Cao · Peng Gao · Guodong Guo -
2021 Poster: Container: Context Aggregation Networks »
peng gao · Jiasen Lu · Hongsheng Li · Roozbeh Mottaghi · Aniruddha Kembhavi -
2020 Poster: Rotated Binary Neural Network »
Mingbao Lin · Rongrong Ji · Zihan Xu · Baochang Zhang · Yan Wang · Yongjian Wu · Feiyue Huang · Chia-Wen Lin -
2020 Poster: Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching »
Di Hu · Rui Qian · Minyue Jiang · Xiao Tan · Shilei Wen · Errui Ding · Weiyao Lin · Dejing Dou -
2019 Poster: Variational Structured Semantic Inference for Diverse Image Captioning »
Fuhai Chen · Rongrong Ji · Jiayi Ji · Xiaoshuai Sun · Baochang Zhang · Xuri Ge · Yongjian Wu · Feiyue Huang · Yan Wang -
2018 Poster: Compact Generalized Non-local Network »
Kaiyu Yue · Ming Sun · Yuchen Yuan · Feng Zhou · Errui Ding · Fuxin Xu