RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
Jian Wang · Chenhui Gou · Qiman Wu · Haocheng Feng · Junyu Han · Errui Ding · Jingdong Wang

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #635

Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate in this field, due to the time-consuming computation mechanism of the transformer. We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation, which achieves a better trade-off between performance and efficiency than CNN-based models. To achieve high inference efficiency on GPU-like devices, our RTFormer leverages GPU-Friendly Attention with linear complexity and discards the multi-head mechanism. Besides, we find that cross-resolution attention is more efficient at gathering global context information for the high-resolution branch by spreading the high-level knowledge learned from the low-resolution branch. Extensive experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer: it achieves state-of-the-art results on Cityscapes, CamVid and COCOStuff, and shows promising results on ADE20K.
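The abstract's key efficiency claim is an attention variant with linear complexity in the number of tokens and no multi-head split. One common way to get linear complexity is to let the N input tokens attend to a small fixed set of M learnable external keys and values (M ≪ N), so cost scales as O(N·M) instead of O(N²). The sketch below illustrates that general idea in NumPy; the function name, shapes, and the double-normalization step are illustrative assumptions, not the authors' exact GPU-Friendly Attention formulation.

```python
import numpy as np

def linear_single_head_attention(x, ext_k, ext_v, eps=1e-6):
    """Sketch of single-head, linear-complexity attention with external
    learnable keys/values (an assumption standing in for the paper's
    GPU-Friendly Attention; exact details differ).

    x:     (N, d) input tokens
    ext_k: (M, d) learnable external keys, M << N
    ext_v: (M, d) learnable external values
    Returns: (N, d) aggregated features; total cost is O(N * M * d).
    """
    # Similarity of every token to each external key: (N, M)
    attn = x @ ext_k.T
    # Softmax over the M external keys (numerically stabilized)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    # Second normalization over tokens, as used in external-attention-style
    # designs to balance how much each key contributes (an assumption here)
    attn = attn / (eps + attn.sum(axis=0, keepdims=True))
    # Aggregate external values: (N, M) @ (M, d) -> (N, d)
    return attn @ ext_v
```

Because M is a fixed constant, doubling the input resolution only doubles the attention cost, which is what makes this style of attention attractive for real-time, high-resolution segmentation on GPUs.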

Author Information

Jian Wang (Baidu)
Chenhui Gou (Australian National University)
Qiman Wu (National Pedagogical University M. Dragomanov)
Haocheng Feng (Baidu)
Junyu Han (Baidu)
Errui Ding (Baidu Inc.)
Jingdong Wang (Microsoft)