Recently, pure camera-based Bird’s-Eye-View (BEV) perception has removed the need for expensive LiDAR sensors, making it a feasible solution for economical autonomous driving. However, most existing BEV solutions either suffer from modest performance or require considerable resources for on-vehicle inference. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing real-time BEV perception on on-vehicle chips. Toward this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive view transformation or depth representation. Starting from the M2BEV baseline, we further introduce (1) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, (2) a multi-frame feature fusion mechanism to leverage temporal information, and (3) an optimized, deployment-friendly view transformation to speed up inference. Through experiments, we show that the Fast-BEV model family achieves considerable accuracy and efficiency on edge devices. In particular, our M1 model (R18@256×704) runs at over 50 FPS on the Tesla T4 platform, with 46.9% NDS on the nuScenes validation set. Our largest model (R101@900×1600) establishes a new state-of-the-art 53.5% NDS on the nuScenes validation set. Code will be made publicly available.
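The deployment-friendly view transformation described above can be sketched as a precomputed lookup table: because camera intrinsics and extrinsics are fixed per vehicle, each BEV voxel can be projected into the image plane once offline, reducing inference-time 2D-to-3D lifting to a single gather. The sketch below is a minimal single-camera illustration in NumPy; the helper names (`build_lut`, `lift_features`) are hypothetical and not the paper's actual API.

```python
import numpy as np

def build_lut(voxel_centers, intrinsic, extrinsic, img_h, img_w):
    """Project 3D voxel centers into the image plane once, offline.

    voxel_centers: (N, 3) ego-frame coordinates
    intrinsic:     (3, 3) camera matrix
    extrinsic:     (4, 4) ego-to-camera transform
    Returns (N, 2) integer (u, v) pixel indices; -1 marks voxels not visible.
    """
    n = voxel_centers.shape[0]
    homo = np.concatenate([voxel_centers, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = (extrinsic @ homo.T).T[:, :3]          # ego -> camera frame
    valid = cam[:, 2] > 1e-3                     # keep points in front of camera
    pix = (intrinsic @ cam.T).T
    pix = pix[:, :2] / np.clip(pix[:, 2:3], 1e-3, None)  # perspective divide
    uv = np.round(pix).astype(np.int64)
    inside = valid & (uv[:, 0] >= 0) & (uv[:, 0] < img_w) \
                   & (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    lut = np.full((n, 2), -1, dtype=np.int64)
    lut[inside] = uv[inside]
    return lut

def lift_features(img_feat, lut):
    """Fill BEV voxel features by gathering image features through the LUT.

    img_feat: (C, H, W) image feature map; returns (N, C) voxel features.
    """
    c = img_feat.shape[0]
    out = np.zeros((lut.shape[0], c), dtype=img_feat.dtype)
    hit = lut[:, 0] >= 0
    out[hit] = img_feat[:, lut[hit, 1], lut[hit, 0]].T  # gather at (v, u)
    return out
```

At inference time only `lift_features` runs, so the view transformation costs one indexed copy per voxel instead of a per-frame projection, which is what makes the operation friendly to deployment on edge chips.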
Bin Huang (SenseTime)
Yangguang Li (SenseTime)
Feng Liang (The University of Texas at Austin)
Enze Xie (The University of Hong Kong)
Luya Wang (Beijing University of Posts and Telecommunications)
Mingzhu Shen (SenseTime Research)
Fenggang Liu (SenseTime)
Tianqi Wang (The University of Hong Kong)
Ping Luo (The University of Hong Kong)
Jing Shao (SenseTime)