Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers. However, how to exploit masked autoencoding for learning 3D representations of irregular point clouds remains an open question. In this paper, we propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds. Unlike the standard transformer in MAE, we modify the encoder and decoder into pyramid architectures to progressively model spatial geometries and capture both fine-grained and high-level semantics of 3D shapes. For the encoder, which downsamples point tokens in stages, we design a multi-scale masking strategy to generate consistent visible regions across scales, and adopt a local spatial self-attention mechanism during fine-tuning to focus on neighboring patterns. Through multi-scale token propagation, the lightweight decoder gradually upsamples point tokens with complementary skip connections from the encoder, which further promotes reconstruction from a global-to-local perspective. Extensive experiments demonstrate the state-of-the-art performance of Point-M2AE for 3D representation learning. With a frozen encoder after pre-training, Point-M2AE achieves 92.9% accuracy with a linear SVM on ModelNet40, even surpassing some fully trained methods. When fine-tuned on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% over the second-best method, and the hierarchical pre-training scheme largely benefits few-shot classification, part segmentation, and 3D object detection. Code is available at https://github.com/ZrrSkywalker/Point-M2AE.
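The idea of "consistent visible regions across scales" can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that). It assumes a simplified pyramid in which each coarser token is a randomly sampled center that groups its k nearest tokens from the finer scale, masks tokens at random at the coarsest scale, and then back-projects visibility to the finer scales. The function names `build_scales` and `multiscale_visibility`, the random center sampling, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def knn_indices(query, points, k):
    """Return indices of the k points nearest to `query` (brute force)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


def build_scales(points, sizes, k, rng):
    """Build a hypothetical token pyramid.

    Centers of each coarser scale are sampled at random from the previous
    scale (an assumption made for brevity), and every center records the
    indices of its k nearest tokens in that finer scale.
    """
    scales = [list(points)]
    groups = []  # groups[s][j]: indices in scale s grouped by token j of scale s + 1
    for n in sizes:
        finer = scales[-1]
        centers = [finer[i] for i in rng.sample(range(len(finer)), n)]
        groups.append([knn_indices(c, finer, k) for c in centers])
        scales.append(centers)
    return scales, groups


def multiscale_visibility(groups, n_coarsest, mask_ratio, rng):
    """Mask at the coarsest scale, then back-project visibility downward.

    A finer token is kept visible only if it belongs to the neighborhood of
    a visible coarser token, so the visible regions agree across scales.
    """
    n_visible = max(1, round(n_coarsest * (1 - mask_ratio)))
    per_scale = [set(rng.sample(range(n_coarsest), n_visible))]
    for group in reversed(groups):
        finer_visible = set()
        for j in per_scale[0]:
            finer_visible.update(group[j])
        per_scale.insert(0, finer_visible)
    return per_scale  # index 0 is the finest scale, index -1 the coarsest


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
    _, groups = build_scales(cloud, sizes=[32, 16], k=4, rng=rng)
    visible = multiscale_visibility(groups, n_coarsest=16, mask_ratio=0.8, rng=rng)
    print([len(v) for v in visible])
```

Masking top-down, rather than independently at each scale, means an encoder stage never receives a token whose parent region was hidden at a coarser stage.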
Author Information
Renrui Zhang (The Chinese University of Hong Kong)
Ziyu Guo (Peking University)
Peng Gao (Shanghai AI Lab)
Rongyao Fang (The Chinese University of Hong Kong)
Bin Zhao (Northwestern Polytechnical University)
Dong Wang (Shanghai AI Laboratory)
Yu Qiao (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Hongsheng Li (The Chinese University of Hong Kong)
More from the Same Authors
-
2022 Spotlight: Lightning Talks 4B-3 »
Zicheng Zhang · Mancheng Meng · Antoine Guedon · Yue Wu · Wei Mao · Zaiyu Huang · Peihao Chen · Shizhe Chen · Yongwei Chen · Keqiang Sun · Yi Zhu · chen rui · Hanhui Li · Dongyu Ji · Ziyan Wu · miaomiao Liu · Pascal Monasse · Yu Deng · Shangzhe Wu · Pierre-Louis Guhur · Jiaolong Yang · Kunyang Lin · Makarand Tapaswi · Zhaoyang Huang · Terrence Chen · Jiabao Lei · Jianzhuang Liu · Vincent Lepetit · Zhenyu Xie · Richard I Hartley · Dinggang Shen · Xiaodan Liang · Runhao Zeng · Cordelia Schmid · Michael Kampffmeyer · Mathieu Salzmann · Ning Zhang · Fangyun Wei · Yabin Zhang · Fan Yang · Qifeng Chen · Wei Ke · Quan Wang · Thomas Li · qingling Cai · Kui Jia · Ivan Laptev · Mingkui Tan · Xin Tong · Hongsheng Li · Xiaodan Liang · Chuang Gan -
2022 Spotlight: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning »
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li -
2022 Spotlight: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields »
Keqiang Sun · Shangzhe Wu · Zhaoyang Huang · Ning Zhang · Quan Wang · Hongsheng Li -
2022 Spotlight: Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs »
Jinguo Zhu · Xizhou Zhu · Wenhai Wang · Xiaohua Wang · Hongsheng Li · Xiaogang Wang · Jifeng Dai -
2022 Spotlight: MCMAE: Masked Convolution Meets Masked Autoencoders »
Peng Gao · Teli Ma · Hongsheng Li · Ziyi Lin · Jifeng Dai · Yu Qiao -
2022 Spotlight: Lightning Talks 2B-1 »
Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li -
2022 Poster: Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs »
Jinguo Zhu · Xizhou Zhu · Wenhai Wang · Xiaohua Wang · Hongsheng Li · Xiaogang Wang · Jifeng Dai -
2022 Poster: MCMAE: Masked Convolution Meets Masked Autoencoders »
Peng Gao · Teli Ma · Hongsheng Li · Ziyi Lin · Jifeng Dai · Yu Qiao -
2022 Poster: Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline »
Penghao Wu · Xiaosong Jia · Li Chen · Junchi Yan · Hongyang Li · Yu Qiao -
2022 Poster: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning »
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li -
2022 Poster: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields »
Keqiang Sun · Shangzhe Wu · Zhaoyang Huang · Ning Zhang · Quan Wang · Hongsheng Li -
2022 Poster: Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer »
Yanjing Li · Sheng Xu · Baochang Zhang · Xianbin Cao · Peng Gao · Guodong Guo -
2021 Poster: DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks »
Wei Sun · Aojun Zhou · Sander Stuijk · Rob Wijnhoven · Andrew Nelson · Hongsheng Li · Henk Corporaal -
2021 Poster: Container: Context Aggregation Networks »
Peng Gao · Jiasen Lu · Hongsheng Li · Roozbeh Mottaghi · Aniruddha Kembhavi -
2021 Poster: Dual-stream Network for Visual Recognition »
Mingyuan Mao · Peng Gao · Renrui Zhang · Honghui Zheng · Teli Ma · Yan Peng · Errui Ding · Baochang Zhang · Shumin Han -
2020 Poster: Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID »
Yixiao Ge · Feng Zhu · Dapeng Chen · Rui Zhao · Hongsheng Li -
2020 Poster: Balanced Meta-Softmax for Long-Tailed Visual Recognition »
Jiawei Ren · Cunjun Yu · Shunan Sheng · Xiao Ma · Haiyu Zhao · Shuai Yi · Hongsheng Li -
2019 Poster: Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis »
Xihui Liu · Guojun Yin · Jing Shao · Xiaogang Wang · Hongsheng Li -
2018 Poster: FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification »
Yixiao Ge · Zhuowan Li · Haiyu Zhao · Guojun Yin · Shuai Yi · Xiaogang Wang · Hongsheng Li -
2016 Poster: CRF-CNN: Modeling Structured Information in Human Pose Estimation »
Xiao Chu · Wanli Ouyang · Hongsheng Li · Xiaogang Wang