

Spotlight Poster

PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders

Xiangdong Zhang · Shaofeng Zhang · Junchi Yan

East Exhibit Hall A-C #2002
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Masked autoencoders have been widely explored in point cloud self-supervised learning, where the point cloud is generally divided into visible and masked parts. These methods typically include an encoder that accepts the (normalized) visible patches and their corresponding patch centers (positions) as input, and a decoder that accepts the encoder output together with the centers (positions) of the masked part to reconstruct each point in the masked patches. The pre-trained encoders are then used for downstream tasks. In this paper, we show a motivating empirical result: when the centers of the masked patches are fed directly to the decoder without any information from the encoder, it still reconstructs well. In other words, the patch centers are important, and the reconstruction objective does not necessarily rely on the encoder representations, which prevents the encoder from learning semantic representations. Based on this key observation, we propose a simple yet effective method, i.e., learning to Predict Centers for Point Masked AutoEncoders (PCP-MAE), which guides the model to learn to predict the significant centers and uses the predicted centers in place of the directly provided ones. Specifically, we propose a Predicting Center Module (PCM) that shares parameters with the original encoder and uses extra cross-attention to predict centers. Our method has high pre-training efficiency compared to other alternatives and achieves great improvement over Point-MAE, outperforming it by 5.50%, 6.03%, and 5.17% on three variants of ScanObjectNN. The code will be made publicly available.
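
To make the data flow in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It splits patch centers into visible and masked sets, predicts the masked centers, and passes the predicted centers (rather than the true ones) on as decoder positions. Everything specific here is an assumption made for illustration: the function names `split_patches` and `center_mse`, the mask ratio of 0.5, the mean squared error objective, and the stand-in predictor that simply returns the centroid of the visible centers. The abstract describes the actual PCM as sharing parameters with the encoder and using cross-attention, which this toy does not model.

```python
import random


def split_patches(centers, mask_ratio=0.5, seed=0):
    """Randomly split patch indices into visible and masked lists.

    mask_ratio is a hypothetical value chosen for this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(len(centers)))
    rng.shuffle(indices)
    n_masked = int(len(centers) * mask_ratio)
    masked = sorted(indices[:n_masked])
    visible = sorted(indices[n_masked:])
    return visible, masked


def center_mse(predicted, target):
    """Mean squared error between predicted and true 3D centers.

    The choice of MSE is an assumption, not taken from the abstract.
    """
    total = 0.0
    for p, t in zip(predicted, target):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / len(target)


# Six toy patch centers in 3D.
centers = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0),
]

visible, masked = split_patches(centers)
true_masked = [centers[i] for i in masked]

# Stand-in predictor (assumption): the centroid of the visible centers.
centroid = tuple(
    sum(centers[i][d] for i in visible) / len(visible) for d in range(3)
)
predicted_masked = [centroid for _ in masked]

# Objective for the center predictor in this sketch.
loss = center_mse(predicted_masked, true_masked)

# The decoder would receive the predicted centers, not the true ones.
decoder_positions = predicted_masked
```

The point of the sketch is the last assignment: because the decoder no longer gets the true masked centers for free, reconstruction quality depends on how well the centers can be inferred from the visible part.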
