

Poster

Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba

Haoye Dong · Aviral Chharia · Wenbo Gou · Francisco Vicente Carrasco · Fernando D De la Torre


Abstract:

Reconstructing hands in 3D from a single RGB image is challenging due to the nimbleness of hands, varied poses, truncation, and occlusion during object interaction. Existing methods employ attention-based transformers to learn 3D hand pose and shape, but these methods fail to capture the semantic relations between different joints. Additionally, relying solely on the attention mechanism for 3D hand mesh reconstruction does not fully exploit the joint spatial sequences. To address these issues, we propose a novel graph-guided Mamba framework, named Hamba, which bridges graph learning and state space modeling. Our core idea is to reformulate Mamba's scanning into graph-guided bidirectional scanning for 3D reconstruction using a few effective tokens. This enables us to learn the joint relations and spatial sequences to enhance 3D hand reconstruction performance. Specifically, we design a Graph-guided State Space (GSS) block that learns the graph-structured relations and spatial sequences of joints. The GSS block improves semantic relation learning while using 88.5% fewer tokens than attention-based methods and can serve as a plug-and-play module for other tasks. Furthermore, we integrate the global spatial tokens with local graph-structured features through a fusion module. By utilizing the GSS block and the fusion module, Hamba effectively leverages graph-guided state space modeling features and jointly considers global and local features to enhance performance. Extensive experiments on multiple benchmarks and in-the-wild tests demonstrate that Hamba significantly outperforms state-of-the-art methods, achieving a PA-MPVPE of 5.3 mm and an F@15mm of 0.992 on the FreiHAND benchmark. Hamba also achieves Rank 1 in two challenging competitions on 3D hand reconstruction. The code has been submitted as supplementary material and will be open-sourced after manuscript acceptance.
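For readers unfamiliar with the two metrics quoted above, the following is a minimal NumPy sketch of how PA-MPVPE (Procrustes-aligned mean per-vertex position error) and an F-score at a distance threshold are commonly computed. It is not taken from the Hamba code or from the official FreiHAND evaluation script, which may differ in details such as whether alignment is applied before the F-score. The function names `procrustes_align`, `pa_mpvpe`, and `f_score` are illustrative, and the inputs are assumed to be `(N, 3)` vertex arrays in millimetres.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Similarity-align pred (N, 3) to gt (N, 3) with scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance (Kabsch/Umeyama).
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpvpe(pred, gt):
    """Mean per-vertex Euclidean error after Procrustes alignment."""
    aligned = procrustes_align(pred, gt)
    return np.linalg.norm(aligned - gt, axis=1).mean()


def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall on nearest-neighbour distances."""
    # Brute-force pairwise distances; fine for a few hundred vertices.
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (dist.min(axis=1) < threshold).mean()  # pred points near gt
    recall = (dist.min(axis=0) < threshold).mean()     # gt points near pred
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `pa_mpvpe(pred, gt)` gives the error in the same units as the inputs, and `f_score(procrustes_align(pred, gt), gt, 15.0)` corresponds to an F@15mm-style score on aligned vertices.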
