

Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Jingyuan Sun · Mingxiao Li · Zijiao Chen · Yunhao Zhang · Shaonan Wang · Marie-Francine Moens

Great Hall & Hall B1+B2 (level 1) #431
Wed 13 Dec 3 p.m. PST — 5 p.m. PST


Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to the neural activation patterns most informative for visual reconstruction, with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in 50-way top-1 semantic classification accuracy. The code implementation is available at
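The abstract reports 50-way top-1 semantic classification accuracy but does not spell out the protocol. The following is a minimal sketch of one common way such an n-way top-1 metric is computed, not the authors' evaluation code. It assumes that a pretrained image classifier has already produced a class-probability vector for each reconstructed image, and that a trial counts as a hit when the ground-truth class scores strictly higher than n-1 randomly drawn distractor classes. The function names, the `trials` parameter, and the tie-handling rule are all hypothetical choices made for illustration.

```python
import random
from typing import Sequence


def n_way_top1_hit(
    probs: Sequence[float], true_class: int, n_way: int, rng: random.Random
) -> bool:
    """Return True if true_class outscores n_way - 1 random distractor classes.

    probs is assumed to be a classifier's class-probability vector for one
    reconstructed image (an assumption; the abstract does not specify this).
    """
    num_classes = len(probs)
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class is out of range")
    if not 2 <= n_way <= num_classes:
        raise ValueError("n_way must be between 2 and the number of classes")
    # Draw distractors without replacement from every class except the true one.
    others = [c for c in range(num_classes) if c != true_class]
    distractors = rng.sample(others, n_way - 1)
    # Strict comparison: a tie with any distractor counts as a miss.
    return all(probs[true_class] > probs[c] for c in distractors)


def n_way_top1_accuracy(
    all_probs: Sequence[Sequence[float]],
    true_classes: Sequence[int],
    n_way: int = 50,
    trials: int = 100,
    seed: int = 0,
) -> float:
    """Average hit rate over all images and over repeated distractor draws."""
    if len(all_probs) != len(true_classes):
        raise ValueError("all_probs and true_classes must have equal length")
    if not all_probs or trials < 1:
        raise ValueError("need at least one image and one trial")
    rng = random.Random(seed)
    hits = 0
    total = 0
    for probs, true_class in zip(all_probs, true_classes):
        for _ in range(trials):
            hits += n_way_top1_hit(probs, true_class, n_way, rng)
            total += 1
    return hits / total
```

Under these assumptions, chance level for a 50-way trial is 1/50 = 2%, which gives a baseline for reading reported accuracies. Repeating the distractor draw several times per image reduces the variance introduced by random sampling.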
