Seeing With Less: MindEye2's Efficient Approach to Brain-Based Image Decoding

by Image Recognition, April 12th, 2025

Too Long; Didn't Read

We introduce MindEye2, a modeling approach that outputs reconstructions of seen images from fMRI activity with a similar quality to previous approaches using only a fraction of the training data.

Abstract and 1 Introduction

2 MindEye2 and 2.1 Shared-Subject Functional Alignment

2.2 Backbone, Diffusion Prior, & Submodules

2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP

2.5 Model Inference

3 Results and 3.1 fMRI-to-Image Reconstruction

3.2 Image Captioning

3.3 Image/Brain Retrieval and 3.4 Brain Correlation

3.5 Ablations

4 Related Work

5 Conclusion

6 Acknowledgements and References


A Appendix

A.1 Author Contributions

A.2 Additional Dataset Information

A.3 MindEye2 (not pretrained) vs. MindEye1

A.4 Reconstruction Evaluations Across Varying Amounts of Training Data

A.5 Single-Subject Evaluations

A.6 UnCLIP Evaluation

A.7 OpenCLIP BigG to CLIP L Conversion

A.8 COCO Retrieval

A.9 Reconstruction Evaluations: Additional Information

A.10 Pretraining with Less Subjects

A.11 UMAP Dimensionality Reduction

A.12 ROI-Optimized Stimuli

A.13 Human Preference Experiments

5 Conclusion

We introduce MindEye2, a modeling approach that outputs reconstructions of seen images from fMRI activity with a similar quality to previous approaches using only a fraction of the training data. MindEye2 further achieves SOTA across reconstruction and retrieval metrics when supplied with the full training data. Our approach pretrains a model using data from multiple subjects, which is then fine-tuned on scarce data from a held-out subject. Patterns of fMRI activity are mapped to CLIP space and images are reconstructed with the help of our unCLIP model fine-tuned from Stable Diffusion XL. Our work shows the potential to apply deep learning models trained on large-scale neuroimaging datasets to new subjects with minimal data.
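
The pretrain-then-fine-tune idea above can be illustrated with a toy NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: both the subject-specific adapter and the shared backbone are plain linear maps, the backbone is treated as already pretrained and kept frozen, the data are synthetic, and all names (`backbone`, `fit_adapter`, the dimensions) are hypothetical. The embedding dimension merely stands in for CLIP space.

```python
import numpy as np

def loss(X, W, B, Y):
    """Half squared error between predicted and target embeddings."""
    return 0.5 * np.sum((X @ W @ B - Y) ** 2)

def fit_adapter(X, Y, B, steps=200):
    """Fit a subject-specific linear adapter W by gradient descent,
    keeping the shared backbone B frozen (an assumption of this sketch)."""
    W = np.zeros((X.shape[1], B.shape[0]))
    # Step size 1/L, where L bounds the curvature of the quadratic loss.
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 * np.linalg.norm(B, 2) ** 2)
    for _ in range(steps):
        grad = X.T @ (X @ W @ B - Y) @ B.T
        W -= lr * grad
    return W

rng = np.random.default_rng(0)
d_shared, d_embed = 16, 8      # hypothetical latent / embedding sizes
n_samples, n_voxels = 40, 50   # scarce data for the held-out subject

# Stand-in for a backbone pretrained on other subjects (random here).
backbone = rng.normal(size=(d_shared, d_embed))

# Synthetic held-out subject: voxel patterns and target embeddings.
X_new = rng.normal(size=(n_samples, n_voxels))
W_true = rng.normal(size=(n_voxels, d_shared))
Y_new = X_new @ W_true @ backbone + 0.1 * rng.normal(size=(n_samples, d_embed))

initial_loss = loss(X_new, np.zeros((n_voxels, d_shared)), backbone, Y_new)
adapter = fit_adapter(X_new, Y_new, backbone)
final_loss = loss(X_new, adapter, backbone, Y_new)
print(f"loss before: {initial_loss:.1f}, after: {final_loss:.1f}")
```

Because only the small adapter is fit in this sketch, each new subject can have a different voxel count while reusing the same shared component.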

5.1 Limitations

fMRI is extremely sensitive to movement and requires subjects to comply with the task: decoding is easily resisted by slightly moving one’s head or thinking about unrelated information (Tang et al., 2023). MindEye2 has also only been shown to work on natural scenes such as those in COCO; additional data and/or specialized generative models would likely be required for other image distributions.

5.2 Broader Impacts

The present work demonstrates that it is now practical for patients to undergo a single MRI scanning session and produce enough data to perform high-quality reconstructions of their visual perception. Such image reconstructions from brain activity are expected to be systematically distorted by factors such as mental state and neurological conditions. This could potentially enable novel clinical diagnosis and assessment approaches, including applications for improved locked-in (pseudocoma) patient communication (Monti et al., 2010) and brain-computer interfaces if adapted to real-time analysis (Wallace et al., 2022) or non-fMRI neuroimaging modalities. As technology continues to improve, we note it is important that brain data be carefully protected and that companies collecting such data be transparent about how they use it.


This paper is available on arXiv under the CC BY 4.0 DEED license.

Authors:

(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);

(2) Mihir Tripathy, Medical AI Research Center (MedARC) (core contribution);

(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC) (core contribution);

(4) Reese Kneeland, University of Minnesota (core contribution);

(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);

(6) Ashutosh Narang, Medical AI Research Center (MedARC);

(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);

(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);

(9) Thomas Naselaris, University of Minnesota;

(10) Kenneth A. Norman, Princeton Neuroscience Institute;

(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).

