Table of Links
2 MindEye2 and 2.1 Shared-Subject Functional Alignment
2.2 Backbone, Diffusion Prior, & Submodules
2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP
3 Results and 3.1 fMRI-to-Image Reconstruction
3.3 Image/Brain Retrieval and 3.4 Brain Correlation
6 Acknowledgements and References
A Appendix
A.2 Additional Dataset Information
A.3 MindEye2 (not pretrained) vs. MindEye1
A.4 Reconstruction Evaluations Across Varying Amounts of Training Data
A.5 Single-Subject Evaluations
A.7 OpenCLIP BigG to CLIP L Conversion
A.9 Reconstruction Evaluations: Additional Information
A.10 Pretraining with Less Subjects
A.11 UMAP Dimensionality Reduction
A.13 Human Preference Experiments
5 Conclusion
We introduce MindEye2, a modeling approach that outputs reconstructions of seen images from fMRI activity with a similar quality to previous approaches using only a fraction of the training data. MindEye2 further achieves SOTA across reconstruction and retrieval metrics when supplied with the full training data. Our approach pretrains a model using data from multiple subjects, which is then fine-tuned on scarce data from a held-out subject. Patterns of fMRI activity are mapped to CLIP space and images are reconstructed with the help of our unCLIP model fine-tuned from Stable Diffusion XL. Our work shows the potential to apply deep learning models trained on large-scale neuroimaging datasets to new subjects with minimal data.
5.1 Limitations
fMRI is extremely sensitive to movement and requires subjects to comply with the task: decoding is easily resisted by slightly moving one’s head or thinking about unrelated information (Tang et al., 2023). MindEye2 has also only been shown to work on natural scenes such as those in COCO; additional data and/or specialized generative models would likely be required for other image distributions.
5.2 Broader Impacts
The present work demonstrates that it is now practical for patients to undergo a single MRI scanning session and produce enough data to perform high-quality reconstructions of their visual perception. Such image reconstructions from brain activity are expected to be systematically distorted due to factors including mental state, neurological conditions, etc. This could potentially enable novel clinical diagnosis and assessment approaches, including applications for improved locked-in (pseudocoma) patient communication (Monti et al., 2010) and brain-computer interfaces if adapted to real-time analysis (Wallace et al., 2022) or non-fMRI neuroimaging modalities. As technology continues to improve, we note it is important that brain data be carefully protected and companies collecting such data be transparent with their use.
This paper is available on arxiv under CC BY 4.0 DEED license.
Authors:
(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);
(2) Mihir Tripathy, Medical AI Research Center (MedARC) and a Core contribution;
(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC) and a Core contribution;
(4) Reese Kneeland, University of Minnesota and a Core contribution;
(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);
(6) Ashutosh Narang, Medical AI Research Center (MedARC);
(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);
(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);
(9) Thomas Naselaris, University of Minnesota;
(10) Kenneth A. Norman, Princeton Neuroscience Institute;
(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).