Table of Links
2 MindEye2 and 2.1 Shared-Subject Functional Alignment
2.2 Backbone, Diffusion Prior, & Submodules
2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP
3 Results and 3.1 fMRI-to-Image Reconstruction
3.3 Image/Brain Retrieval and 3.4 Brain Correlation
6 Acknowledgements and References
A Appendix
A.2 Additional Dataset Information
A.3 MindEye2 (not pretrained) vs. MindEye1
A.4 Reconstruction Evaluations Across Varying Amounts of Training Data
A.5 Single-Subject Evaluations
A.7 OpenCLIP BigG to CLIP L Conversion
A.9 Reconstruction Evaluations: Additional Information
A.10 Pretraining with Less Subjects
A.11 UMAP Dimensionality Reduction
A.13 Human Preference Experiments
A.12 ROI-Optimized Stimuli
Here we try to visualize the functional organization of the brain by feeding synthetic brain activity through pretrained MindEye2. Inspired by the ROI-optimal analyses of Ozcelik and VanRullen (2023), we utilized four ROIs derived from population receptive field (pRF) experiments and four ROIs derived from functional localization (fLoc) experiments. These pRF and fLoc experiments were provided by the NSD dataset. The ROIs are as follows (region names following the terminology adopted in Allen et al. (2021)): V1 is the concatenation of V1 ventral (V1v) and V1 dorsal (V1d), and similarly for V2 and V3; V4 is the human V4 (hV4); the Face-ROI consists of the union of OFA, FFA-1, FFA-2, mTL-faces, and aTL-faces; the Word-ROI consists of OWFA, VWFA-1, VWFA-2, mfs-words, and mTL-words; the Place-ROI consists of OPA, PPA, and RSC; and the Body-ROI consists of EBA, FBA-1, FBA-2, and mTL-bodies.
To observe the functional specialization associated with each of the ROIs, we used MindEye2 to reconstruct images based on synthetic fMRI patterns where flattened voxels were either set to 0 if outside the ROI or 1 if inside the ROI. Results are shown in Figure 11.
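The construction of such a synthetic pattern can be illustrated with a minimal sketch. This is not the authors' code: it assumes each flattened voxel carries a single sub-region label, and the names `COMPOSITE_ROIS`, `synthetic_pattern`, and `voxel_labels` are hypothetical, introduced only for illustration. The V2 and V3 sub-region names (V2v, V2d, V3v, V3d) are inferred from the "similarly for V2 and V3" wording above.

```python
import numpy as np

# Composite ROIs as listed in the text. The V2/V3 sub-region names are an
# assumption based on "similarly for V2 and V3".
COMPOSITE_ROIS = {
    "V1": ["V1v", "V1d"],
    "V2": ["V2v", "V2d"],
    "V3": ["V3v", "V3d"],
    "V4": ["hV4"],
    "Face-ROI": ["OFA", "FFA-1", "FFA-2", "mTL-faces", "aTL-faces"],
    "Word-ROI": ["OWFA", "VWFA-1", "VWFA-2", "mfs-words", "mTL-words"],
    "Place-ROI": ["OPA", "PPA", "RSC"],
    "Body-ROI": ["EBA", "FBA-1", "FBA-2", "mTL-bodies"],
}


def synthetic_pattern(voxel_labels, roi_name):
    """Return a flattened pattern: 1.0 for voxels inside the ROI, 0.0 outside."""
    members = COMPOSITE_ROIS[roi_name]
    # Boolean mask over voxels: True where the voxel's label is in the ROI.
    inside = np.isin(voxel_labels, members)
    return inside.astype(np.float32)


# Hypothetical toy labelling of six flattened voxels (not real NSD data).
voxel_labels = np.array(["V1v", "V1d", "OFA", "PPA", "FFA-1", "other"])
face_pattern = synthetic_pattern(voxel_labels, "Face-ROI")
```

In this toy example only the voxels labelled OFA and FFA-1 are set to 1 for the Face-ROI. A vector of this form, sized to a subject's actual voxel count, would then be passed to the model in place of a measured fMRI pattern.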
Subjectively interpreting these reconstructions, it seems that Face-ROI reconstructions depicted human faces, aligned with our expectations for the functional specialization of this region. Word-ROI reconstructions depicted distorted characters written on signs (with the exception of subject 7). Place-ROI reconstructions depicted enclosed environments, mostly rooms. Body-ROI reconstructions depicted strange mixtures of human body parts and animals. V1 reconstructions were dark with a few points of high contrast. V2 reconstructions showed somewhat softer colors. V3 and V4 reconstructions were more abstract with amorphous shapes and more vivid colors.
Such results demonstrate the potential to directly visualize preferential stimuli for any desired region of interest; further exploration of functional specialization could be performed using more sophisticated methods (cf. Sarch et al. (2023); Luo et al. (2023a;b)).
This paper is available on arXiv under the CC BY 4.0 DEED license.
Authors:
(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);
(2) Mihir Tripathy, Medical AI Research Center (MedARC) (core contribution);
(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC) (core contribution);
(4) Reese Kneeland, University of Minnesota (core contribution);
(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);
(6) Ashutosh Narang, Medical AI Research Center (MedARC);
(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);
(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);
(9) Thomas Naselaris, University of Minnesota;
(10) Kenneth A. Norman, Princeton Neuroscience Institute;
(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).