Visualizing Brain Function: MindEye2 Reconstructions from ROI-Specific fMRI

by Image Recognition, April 17th, 2025

Too Long; Didn't Read

Explore MindEye2's ROI analysis, revealing the preferential stimuli associated with brain regions like V1, Face-ROI, Word-ROI, Place-ROI, and Body-ROI through reconstructed images.

Abstract and 1 Introduction

2 MindEye2 and 2.1 Shared-Subject Functional Alignment

2.2 Backbone, Diffusion Prior, & Submodules

2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP

2.5 Model Inference

3 Results and 3.1 fMRI-to-Image Reconstruction

3.2 Image Captioning

3.3 Image/Brain Retrieval and 3.4 Brain Correlation

3.5 Ablations

4 Related Work

5 Conclusion

6 Acknowledgements and References


A Appendix

A.1 Author Contributions

A.2 Additional Dataset Information

A.3 MindEye2 (not pretrained) vs. MindEye1

A.4 Reconstruction Evaluations Across Varying Amounts of Training Data

A.5 Single-Subject Evaluations

A.6 UnCLIP Evaluation

A.7 OpenCLIP BigG to CLIP L Conversion

A.8 COCO Retrieval

A.9 Reconstruction Evaluations: Additional Information

A.10 Pretraining with Less Subjects

A.11 UMAP Dimensionality Reduction

A.12 ROI-Optimized Stimuli

A.13 Human Preference Experiments

A.12 ROI-Optimized Stimuli

Here we try to visualize the functional organization of the brain by feeding synthetic brain activity through pretrained MindEye2. Inspired by the ROI-optimal analyses of Ozcelik and VanRullen (2023), we utilized four ROIs derived from population receptive field (pRF) experiments and four ROIs derived from functional localization (fLoc) experiments, both provided as part of the NSD dataset. The ROIs are as follows (region names follow the terminology adopted in Allen et al. (2021)): V1 is the concatenation of V1 ventral (V1v) and V1 dorsal (V1d), and similarly for V2 and V3; V4 is human V4 (hV4); the Face-ROI is the union of OFA, FFA-1, FFA-2, mTL-faces, and aTL-faces; the Word-ROI is the union of OWFA, VWFA-1, VWFA-2, mfs-words, and mTL-words; the Place-ROI is the union of OPA, PPA, and RSC; and the Body-ROI is the union of EBA, FBA-1, FBA-2, and mTL-bodies.
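For readers who want to reproduce this kind of ROI grouping, the sketch below shows one way to assemble composite masks as unions of sub-region labels. The file paths, label values, and nibabel-based loading are illustrative assumptions, not the exact NSD file conventions.

```python
# Minimal sketch: composite ROI masks as unions of sub-region labels.
# File paths and label ids are placeholders; adapt them to the ROI
# files you actually use.
import nibabel as nib
import numpy as np

def load_labels(path):
    """Load an integer-labeled ROI volume as a numpy array."""
    return np.asarray(nib.load(path).dataobj).astype(int)

# Hypothetical per-subject ROI label volumes.
prf = load_labels("subj01/prf-visualrois.nii.gz")   # V1v, V1d, V2v, ... as integer labels
faces = load_labels("subj01/floc-faces.nii.gz")     # OFA, FFA-1, FFA-2, ... as integer labels

# V1 = union of V1v and V1d (label ids 1 and 2 are placeholders).
v1_mask = np.isin(prf, [1, 2])
# Face-ROI = union of all face-selective sub-regions (any nonzero label).
face_mask = faces > 0
```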


Figure 10: UMAP plots depict CLIP image latents (blue), backbone latents (green), retrieval submodule latents (orange), and diffusion prior latents (red). UMAPs were estimated across the 1,000 test samples for subject 1, using the full 40-session model. CLIP image latents correspond to the 256 × 1664 dimensionality of OpenCLIP ViT-bigG/14 image token embeddings. Euclidean distance between the given MindEye2 embedding space and CLIP image space is lowest for the diffusion prior, suggesting that the diffusion prior helps to align the two embedding spaces.
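As a rough illustration of the comparison described in this caption, the snippet below fits a joint 2-D UMAP over several latent sets and reports each set's mean per-sample Euclidean distance to the CLIP image latents. The random placeholder arrays and their reduced size are assumptions; the real latents are the 1,000 × (256 × 1664) flattened embeddings described above.

```python
# Sketch of the Figure 10-style analysis with placeholder data (assumed shapes).
import numpy as np
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
n, d = 200, 4096  # small placeholder; real latents are 1,000 x (256 * 1664)

clip_latents = rng.normal(size=(n, d)).astype(np.float32)
spaces = {
    "backbone":  rng.normal(size=(n, d)).astype(np.float32),
    "retrieval": rng.normal(size=(n, d)).astype(np.float32),
    "prior":     rng.normal(size=(n, d)).astype(np.float32),
}

# Mean per-sample Euclidean distance to CLIP image space.
for name, z in spaces.items():
    print(name, float(np.linalg.norm(z - clip_latents, axis=1).mean()))

# Joint 2-D UMAP over all latent sets (color points by source when plotting).
all_z = np.concatenate([clip_latents, *spaces.values()], axis=0)
xy = umap.UMAP(n_components=2, metric="euclidean").fit_transform(all_z)
```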


To observe the functional specialization associated with each ROI, we used MindEye2 to reconstruct images from synthetic fMRI patterns in which flattened voxels were set to 1 inside the ROI and 0 outside it. Results are shown in Figure 11.
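Concretely, the synthetic input is just a binary voxel vector. The sketch below assumes a flattened boolean ROI mask and a stand-in `mindeye2_reconstruct` function, since the model's actual inference entry point is not shown here; the voxel count is likewise only an assumption.

```python
# Sketch: ROI-optimized stimulus input, i.e. 1.0 inside the ROI, 0.0 outside.
import numpy as np
import torch

n_voxels = 15724                      # assumed flattened voxel count for one subject
roi_mask = np.zeros(n_voxels, dtype=bool)
roi_mask[:500] = True                 # placeholder ROI membership; use a real mask

synthetic_fmri = torch.from_numpy(roi_mask.astype(np.float32))[None, :]  # (1, n_voxels)

# Hypothetical inference call; MindEye2's real API may differ.
# recon_image = mindeye2_reconstruct(synthetic_fmri, subject_id=1)
```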


Subjectively interpreting these reconstructions: Face-ROI reconstructions depicted human faces, in line with our expectations for the functional specialization of this region. Word-ROI reconstructions depicted distorted characters written on signs (with the exception of subject 7). Place-ROI reconstructions depicted enclosed environments, mostly rooms. Body-ROI reconstructions depicted strange mixtures of human body parts and animals. V1 reconstructions were dark with a few points of high contrast. V2 reconstructions showed somewhat softer colors. V3 and V4 reconstructions were more abstract, with amorphous shapes and more vivid colors.


Such results demonstrate the potential to directly visualize preferential stimuli for any desired region of interest; functional specialization could be explored further using more sophisticated methods (cf. Sarch et al. (2023); Luo et al. (2023a;b)).


This paper is available on arxiv under CC BY 4.0 DEED license.

Authors:

(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);

(2) Mihir Tripathy, Medical AI Research Center (MedARC), core contributor;

(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC), core contributor;

(4) Reese Kneeland, University of Minnesota, core contributor;

(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);

(6) Ashutosh Narang, Medical AI Research Center (MedARC);

(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);

(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);

(9) Thomas Naselaris, University of Minnesota;

(10) Kenneth A. Norman, Princeton Neuroscience Institute;

(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).

