Two-Way Identification: Evaluating MindEye2 Reconstruction Accuracy

by Image Recognition | April 16th, 2025

Too Long; Didn't Read

Learn about the two-way identification method used to evaluate the accuracy of MindEye2's visual reconstructions, comparing ground truth and reconstructed images using deep learning embeddings.

Abstract and 1 Introduction

2 MindEye2 and 2.1 Shared-Subject Functional Alignment

2.2 Backbone, Diffusion Prior, & Submodules

2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP

2.5 Model Inference

3 Results and 3.1 fMRI-to-Image Reconstruction

3.2 Image Captioning

3.3 Image/Brain Retrieval and 3.4 Brain Correlation

3.5 Ablations

4 Related Work

5 Conclusion

6 Acknowledgements and References


A Appendix

A.1 Author Contributions

A.2 Additional Dataset Information

A.3 MindEye2 (not pretrained) vs. MindEye1

A.4 Reconstruction Evaluations Across Varying Amounts of Training Data

A.5 Single-Subject Evaluations

A.6 UnCLIP Evaluation

A.7 OpenCLIP BigG to CLIP L Conversion

A.8 COCO Retrieval

A.9 Reconstruction Evaluations: Additional Information

A.10 Pretraining with Less Subjects

A.11 UMAP Dimensionality Reduction

A.12 ROI-Optimized Stimuli

A.13 Human Preference Experiments

A.9 Reconstruction Evaluations: Additional Information

Two-way comparisons were performed for AlexNet (Krizhevsky et al., 2012) (second and fifth layers), InceptionV3 (Szegedy et al., 2016) (last pooling layer), and CLIP (final layer of ViT-L/14). We followed the same image preprocessing and two-way identification procedure as Ozcelik and VanRullen (2023) and Scotti et al. (2023). For each model, two-way identification computes the Pearson correlation between the embeddings of the ground truth image and its reconstruction, as well as the correlation between the ground truth image and a different reconstruction elsewhere in the test set; the trial is marked correct if the former correlation is higher than the latter. For each test sample, performance was averaged across all possible pairwise comparisons with the other 999 reconstructions, eliminating any bias from random sample selection. This yielded 1,000 averaged percent-correct scores, which we then averaged to obtain the metrics reported in Table 1.
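As a concrete illustration, here is a minimal NumPy sketch of the two-way identification step, assuming the model embeddings (e.g., from AlexNet's second layer or CLIP ViT-L/14's final layer) have already been extracted and flattened into (N, D) arrays. The function name, array layout, and z-scoring shortcut are ours for illustration, not the authors' released evaluation code.

```python
import numpy as np

def two_way_identification(gt_embeds: np.ndarray, recon_embeds: np.ndarray) -> float:
    """Average two-way identification accuracy.

    gt_embeds:    (N, D) embeddings of the N ground-truth test images
    recon_embeds: (N, D) embeddings of the corresponding N reconstructions

    For each test sample i, the Pearson correlation of the correct pairing
    (gt_i, recon_i) is compared against corr(gt_i, recon_j) for every other
    reconstruction j; the fraction of comparisons the correct pairing wins
    is averaged over all N samples.
    """
    def zscore(x):
        # Row-wise z-scoring lets a dot product compute Pearson correlation.
        return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

    gt = zscore(gt_embeds.astype(np.float64))
    rc = zscore(recon_embeds.astype(np.float64))

    # corr[i, j] = Pearson r between ground truth i and reconstruction j
    corr = gt @ rc.T / gt.shape[1]

    diag = np.diag(corr)  # correlations of the correct pairings
    # Entry (i, j) is True when the correct pairing beats reconstruction j.
    # The strict ">" leaves the diagonal False, so j = i is excluded, and
    # ties count as incorrect, matching the criterion described above.
    wins = (diag[:, None] > corr).sum(axis=1)
    percent_correct = wins / (corr.shape[1] - 1)

    return percent_correct.mean()
```

With N = 1,000 test samples, the division by N - 1 corresponds to averaging each sample's result over the other 999 reconstructions, and the final mean yields the single metric reported per model in Table 1.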


This paper is available on arxiv under CC BY 4.0 DEED license.

Authors:

(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);

(2) Mihir Tripathy, Medical AI Research Center (MedARC) (core contribution);

(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC) (core contribution);

(4) Reese Kneeland, University of Minnesota (core contribution);

(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);

(6) Ashutosh Narang, Medical AI Research Center (MedARC);

(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);

(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);

(9) Thomas Naselaris, University of Minnesota;

(10) Kenneth A. Norman, Princeton Neuroscience Institute;

(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).

