Can Multimodal AI Replace Radiologists?

by Text Mining, April 15th, 2025

Too Long; Didn't Read

GPT-4V stands out in biomedical image analysis, rivaling professionals in tasks such as visual question answering and chart interpretation. However, its weaknesses, such as spatial misinterpretations, color insensitivity, and confirmation bias, pose significant risks in clinical and research contexts. Emerging methods such as visual referring prompting (VRP) offer a way forward, but more rigorous evaluation is needed to determine whether multimodal models like GPT-4V can truly understand biomedical visuals.


Authors:

(1) Jinge Wang, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA;

(2) Zien Cheng, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA;

(3) Qiuming Yao, School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588, USA;

(4) Li Liu, College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA and Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA;

(5) Dong Xu, Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA;

(6) Gangqing Hu, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA ([email protected]).

Abstract and 1. Introduction

2. Omics

3. Genetics

4. Biomedical Text Mining and 4.1. Performance Assessments across typical tasks

4.2. Biological pathway mining

5. Drug Discovery

5.1. Human-in-the-Loop and 5.2. In-context Learning

5.3. Instruction Finetuning

6. Biomedical Image Understanding

7. Bioinformatics Programming

7.1. Application in Applied Bioinformatics

7.2. Biomedical Database Access

7.3. Online Tools for Coding with ChatGPT

7.4. Benchmarks for Bioinformatics Coding

8. Chatbots in Bioinformatics Education

9. Discussion and Future Perspectives

Author Contributions, Acknowledgements, Conflict of Interest Statement, Ethics Statement, and References

6. BIOMEDICAL IMAGE UNDERSTANDING

Multimodal AI models have recently garnered significant attention in biomedical research[76]. Since its release in late September 2023, GPT-4V(ision) has been the subject of numerous studies exploring its application to image-related tasks across a range of biomedical topics[77-83]. On biomedical images, GPT-4V performs on par with professionals in medical visual question answering[81, 82] and outperforms traditional image models in biomedical image classification[84]. On scientific figures, GPT-4V can proficiently explain various plot types and apply domain knowledge to enrich its interpretations[85].


Despite this impressive performance, current evaluations reveal significant limitations. OpenAI acknowledges that GPT-4V struggles to differentiate closely located text and can make factual errors in an authoritative tone[86]. The model also struggles to perceive the colors, quantities, and spatial relationships of visual patterns in scientific figures[85]. When GPT-4V interprets images using domain knowledge, its responses risk "confirmation bias"[87] in one of two forms: the observation or conclusion is incorrect but the supporting knowledge is valid[85], or the observation or conclusion is correct but the supporting knowledge is invalid or irrelevant[88]. Such biases are particularly concerning because users without the requisite expertise can easily be misled by these plausible-sounding responses.
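The two forms of confirmation bias described above can be expressed as a simple grading rubric. The sketch below is hypothetical: it assumes a human reviewer has already judged, for each response, whether the observation or conclusion is correct and whether the supporting knowledge is valid. The names `ResponseLabel`, `label_response`, and `count_biased` are illustrative, and the labels for the "both correct" and "both wrong" cases are our own additions rather than categories defined in [85] or [88].

```python
from enum import Enum
from typing import Iterable, Tuple


class ResponseLabel(Enum):
    # Hypothetical rubric labels; only the two BIAS_* cases come from the text.
    SOUND = "observation correct, knowledge valid"
    BIAS_WRONG_OBSERVATION = "observation incorrect, knowledge valid"
    BIAS_INVALID_KNOWLEDGE = "observation correct, knowledge invalid or irrelevant"
    UNSOUND = "observation incorrect, knowledge invalid or irrelevant"


def label_response(observation_correct: bool, knowledge_valid: bool) -> ResponseLabel:
    """Map a reviewer's two judgments about one response to a rubric label."""
    if observation_correct and knowledge_valid:
        return ResponseLabel.SOUND
    if knowledge_valid:
        # Valid knowledge propping up a wrong observation (the case in [85]).
        return ResponseLabel.BIAS_WRONG_OBSERVATION
    if observation_correct:
        # Right observation backed by invalid/irrelevant knowledge (the case in [88]).
        return ResponseLabel.BIAS_INVALID_KNOWLEDGE
    return ResponseLabel.UNSOUND


def count_biased(judgments: Iterable[Tuple[bool, bool]]) -> int:
    """Count responses that fall into either confirmation-bias category."""
    biased = {ResponseLabel.BIAS_WRONG_OBSERVATION, ResponseLabel.BIAS_INVALID_KNOWLEDGE}
    return sum(1 for obs, know in judgments if label_response(obs, know) in biased)
```

For example, `count_biased([(True, True), (False, True), (True, False), (False, False)])` counts the second and third responses. Separating the two judgments this way makes it explicit that a fluent, knowledge-rich answer can still be flagged when the observation underneath it is wrong.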


Prompt engineering has been instrumental in improving AI responses to text inputs. The emergence of GPT-4V highlights the need for equivalent methods for visual inputs to refine chatbots' comprehension across modalities. The field of computer vision has already seen some progress in this direction[89]. Yang et al.[90] proposed visual referring prompting (VRP), which sets visual pointer references by directly editing input images, thereby augmenting textual prompts with visual cues. VRP has proven effective in preliminary case studies, motivating the creation of benchmarks such as VRPTEST[91] to evaluate its efficacy. However, a thorough, quantitative assessment of VRP's impact on GPT-4V's understanding of biomedical images has not yet been conducted.
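To make the idea of a visual pointer concrete, here is a minimal sketch of the image-editing step behind VRP. It is not the implementation from [90] and does not call GPT-4V. It assumes the image is a plain 2D list of RGB tuples (a dependency-free stand-in for a real image library such as Pillow), that the pointer is a rectangular outline, and that the accompanying text prompt refers to the marker by color. The functions `add_box_marker` and `build_vrp_prompt` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

RED: Pixel = (255, 0, 0)


def add_box_marker(image: Image, top: int, left: int, bottom: int, right: int,
                   color: Pixel = RED, thickness: int = 2) -> Image:
    """Return a copy of `image` with a rectangular outline drawn around the
    region rows [top, bottom] x columns [left, right] (inclusive)."""
    height, width = len(image), len(image[0])
    marked = [row[:] for row in image]  # copy rows so the input stays unchanged
    for y in range(max(0, top), min(height, bottom + 1)):
        for x in range(max(0, left), min(width, right + 1)):
            # A pixel is on the outline if it lies within `thickness` of any edge.
            on_border = (y - top < thickness or bottom - y < thickness
                         or x - left < thickness or right - x < thickness)
            if on_border:
                marked[y][x] = color
    return marked


def build_vrp_prompt(question: str, color_name: str = "red") -> str:
    """Pair the edited image with a text prompt that refers to the visual cue."""
    return f"Focus on the region outlined by the {color_name} box. {question}"


if __name__ == "__main__":
    blank = [[(0, 0, 0)] * 10 for _ in range(10)]
    marked = add_box_marker(blank, 2, 2, 7, 7, thickness=1)
    print(build_vrp_prompt("What structure is shown inside it?"))
```

The design point is that the question text stays short and generic while the edited image carries the spatial reference, which is how VRP sidesteps the model's difficulty with describing locations purely in words. A quantitative evaluation would compare model answers on marked versus unmarked versions of the same biomedical images.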



This paper is available on arxiv under CC BY 4.0 DEED license.

