Can ChatGPT Accelerate Drug Discovery? Here's What the Science Says

Written by textmining | Published 2025/04/15
Tech Story Tags: ai-drug-design | biomedical-ai-applications | ai-in-biomedical-research | llms-in-medicine | chatgpt | omics | chatgpt-genomics | chatgpt-vs-biobert

TLDR ChatGPT is making strides in drug discovery tasks like DDI prediction, molecule design, and property optimization. While GPT-4 shows promise, its limitations in domain-specific understanding and SMILES parsing make human-in-the-loop workflows and advanced prompting essential. Instruction and task tuning significantly improve reliability, with tools like ChatDrug and InstructMol setting new benchmarks for AI-assisted pharmaceutical R&D.

Authors:

(1) Jinge Wang, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA;

(2) Zien Cheng, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA;

(3) Qiuming Yao, School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588, USA;

(4) Li Liu, College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA and Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA;

(5) Dong Xu, Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA;

(6) Gangqing Hu, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA ([email protected]).

Table of Links

Abstract and 1. Introduction

2. Omics

3. Genetics

4. Biomedical Text Mining and 4.1. Performance Assessments across typical tasks

4.2. Biological pathway mining

5. Drug Discovery

5.1. Human-in-the-Loop and 5.2. In-context Learning

5.3. Instruction Finetuning

6. Biomedical Image Understanding

7. Bioinformatics Programming

7.1 Application in Applied Bioinformatics

7.2. Biomedical Database Access

7.3. Online Tools for Coding with ChatGPT

7.4 Benchmarks for Bioinformatics Coding

8. Chatbots in Bioinformatics Education

9. Discussion and Future Perspectives

Author Contributions, Acknowledgements, Conflict of Interest Statement, Ethics Statement, and References

5. DRUG DISCOVERY

Drug discovery is a complex and failure-prone process that demands significant time, effort, and financial investment. Growing interest in ChatGPT's potential to facilitate drug discovery has captivated the pharmaceutical community[47-50]. Recent studies have showcased the chatbot's proficiency in drug discovery-related tasks. GPT-3.5, for example, has been noted for its respectable accuracy in identifying associations between drugs and diseases[51]. GPT models also perform strongly in textual chemistry tasks, such as generating molecular captions, but struggle with tasks that require accurate interpretation of Simplified Molecular-Input Line-Entry System (SMILES) strings[52]. Research by Juhi et al. [53] highlighted ChatGPT's partial success in predicting and explaining drug-drug interactions (DDIs). When benchmarked against two clinical tools, GPT models achieved an accuracy of 50-60% in DDI prediction, improving by 20-30% with further optimization[54]. When evaluated on the DDI corpus[55], ChatGPT achieved an F1 score of 52%[26]. In more rigorous assessments, ChatGPT was unable to pass various pharmacist licensing examinations[56-58]. It also shows limitations in patient education and in recognizing adverse drug reactions[59]. These findings suggest that, although ChatGPT offers valuable support in drug discovery, its capacity to tackle complex challenges remains limited and its outputs require close human oversight.
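
One practical safeguard against the SMILES weakness noted above is to validate every model-generated string with a cheminformatics toolkit before it enters downstream analysis. The sketch below is a minimal illustration using RDKit; the candidate list and the keep/flag policy are assumptions for illustration, not part of the studies cited here.

```python
# Minimal sketch: validate LLM-generated SMILES with RDKit before downstream use.
# The candidate list is a placeholder standing in for chatbot output.
from rdkit import Chem

def validate_smiles(candidates):
    """Split candidates into parseable (canonicalized) and unparseable SMILES."""
    valid, invalid = [], []
    for smi in candidates:
        mol = Chem.MolFromSmiles(smi)  # returns None if the string cannot be parsed
        if mol is None:
            invalid.append(smi)
        else:
            valid.append(Chem.MolToSmiles(mol))  # canonical form for de-duplication
    return valid, invalid

llm_generated = ["CCO", "c1ccccc1O", "C1=CC=CC", "not_a_smiles"]
ok, flagged = validate_smiles(llm_generated)
print("kept for analysis:", ok)
print("flagged for human review:", flagged)
```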

In the following sections, we review three important aspects of LLM use in drug discovery (Figure 3). First, we review examples and tools that facilitate a human-in-the-loop approach for the reliable use of ChatGPT in drug discovery. Then, we highlight the advances brought by strategic prompting with in-context learning, in which examples are supplied to increase response accuracy. Lastly, we summarize progress in using task and/or instruction finetuning to adapt a foundation model to specific tasks.

5.1. HUMAN-IN-THE-LOOP

The application of AI in drug development requires substantial expertise from human specialists to refine results. This collaborative approach is illustrated in a case study on the development of anti-cocaine addiction drugs aided by ChatGPT[60]. Throughout this process, GPT-4 assumes three critical roles: sparking new ideas, clarifying methodologies, and providing coding assistance. To enhance its performance, the chatbot is equipped with various plugins at each phase to ensure deeper understanding of context, access to the latest information, improved coding capabilities, and more precise prompt generation. The responses generated by the chatbot are critically evaluated against the existing literature and expert domain knowledge, and the resulting feedback is returned to the chatbot for further improvement. This iterative, human-in-the-loop methodology led to the identification of 15 promising multi-target leads for anti-cocaine addiction[60]. The example underscores the synergistic potential of human expertise and AI in advancing drug discovery efforts.
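
The iterative workflow described above can be reduced to a simple loop: the model proposes, a human expert critiques, and the critique is fed back as the next prompt. Below is a minimal sketch assuming the OpenAI Python client; the model name, prompts, and manual review step are placeholders rather than the actual pipeline used in the anti-cocaine addiction study.

```python
# Sketch of a human-in-the-loop refinement loop; assumes the OpenAI Python client.
# Model name, prompts, and the review step are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

messages = [
    {"role": "system", "content": "You assist a medicinal chemistry team."},
    {"role": "user", "content": "Propose three lead hypotheses for the target described by the team."},
]

for round_no in range(3):  # a few refinement rounds
    proposal = ask(messages)
    print(f"--- round {round_no + 1} ---\n{proposal}\n")
    feedback = input("Expert feedback (blank to accept): ").strip()
    if not feedback:  # expert is satisfied; stop iterating
        break
    messages += [
        {"role": "assistant", "content": proposal},
        {"role": "user", "content": f"Revise the proposals given this feedback: {feedback}"},
    ]
```

In practice, the feedback step is the expert's evaluation against the literature and domain knowledge described above, rather than a console prompt.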

Several tools leveraging LLMs offer interactive interfaces to enhance molecule description and optimization. DrugChat[61] is a ChatGPT-like system that provides interactive question answering and textual explanations based on drug graph representations. ChatDrug[62], on the other hand, is a framework built to streamline drug editing through GPT APIs. It features a prompt design module equipped with a collection of template prompts, along with a retrieval and domain feedback module that pulls examples from external databases so that responses are grounded in real-world data. ChatDrug also includes a conversational module dedicated to interactive refinement, which integrates feedback from domain experts across iterations so that each suggestion is vetted by expert scrutiny. DrugAssist[63] adopts a similar approach for molecule optimization, using retrieval from external databases for hints and allowing iterative refinement with expert feedback. This iterative refinement, supported by example retrieval from external databases as contextual hints, enhances the models' accuracy and relevance to practical applications.
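
None of these tools is reproduced here, but the shared pattern, a task-specific template prompt filled with examples retrieved from an external database and then refined interactively, can be sketched in a few lines. The templates, the toy reference "database", and the retrieval rule below are illustrative assumptions, not the actual ChatDrug or DrugAssist implementations.

```python
# Illustrative sketch of the template-plus-retrieval pattern behind ChatDrug/DrugAssist.
# The templates and the toy reference "database" are assumptions, not the real systems.
PROMPT_TEMPLATES = {
    "solubility": (
        "Edit the molecule {smiles} so that it becomes more water soluble.\n"
        "Reference molecules known to be highly soluble:\n{examples}\n"
        "Return only the edited SMILES."
    ),
    "permeability": (
        "Edit the molecule {smiles} to increase membrane permeability.\n"
        "Reference molecules:\n{examples}\n"
        "Return only the edited SMILES."
    ),
}

# Toy stand-in for an external database of annotated molecules.
REFERENCE_DB = {
    "solubility": ["OCC(O)CO", "CC(=O)O", "OCCO"],
    "permeability": ["c1ccccc1", "CCN(CC)CC"],
}

def build_prompt(task, smiles, k=2):
    """Fill the task template with the query molecule and k retrieved examples."""
    examples = "\n".join(REFERENCE_DB[task][:k])  # real systems retrieve by similarity
    return PROMPT_TEMPLATES[task].format(smiles=smiles, examples=examples)

print(build_prompt("solubility", "CCOC(=O)c1ccccc1"))
```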

5.2. IN-CONTEXT LEARNING

In-context learning (ICL) enhances chatbots' responses by leveraging examples from a domain knowledge base through prompting, without finetuning the foundation model[64]. This approach uses examples closely aligned with the subject matter to ground LLM responses in relevant domain knowledge[62, 63, 65]. Evaluations of LLMs across various chemistry-related tasks show that including contextually similar examples yields better outcomes than providing no examples or sampling examples at random, and performance improves progressively as more examples are added[52, 65, 66]. ICL also boosts accuracy in more complex regression tasks, making GPT-4 competitive with dedicated machine learning models[67, 68]. Lastly, instead of specific examples, enriching the context with related information, such as disease backgrounds and synonyms in a fact-checking task on drug-disease associations[51], also improves response accuracy. Together, these examples of in-context learning and context enrichment underscore the critical role of domain knowledge in improving the quality and reliability of LLM responses in drug discovery tasks.
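
For molecular tasks, "contextually similar examples" are often found by fingerprint similarity. The sketch below ranks a small labeled pool by Tanimoto similarity to the query molecule and packs the top-k hits into a few-shot prompt; the pool, property values, and prompt wording are assumptions made for illustration.

```python
# Sketch: select the k most similar labeled molecules (Tanimoto similarity on
# Morgan fingerprints) and assemble a few-shot prompt. The labeled pool and
# prompt wording are illustrative assumptions.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

LABELED_POOL = [  # (SMILES, measured property) -- toy values
    ("CCO", 0.81), ("CCCO", 0.62), ("c1ccccc1O", 0.34), ("CC(=O)OC", 0.55),
]

def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

def few_shot_prompt(query_smiles, k=2):
    qfp = fingerprint(query_smiles)
    ranked = sorted(
        LABELED_POOL,
        key=lambda item: DataStructs.TanimotoSimilarity(qfp, fingerprint(item[0])),
        reverse=True,
    )
    shots = "\n".join(f"SMILES: {s}\nProperty: {y}" for s, y in ranked[:k])
    return f"{shots}\nSMILES: {query_smiles}\nProperty:"

print(few_shot_prompt("CCCCO"))
```

Raising k adds more in-context examples, consistent with the reported trend that performance improves as examples are added.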

5.3. INSTRUCTION FINETUNING

Task-tuning language models for specific tasks within drug discovery has shown considerable promise, as evidenced by several recent projects. ChatMol[69], a chatbot based on the T5 model[70], was finetuned with experimental property data and molecular spatial knowledge to improve its ability to describe and edit target molecules. Task-tuning GPT-3 has demonstrated notable advantages over traditional machine learning approaches, particularly in tasks where training data are limited[66]. Task-tuning also substantially improves GPT-3 at extracting DDI triplets, with a large F1 score gain over GPT-4 with few-shot prompting[71]. These projects demonstrate that task-tuning of foundation models can effectively capture molecule-level knowledge relevant to drug discovery.
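
Task tuning boils down to finetuning on input-output pairs drawn from a single task. As a hedged illustration of what such training data might look like for DDI triplet extraction, written in the JSONL format commonly used by finetuning pipelines, consider records of the following form (the field names and sentences are illustrative, not the schema used in the cited study):

```python
# Illustrative task-tuning records for DDI triplet extraction (single task only).
# Field names and sentences are assumptions, not the cited study's schema.
import json

ddi_records = [
    {
        "prompt": "Extract drug-drug interaction triplets from: "
                  "'Ketoconazole increases plasma concentrations of midazolam.'",
        "completion": "(ketoconazole, increases concentration of, midazolam)",
    },
    {
        "prompt": "Extract drug-drug interaction triplets from: "
                  "'Rifampin reduces the anticoagulant effect of warfarin.'",
        "completion": "(rifampin, reduces effect of, warfarin)",
    },
]

# One JSON object per line (JSONL), the layout most finetuning pipelines expect.
for rec in ddi_records:
    print(json.dumps(rec))
```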

Instruction tuning diverges from task tuning by training an LLM across a spectrum of tasks using instruction-output pairs, enabling the model to address new, unseen tasks[72]. DrugAssist[63], a Llama2-7B-based model, achieved competitive results when simultaneously optimizing multiple properties even though it was instruction-tuned on data covering individual molecule properties. Similarly, DrugChat[61], a Vicuna-13B-based model instruction-tuned with examples from databases such as ChEMBL and PubChem, effectively answered open-ended questions about graph-represented drug compounds. MolInstructions[73], a large-scale instruction dataset tailored to the biomolecular domain, has proven effective for finetuning models such as Llama-7B on a variety of tasks, including molecular property prediction and biomedical text mining.
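
Instruction tuning generalizes the same idea across many tasks: each record pairs a natural-language instruction (plus optional input) with the desired output, so a single model learns to follow heterogeneous instructions. A minimal sketch of such records is shown below; the field names are loosely modeled on public instruction datasets and are not the actual MolInstructions schema.

```python
# Sketch of multi-task instruction-tuning records; field names are loosely modeled
# on public instruction datasets and are not the MolInstructions schema.
import json

instruction_data = [
    {
        "instruction": "Describe the molecule given by the following SMILES string.",
        "input": "CC(=O)Oc1ccccc1C(=O)O",
        "output": "Aspirin (acetylsalicylic acid): a benzoic acid bearing an acetoxy substituent.",
    },
    {
        "instruction": "Predict whether the molecule is likely to be water soluble. Answer yes or no.",
        "input": "CCCCCCCCCCCCCCCC",
        "output": "no",
    },
    {
        "instruction": "Is the following molecule aromatic? Answer yes or no.",
        "input": "c1ccccc1",
        "output": "yes",
    },
]

with open("instruction_tuning.jsonl", "w") as fh:
    for rec in instruction_data:
        fh.write(json.dumps(rec) + "\n")
```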

Task tuning can be combined with instruction tuning to draw on the strengths of each. ChemDFM[74], pre-trained on LLaMa-13B with a chemically rich corpus and further enhanced through instruction tuning, excelled in a range of chemical tasks, particularly molecular property prediction and reaction prediction, outperforming models such as GPT-4 with in-context learning. InstructMol[75] is a multi-modality, instruction-tuning-based LLM with a two-stage tuning process: first, instruction tuning with molecule graph-text caption pairs to integrate molecular knowledge, and then task-specific tuning for three drug discovery-related molecular tasks. Applied to Vicuna-7B, InstructMol surpassed other leading open-source LLMs and narrowed the performance gap with specialized models[75]. These developments underscore the effectiveness of both task and instruction tuning as strategies for endowing generalized foundation models with domain-specific knowledge to address specific challenges in drug discovery.

This paper is available on arxiv under CC BY 4.0 DEED license.

