Advancing Biomedical Text Mining with Community Challenges

by Text Mining, April 23rd, 2025

Too Long; Didn't Read

This review aims to provide a comprehensive landscape for the recent advances in community challenges of Chinese biomedical text mining.


Abstract and 1. Introduction

2. Community Challenges Overview and 2.1 CCKS

2.2 CHIP and 2.3 CCIR, CSMI, CCL and DCIC

3. Evaluation Tasks Overview and 3.1 Information Extraction

3.2 Text Classification and Text Similarity

3.3 Knowledge Graph and Question Answering

3.4 Text Generation and Knowledge Reasoning and 3.5 Large Language Model Evaluation

4. Translational Informatics in Biomedical Text Mining

5. Discussion and Perspective

5.1. Contributions of Community Challenges

5.2. Limitations of Current Community Challenges

5.3. Future Perspectives in the Era of Large Language Models, and References

Figure Legends and Tables

Abstract

Biomedical research has accumulated vast amounts of textual data from sources such as scientific literature, electronic health records, clinical trial reports, and social media. However, manually processing and analyzing these extensive and complex resources is time-consuming and inefficient. To address this challenge, biomedical text mining, also known as biomedical natural language processing, has garnered great attention. Community challenge evaluation competitions have played an important role in promoting technological innovation and interdisciplinary collaboration in biomedical text mining research. These challenges provide platforms for researchers to develop state-of-the-art solutions for data mining and information processing in biomedical research. In this article, we review recent advances in community challenges specific to Chinese biomedical text mining. First, we collect information on these evaluation tasks, such as data sources and task types. Second, we conduct a systematic summary and comparative analysis of the tasks, covering named entity recognition, entity normalization, attribute extraction, relation extraction, event extraction, text classification, text similarity, knowledge graph construction, question answering, text generation, and large language model evaluation. Then, we summarize the potential clinical applications of these community challenge tasks from a translational informatics perspective. Finally, we discuss the contributions and limitations of these community challenges and highlight future directions in the era of large language models.

1. Introduction

Over the past few decades, biomedical research has accumulated a remarkable and growing volume of textual data [1, 2]. These data come in substantial volume from various sources, including scientific literature, electronic health records, clinical trial reports, social media platforms, books, and patents. They contain rich information that can be leveraged for knowledge discovery [3, 4], hypothesis generation [5, 6], and clinical practice [7, 8]. However, because these textual resources are so vast and complex, reading and processing them manually is time-consuming, labor-intensive, and inefficient. Researchers and clinicians face the challenges of information explosion and knowledge emergence. As a result, there is a need to develop efficient computational techniques for health information processing.


Biomedical text mining, also known as biomedical natural language processing (BioNLP), has gained significant attention [9-13]. BioNLP can extract key biological entities (such as variants, genes, proteins, and diseases) [14-16] and medical entities (such as treatments, surgeries, and drugs) [17-20], identify entity relationships [21-24], and perform document classification [25, 26], information retrieval [27], and knowledge question answering [28], among other tasks. BioNLP techniques have found extensive applications in scientific research and clinical practice. Examples include the identification of new drug targets [29], the discovery of novel therapeutic interventions [30, 31], the exploration of adverse drug reactions [32], and cohort building for clinical trials [33-35]. BioNLP techniques can also facilitate the construction of knowledge bases and ontologies [36-38], enabling efficient data integration and interoperability across different sources [39]. Furthermore, cutting-edge large language models have demonstrated remarkable applications in biomedical research and healthcare [40-42].


Community challenge evaluation competitions have emerged as crucial catalysts for promoting technological innovation and interdisciplinary collaboration in the field of BioNLP. These challenges provide platforms for researchers, data scientists, and domain experts to showcase their expertise, exchange ideas, and develop state-of-the-art solutions for data mining and information processing in biomedical research [43]. By providing manually curated datasets and specifically defined tasks, BioNLP community challenge evaluations foster the development of robust algorithms, novel methods, and benchmark frameworks. In the past few decades, multiple renowned community challenges, including BioCreative [22, 26], TREC [44], and i2b2 [34, 45], have made significant contributions to advancing biomedical text mining technology. Similarly, China has seen numerous community challenges aimed at addressing biomedical and health information processing problems specific to the Chinese language [46-52]. However, to date there has been no systematic summary or comparative analysis of these community challenges. In addition, with the rapid development of large language models such as ChatGPT, the field of biomedical text mining faces new opportunities and new requirements for the future organization of community challenges.


This review aims to provide a comprehensive landscape of recent advances in community challenges for Chinese biomedical text mining. We first collect the evaluation shared tasks organized by academic conferences and conduct a systematic summary and comparative analysis of these tasks, including their data sources and task types. Then, we summarize the potential clinical applications of these community challenge tasks from a translational informatics perspective. Finally, we discuss the contributions and limitations of these community challenge tasks and highlight future directions in the era of large language models.


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Hui Zong, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (contributed equally);

(2) Rongrong Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (contributed equally);

(3) Jiaxue Cha, Shanghai Key Laboratory of Signaling and Disease Research, Laboratory of Receptor-Based Bio-Medicine, Collaborative Innovation Center for Brain Science, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China;

(4) Erman Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China;

(5) Jiakun Li, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China and Department of Urology, West China Hospital, Sichuan University, Chengdu, 610041, China;

(6) Liang Tao, Faculty of Business Information, Shanghai Business School, Shanghai, 201400, China;

(7) Zuofeng Li, Takeda Co. Ltd., Shanghai, 200040, China;

(8) Buzhou Tang, Department of Computer Science, Harbin Institute of Technology, Shenzhen, 518055, China;

(9) Bairong Shen, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (corresponding author).

