Authors:
(1) Yanpeng Ye, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia, GreenDynamics Pty. Ltd, Kensington, NSW, Australia, and these authors contributed equally to this work;
(2) Jie Ren, GreenDynamics Pty. Ltd, Kensington, NSW, Australia, Department of Materials Science and Engineering, City University of Hong Kong, Hong Kong, China, and these authors contributed equally to this work;
(3) Shaozhou Wang, GreenDynamics Pty. Ltd, Kensington, NSW, Australia ([email protected]);
(4) Yuwei Wan, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China;
(5) Imran Razzak, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia;
(6) Tong Xie, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and School of Photovoltaic and Renewable Energy Engineering, University of New South Wales, Kensington, NSW, Australia ([email protected]);
(7) Wenjie Zhang, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia ([email protected]).
Editor’s note: This article is part of a broader study. You’re reading Part 4 of 9. Read the rest below.
Table of Links
- Abstract and Introduction
- Methods
- Data preparation and schema design
- LLMs training, evaluation and inference
- Entity resolution
- Knowledge graph construction
- Result
- Discussion
- Conclusion and References
LLMs training, evaluation and inference
The training dataset, composed of the compiled data, was used to fine-tune several models, including LLaMA 7b, LLaMA2 7b, and Darwin[24]. Once the normalized inference produced high-quality results, we iteratively retrained the fine-tuned LLM to infer subsequent batches of data. The models were trained for 10 epochs with a batch size of 1. Additionally, 60 abstracts were annotated to assess the LLMs' performance. Our evaluation primarily focused on the models' NER capabilities and preliminarily explored their ability on the RE task to identify potential internal relations among relevant element sets. Moreover, since we standardized entities during the data compilation phase, we also assessed the LLMs' effectiveness at standardizing entities. Specifically, for the NER, RE, and ER tasks, we employed a unified evaluation framework based on precision, recall, and the F1 score, accounting for false positives (fp) and false negatives (fn) to quantify performance.
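The unified evaluation framework can be sketched as follows; the function name and the set-based exact-match counting of tp, fp, and fn are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the unified precision/recall/F1 evaluation shared by
# the NER, RE, and ER tasks. Predictions and gold annotations are compared
# as sets of items, where an "item" is an entity, a relation between
# entities, or a standardized entity, depending on the task.

def evaluate(predicted, gold):
    """Return (precision, recall, f1) for one task on one abstract."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # correctly identified items
    fp = len(predicted - gold)   # spurious predictions
    fn = len(gold - predicted)   # missed gold items
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: NER output vs. gold annotations (entity, label) pairs.
pred = {("LiFePO4", "Material"), ("cathode", "Application")}
gold = {("LiFePO4", "Material"), ("olivine", "Structure")}
print(evaluate(pred, gold))  # tp=1, fp=1, fn=1 -> (0.5, 0.5, 0.5)
```

The same function serves all three tasks because only the definition of an "item" changes between them.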
For NER, a true positive (tp) is a correctly identified entity; for RE, it is a correctly identified relationship between entities; and for ER, it is an entity correctly standardized according to the schema. Conversely, a false positive (fp) occurs when the model incorrectly identifies an entity, relation, or standardization, and a false negative (fn) occurs when the model fails to identify an entity, relation, or schema element that should have been recognized. With these definitions, each task can be evaluated using Equations (1), (2), and (3).
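The referenced equations are the standard definitions of precision, recall, and the F1 score in terms of tp, fp, and fn:

```latex
\mathrm{Precision} = \frac{tp}{tp + fp} \qquad (1)

\mathrm{Recall} = \frac{tp}{tp + fn} \qquad (2)

F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)
```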
After evaluation, we chose the fine-tuned LLM that demonstrated optimal performance on both the NER and RE tasks to iteratively infer over the 150,000 abstracts. The inference outputs are organized in a "DOI - text - response" format. Consequently, the fine-tuned LLM not only extracts entities but also assigns them appropriate labels, accomplishing the NER and RE tasks concurrently. Moreover, every entity and relation identified in a response is traceable to its source abstract, enhancing the integrity and utility of the data.
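The "DOI - text - response" organization can be sketched as below; the record field names, the in-memory list, and the stand-in inference callable are assumptions for illustration.

```python
# Sketch of organizing batch-inference outputs into "DOI - text - response"
# records, so every extracted entity and relation stays traceable to the
# abstract it came from.

def organize_outputs(abstracts, infer):
    """abstracts: iterable of (doi, text) pairs; infer: callable text -> response."""
    records = []
    for doi, text in abstracts:
        records.append({
            "DOI": doi,               # source identifier for traceability
            "text": text,             # the abstract that was processed
            "response": infer(text),  # fine-tuned LLM's joint NER + RE output
        })
    return records

# Example with a stand-in for the fine-tuned model:
demo = organize_outputs(
    [("10.1000/example", "LiFePO4 is a cathode material.")],
    lambda t: "LiFePO4: Material; cathode: Application",
)
print(demo[0]["DOI"])  # -> 10.1000/example
```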
This paper is available on arxiv under CC BY 4.0 DEED license.