Researchers Build AI Knowledge Graph That Sifts Through Science Papers For You

Written by languagemodels | Published 2025/04/18
Tech Story Tags: functional-materials | knowledge-graph | large-language-model | natural-language-processing | named-entity-recognition | relation-extraction | entity-resolution | artificial-intelligence

TL;DR: This paper presents a new AI-powered knowledge graph that organizes real-world materials science research into an accessible, searchable database.

Authors:

(1) Yanpeng Ye, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia, GreenDynamics Pty. Ltd, Kensington, NSW, Australia, and these authors contributed equally to this work;

(2) Jie Ren, GreenDynamics Pty. Ltd, Kensington, NSW, Australia, Department of Materials Science and Engineering, City University of Hong Kong, Hong Kong, China, and these authors contributed equally to this work;

(3) Shaozhou Wang, GreenDynamics Pty. Ltd, Kensington, NSW, Australia ([email protected]);

(4) Yuwei Wan, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China;

(5) Imran Razzak, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia;

(6) Tong Xie, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and School of Photovoltaic and Renewable Energy Engineering, University of New South Wales, Kensington, NSW, Australia ([email protected]);

(7) Wenjie Zhang, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia ([email protected]).

Editor’s note: This article is part of a broader study. You’re reading Part 2 of 9.


Methods

Figure 1(a) illustrates the overall workflow of our research. Through NER and RE tasks, we extract structured information about catalysts, batteries, and solar cells. After ER and normalization, we integrate the information from these three fields and construct a knowledge graph.

Specifically, the pipeline displayed in Figure 1(b) commences with the manual annotation and normalization of the initial training dataset, which is used to fine-tune LLMs specifically for the NER and RE tasks. The inference dataset is subsequently divided into ten batches, a crucial step for the iterative process that follows. We then complete the ER task with NLP tooling, including ChemDataExtractor [22], mat2vec [23], and our expert dictionary. After ER, high-quality results are selected to augment the training set, thereby enhancing the model’s performance in subsequent iterations. Finally, the knowledge graph is constructed from the triples derived from the normalized results of the last iteration.
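To make the iterative loop concrete, here is a minimal Python sketch of the batch-wise process described above. The helpers `fine_tune`, `extract_triples`, and `resolve_entities` are hypothetical placeholders standing in for the paper’s fine-tuned LLMs and ER stage, and the confidence threshold is an assumed illustration, not a value reported by the authors.

```python
from dataclasses import dataclass

# Hypothetical sketch of the iterative fine-tune/extract loop described above.
# fine_tune() and extract_triples() are dummy stand-ins for the paper's
# fine-tuned LLMs; the confidence scores here are illustrative only.

@dataclass
class Triple:
    head: str
    relation: str
    tail: str
    confidence: float

def fine_tune(training_set):
    """Placeholder: return a 'model' reflecting the current training set."""
    return {"n_examples": len(training_set)}

def extract_triples(model, paper):
    """Placeholder for LLM-based NER + RE on one paper's text."""
    return [Triple("TiO2", "acts_as", "photocatalyst", 0.95)]

def resolve_entities(triple):
    """Placeholder for the ER/normalization step (see the next sketch)."""
    return triple

def run_pipeline(training_set, inference_papers, n_batches=10, threshold=0.9):
    model = fine_tune(training_set)
    size = max(1, len(inference_papers) // n_batches)
    batches = [inference_papers[i:i + size]
               for i in range(0, len(inference_papers), size)]
    all_triples = []
    for batch in batches:
        extracted = [resolve_entities(t) for paper in batch
                     for t in extract_triples(model, paper)]
        # Feed only high-confidence results back into the training set,
        # then re-fine-tune before processing the next batch.
        training_set += [t for t in extracted if t.confidence >= threshold]
        model = fine_tune(training_set)
        all_triples += extracted
    return all_triples
```

Each pass both enlarges the training set and produces triples; after the final batch, the accumulated triples are what feed the graph-construction step.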
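The ER step itself can be approximated as a dictionary lookup with a similarity fallback. This sketch substitutes simple string similarity for the paper’s ChemDataExtractor and mat2vec tooling, and the dictionary entries are invented examples rather than the authors’ expert dictionary.

```python
from difflib import SequenceMatcher

# Sketch of entity resolution: map surface forms to canonical names via an
# expert dictionary first, then fall back to a similarity match. In the paper,
# mat2vec embeddings and ChemDataExtractor would drive this; the string
# heuristic below is a stand-in, and these entries are illustrative.
EXPERT_DICTIONARY = {
    "TiO2": "titanium dioxide",
    "titania": "titanium dioxide",
    "PSC": "perovskite solar cell",
}

def resolve(mention, canonical_names, cutoff=0.85):
    key = mention.strip()
    if key in EXPERT_DICTIONARY:
        return EXPERT_DICTIONARY[key]
    # Fallback: closest canonical name by character-level similarity.
    best = max(canonical_names,
               key=lambda c: SequenceMatcher(None, key.lower(), c.lower()).ratio())
    score = SequenceMatcher(None, key.lower(), best.lower()).ratio()
    return best if score >= cutoff else key

print(resolve("titania", ["titanium dioxide", "zinc oxide"]))     # titanium dioxide
print(resolve("zinc-oxide", ["titanium dioxide", "zinc oxide"]))  # zinc oxide
```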
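Once the final iteration’s output is normalized, building the graph from triples is straightforward. Below is a sketch using networkx, which is one reasonable choice of graph library (the paper does not specify one); the triples shown are made up for illustration, not results from the study.

```python
import networkx as nx

# Build a knowledge graph from normalized (head, relation, tail) triples.
# These example triples are illustrative only.
triples = [
    ("perovskite", "used_in", "solar cell"),
    ("titanium dioxide", "acts_as", "photocatalyst"),
    ("LiFePO4", "used_in", "battery cathode"),
]

G = nx.MultiDiGraph()
for head, relation, tail in triples:
    G.add_edge(head, tail, label=relation)

# Query example: every entity with an edge into "solar cell".
print(list(G.predecessors("solar cell")))  # ['perovskite']
```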

This paper is available on arXiv under a CC BY 4.0 DEED license.

