AI Model Reads Thousands of Studies, Nails Battery Science Better Than Expected

Authors:

(1) Yanpeng Ye, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia, GreenDynamics Pty. Ltd, Kensington, NSW, Australia, and these authors contributed equally to this work;

(2) Jie Ren, GreenDynamics Pty. Ltd, Kensington, NSW, Australia, Department of Materials Science and Engineering, City University of Hong Kong, Hong Kong, China, and these authors contributed equally to this work;

(3) Shaozhou Wang, GreenDynamics Pty. Ltd, Kensington, NSW, Australia;

(4) Yuwei Wan, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and Department of Linguistics and Translation, City University of Hong Kong, Hong Kong, China;

(5) Imran Razzak, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia;

(6) Tong Xie, GreenDynamics Pty. Ltd, Kensington, NSW, Australia and School of Photovoltaic and Renewable Energy Engineering, University of New South Wales, Kensington, NSW, Australia;

(7) Wenjie Zhang, School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia.

Editor’s note: This article is part of a broader study. You’re reading Part 7 of 9.


Result

Table 1 compares the performance of three LLMs, Darwin, LLaMA, and LLaMA2, on the NER, RE, and ER tasks in the materials science domain. Darwin achieves markedly higher F1 scores on both NER and RE than the LLaMA 7B and LLaMA2 7B models, indicating that it handles text tasks in materials science more effectively. On the ER task, however, the models show no significant difference in performance. We speculate that this is due to the limited contextual memory of the LLMs.
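Table 1 reports F1 scores. As a reminder of what that metric measures, here is a minimal sketch of micro-averaged precision, recall, and F1 under exact matching of (text, label) pairs. The exact-match rule, the function name `prf1`, and the example entities are assumptions for illustration; this section does not describe the paper's evaluation protocol (partial-match handling, averaging scheme) in detail.

```python
from typing import Iterable, Tuple

# An entity is a (surface text, label) pair, e.g. ("LiCoO2", "Formula").
# Exact matching on both fields is an assumption, not the paper's stated rule.
Entity = Tuple[str, str]


def prf1(predicted: Iterable[Entity], gold: Iterable[Entity]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 with exact (text, label) matching."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)  # entities that match the gold set exactly
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("LiCoO2", "Formula"), ("lithium-ion battery", "Application"), ("layered", "Structure/Phase")]
    pred = [("LiCoO2", "Formula"), ("lithium-ion battery", "Application"), ("cathode", "Descriptor")]
    p, r, f = prf1(pred, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.667 F1=0.667
```

Because F1 is the harmonic mean of precision and recall, a model cannot score well by over-extracting (which hurts precision) or by extracting only a few safe entities (which hurts recall).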

To address this limitation, we applied entity resolution to the inference outputs, which substantially improved performance on the standardization (ER) task. This refinement also improved the NER and RE results. We attribute these gains to the ER step removing some incorrect inferences caused by hallucination and providing a second pass that adjusts the extracted relations.
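This section does not spell out how the ER step discards hallucinated inferences. One simple guard, shown here as a hypothetical illustration rather than the authors' procedure, is to drop any extracted triple whose entities do not appear in the source text. The (head, relation, tail) layout, the function name `filter_ungrounded`, and the sample abstract are all assumptions.

```python
from typing import List, Tuple

# Hypothetical triple layout: (head, relation, tail). Not the paper's schema.
Triple = Tuple[str, str, str]


def filter_ungrounded(triples: List[Triple], source_text: str) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into (kept, dropped). A triple is kept only if both its
    head and tail occur in the source text (case-insensitive substring match);
    otherwise it is treated as a likely hallucination."""
    text = source_text.lower()
    kept: List[Triple] = []
    dropped: List[Triple] = []
    for head, rel, tail in triples:
        if head.lower() in text and tail.lower() in text:
            kept.append((head, rel, tail))
        else:
            dropped.append((head, rel, tail))
    return kept, dropped


if __name__ == "__main__":
    abstract = "LiCoO2 cathodes were prepared by a sol-gel route for lithium-ion batteries."
    triples = [
        ("LiCoO2", "Synthesis", "sol-gel"),
        ("LiCoO2", "Application", "lithium-ion batteries"),
        ("LiCoO2", "Property", "high ionic conductivity"),  # not stated in the text
    ]
    kept, dropped = filter_ungrounded(triples, abstract)
    print(dropped)  # [('LiCoO2', 'Property', 'high ionic conductivity')]
```

Substring matching is deliberately crude: short strings such as "Li" match inside unrelated words, and the check must run before normalization, since canonical terms rarely appear verbatim in the abstract.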

To understand how much each step of the standardization process contributes, we conducted ablation experiments. The results in Table 2 show that every method improves ER. The most effective step is optimization with the expert dictionary: besides being accurate, the dictionary covers every label except the three that represent the material itself ("Name", "Formula", and "Acronym"). The gain from ER-NF/A is also significant, because this step eliminates most erroneous materials; it even improves NER more than ER-OL does.
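Dictionary-based standardization of this kind can be sketched as a per-label lookup from surface variants to canonical terms. The dictionary entries, the function names, and the choice to drop entities that a covered label cannot resolve are hypothetical; the paper's actual expert dictionary and its handling of misses are not given in this section. Labels the dictionary does not cover (here "Property") pass through unchanged, mirroring the less stringent treatment described for such labels.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical expert dictionary: label -> {lower-cased variant -> canonical term}.
EXPERT_DICT: Dict[str, Dict[str, str]] = {
    "Application": {
        "lithium-ion battery": "lithium-ion battery",
        "li-ion battery": "lithium-ion battery",
        "lib": "lithium-ion battery",
    },
    "Synthesis": {
        "sol-gel": "sol-gel method",
        "sol-gel method": "sol-gel method",
        "hydrothermal": "hydrothermal method",
    },
}


def normalize(label: str, entity: str) -> Optional[str]:
    """Canonical form of `entity` under `label`; None if a covered label has no match."""
    variants = EXPERT_DICT.get(label)
    if variants is None:
        return entity  # label not covered by the dictionary: keep as extracted
    return variants.get(entity.strip().lower())


def normalize_triples(triples: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Normalize (head, label, entity) triples, discarding unresolvable ones (an assumption)."""
    out = []
    for head, label, entity in triples:
        canonical = normalize(label, entity)
        if canonical is not None:
            out.append((head, label, canonical))
    return out


if __name__ == "__main__":
    print(normalize_triples([
        ("LiCoO2", "Application", "Li-ion battery"),
        ("LiCoO2", "Synthesis", "Hydrothermal"),
        ("LiCoO2", "Synthesis", "unknown route"),
        ("LiCoO2", "Property", "high capacity"),
    ]))
```

Collapsing variants such as "LIB" and "Li-ion battery" onto one node is what lets frequency counts over the graph reflect real usage instead of spelling differences.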

ER-N/F, in turn, corrects relation errors between "Name" and "Formula"; judging by its contribution, it is also an indispensable part of ER. We also evaluated the LLM classifier on 200 manually labeled samples, and the results in Table 3 show that the LLM performs strongly on this classification task.
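The specific statistics in Table 3 are not listed in this section. The sketch below shows one standard way to score a two-class labeller such as the "Name" versus "Formula" classifier against manual labels, treating "Formula" as the positive class (an assumption). The function name `binary_metrics` and the toy labels are hypothetical, not the paper's 200-sample set.

```python
from typing import Dict, List


def binary_metrics(y_true: List[str], y_pred: List[str], positive: str = "Formula") -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a two-class labelling task,
    treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = ["Formula", "Name", "Formula", "Name", "Formula"]
    pred = ["Formula", "Name", "Name", "Name", "Formula"]
    print(binary_metrics(gold, pred))
```

Reporting precision and recall alongside accuracy shows which direction the classifier errs in, which matters here because a formula mislabelled as a name is exactly the kind of relation error ER-N/F is meant to repair.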

The triples in the normalized data are used to construct FMKG, which contains 162,605 nodes and 731,772 edges; a schematic diagram is shown in Figure 3. The graph has 11 node types and 13 relation types. FMKG stores a large amount of information on functional materials, covering batteries, catalysts, and solar cells. For example, analyzing how often materials appear in the battery domain (Figure 4) shows that Co2O3 is the most frequent material, followed by MoS2, graphite, TiO2, LiCoO2, and others commonly used in batteries. Likewise, lithium-ion batteries are by far the most prevalent application in the battery domain, appearing significantly more often than any other application. These observations agree with established knowledge of these domains and materials, confirming that our knowledge graph effectively stores factual knowledge and can serve as a knowledge platform for materials researchers.
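Node and edge counts and per-domain material frequencies come from straightforward aggregation over triples. The sketch below assumes a schema this section does not spell out: each material links to the paper it came from through a "DOI" edge, and each paper links to its field through a "Domain" edge. Under that assumption, a material's frequency in a domain is the number of distinct papers in that domain that mention it. Function names and sample triples are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); hypothetical layout


def build_graph(triples: Iterable[Triple]) -> Tuple[Set[str], Set[Triple]]:
    """Collect the unique nodes and unique edges described by the triples."""
    nodes: Set[str] = set()
    edges: Set[Triple] = set()
    for head, rel, tail in triples:
        nodes.update((head, tail))
        edges.add((head, rel, tail))  # duplicate triples collapse into one edge
    return nodes, edges


def material_frequency(edges: Set[Triple], domain: str) -> Counter:
    """Count, for each material, the distinct papers in `domain` that mention it.
    Assumes (material, "DOI", doi) and (doi, "Domain", domain) edges."""
    papers = {h for h, r, t in edges if r == "Domain" and t == domain}
    return Counter(h for h, r, t in edges if r == "DOI" and t in papers)


if __name__ == "__main__":
    triples = [
        ("LiCoO2", "DOI", "doi-1"), ("doi-1", "Domain", "battery"),
        ("LiCoO2", "DOI", "doi-2"), ("doi-2", "Domain", "battery"),
        ("TiO2", "DOI", "doi-2"),
        ("TiO2", "DOI", "doi-3"), ("doi-3", "Domain", "solar cell"),
    ]
    nodes, edges = build_graph(triples)
    print(len(nodes), len(edges))  # 7 7
    print(material_frequency(edges, "battery").most_common())  # [('LiCoO2', 2), ('TiO2', 1)]
```

Counting distinct papers rather than raw triples keeps a single paper that repeats a material many times from inflating that material's rank.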

To further validate FMKG, we randomly sampled 500 triples, excluding the "DOI" and "Domain" nodes because they carry bookkeeping metadata rather than material knowledge, and had them checked by experts in the relevant areas of materials science. Splitting these triples yields 1,000 entities and 500 relations for annotation. The annotators' report is shown in Table 4.
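The audit can be sketched in three steps: sample triples while skipping the DOI and Domain metadata, split each triple into two entities and one relation (which is how 500 triples become 1,000 entities and 500 relations), then aggregate the experts' judgements into per-label accuracy of the kind reported in Table 4. The fixed seed, the exclusion by relation name, and all function names are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); hypothetical layout
EXCLUDED = {"DOI", "Domain"}  # metadata relations left out of the audit (assumed naming)


def sample_for_audit(triples: List[Triple], k: int = 500, seed: int = 0) -> List[Triple]:
    """Randomly draw up to k triples that do not involve DOI or Domain relations."""
    pool = [t for t in triples if t[1] not in EXCLUDED]
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    return rng.sample(pool, min(k, len(pool)))


def split_for_annotation(sample: List[Triple]) -> Tuple[List[str], List[str]]:
    """Each triple yields two entities (head and tail) and one relation."""
    entities = [x for h, _, t in sample for x in (h, t)]
    relations = [r for _, r, _ in sample]
    return entities, relations


def accuracy_by_label(judgements: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-label accuracy from (label, judged_correct) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for label, ok in judgements:
        totals[label] += 1
        correct[label] += int(ok)
    return {label: correct[label] / totals[label] for label in totals}


if __name__ == "__main__":
    judgements = [("Property", True), ("Property", False), ("Application", True)]
    print(accuracy_by_label(judgements))  # {'Property': 0.5, 'Application': 1.0}
```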

"Name" is the highest-priority core label and serves only as the head of a triple in the knowledge graph, so it never appears as a relation. Notably, the labels "Application", "Structure/Phase", "Synthesis", and "Characterization" achieve 100% accuracy for both entities and relations, which we attribute to the rigorous standardization of entities in these categories with our expert dictionary. In contrast, "Descriptor" and "Property" are highly diverse and broad in scope, so they undergo a less stringent standardization process, which leads to some imprecision in their entities and relations. Nevertheless, the results remain satisfactory.

"Name" and "Acronym" also achieve 100% entity accuracy. For relations, however, ChemDataExtractor shows certain limitations. A detailed analysis of the "Name" and "Acronym" misclassifications reveals that most of the erroneous items originate from "Formula", reflecting understandable errors both in the LLM's binary classification of "Formula" versus "Name" and in ChemDataExtractor. The impact of this kind of error is minor, because "Name", "Formula", and "Acronym" extracted from the same source all represent the same material.

This paper is available on arxiv under CC BY 4.0 DEED license.