paint-brush
Researchers Learn to Measure AI’s Language Skills by@morphology
113 reads

Researchers Learn to Measure AI’s Language Skills

by MorphologyDecember 30th, 2024
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Researchers in Poland have developed an open-source tool that improves the evaluation and comparison of AI used in natural language preprocessing.
featured image - Researchers Learn to Measure AI’s Language Skills
Morphology HackerNoon profile picture

Authors:

(1) Martyna Wiącek, Institute of Computer Science, Polish Academy of Sciences;

(2) Piotr Rybak, Institute of Computer Science, Polish Academy of Sciences;

(3) Łukasz Pszenny, Institute of Computer Science, Polish Academy of Sciences;

(4) Alina Wróblewska, Institute of Computer Science, Polish Academy of Sciences.

Editor's note: This is Part 7 of 10 of a study on improving the evaluation and comparison of tools used in natural language preprocessing. Read the rest below.

Abstract and 1. Introduction and related works

  1. NLPre benchmarking

2.1. Research concept

2.2. Online benchmarking system

2.3. Configuration

  1. NLPre-PL benchmark

3.1. Datasets

3.2. Tasks

  1. Evaluation

4.1. Evaluation methodology

4.2. Evaluated systems

4.3. Results

  1. Conclusions
    • Appendices
    • Acknowledgements
    • Bibliographical References
    • Language Resource References

4. Evaluation

4.1. Evaluation methodology

To maintain the de facto standard to NLPre evaluation, we apply the evaluation measures defined for the CoNLL 2018 shared task and implemented in the official evaluation script.[11] In particular, we focus on F1 and AlignedAccuracy, which is similar to F1 but does not consider possible misalignments in tokens, words, or sentences.


In our evaluation process, we follow default training procedures suggested by the authors of the evaluated systems, i.e. we do not conduct any optimal hyperparameter search in favour of leaving the recommended model configuration as-is. We also do not further fine-tune selected models.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


[11] https://universaldependencies.org/conll18/conll18_ud_eval.py