Leveraging Natural Supervision: Learning Semantic Knowledge from Wikipedia

Written by textmodels | Published 2024/06/01

TL;DR: In this study, researchers exploit rich, naturally occurring structures on Wikipedia for various NLP tasks.

Author: Mingda Chen.

CHAPTER 4 - LEARNING SEMANTIC KNOWLEDGE FROM WIKIPEDIA

In this chapter, we describe our contributions to exploiting rich, naturally occurring structures on Wikipedia for various NLP tasks. In Section 4.1, we use hyperlinks to learn entity representations. Unlike most prior work, which represents entities with a fixed set of vectors, the resulting models use contextualized representations. In Section 4.2, we use article structures (e.g., paragraph positions and section titles) to make sentence representations aware of the broader context in which they are situated, leading to improvements across various discourse-related tasks. In Section 4.3, we use article category hierarchies to learn concept hierarchies that improve model performance on textual entailment tasks.
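Hyperlinks are useful supervision because every internal link pairs a span of text with the article it points to, giving an entity label at no annotation cost. The sketch below is a minimal illustration, not the author's pipeline: it extracts (anchor, target entity, context) triples from raw wikitext. The names `extract_mentions`, `strip_links`, and `Mention` are hypothetical, and the regex assumes simplified `[[Target]]` / `[[Target|anchor]]` link syntax with no nested templates.

```python
import re
from typing import List, NamedTuple

# Matches internal wikitext links: [[Target]] or [[Target|anchor text]].
# Assumption: simplified syntax; nested links and templates are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


class Mention(NamedTuple):
    anchor: str   # surface text shown to readers
    target: str   # linked article title, used as the entity label
    context: str  # sentence with link markup stripped, used as model input


def strip_links(text: str) -> str:
    """Replace each link with its visible anchor text."""
    return LINK_RE.sub(lambda m: m.group(2) or m.group(1), text)


def extract_mentions(wikitext: str) -> List[Mention]:
    """Turn hyperlinks into (anchor, entity, context) training triples."""
    mentions = []
    # Naive sentence split on whitespace after a period; a real pipeline
    # would use a proper sentence segmenter.
    for sentence in re.split(r"(?<=\.)\s+", wikitext):
        context = strip_links(sentence)
        for m in LINK_RE.finditer(sentence):
            target = m.group(1).strip()
            anchor = (m.group(2) or m.group(1)).strip()
            if ":" in target:  # skip namespaced links such as File: or Category:
                continue
            mentions.append(Mention(anchor, target, context))
    return mentions
```

For example, `extract_mentions("[[Paris]] is the capital of [[France|the French Republic]]. It hosts the [[Louvre]].")` yields three triples whose targets are `Paris`, `France`, and `Louvre`, each paired with its plain-text sentence. Each triple couples a natural-language context with an entity label; the models and training objectives actually used are those described in Section 4.1, not this sketch.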

The material in this chapter is adapted from Chen et al. (2019a), Chen et al. (2019b), and Chen et al. (2020a).

This paper is available on arXiv under a CC 4.0 license.

Published by HackerNoon on 2024/06/01