
What Happens When Language Models Say 'Nothing'?

by Constrain Tech, April 8th, 2025

Too Long; Didn't Read

We propose a new congruence and learning algorithm to analyze constrained neural text generation using quotient probabilistic automata.

Authors:

(1) M. Carrasco, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(2) F. Mayr, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(3) S. Yovine, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(4) J. Kidd, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(5) M. Iturbide, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(6) J. da Silva, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(7) A. Garat, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay.

Abstract and 1 Introduction

2 Language models

3 Learning algorithm

4 Analyzing large language models

5 Conclusions, Acknowledgements, and References

A. Proof of Proposition 2.1

B. Proof of Proposition 2.2

C. Proof of Proposition 2.3

D. Proof of Proposition 2.4

ABSTRACT

We define a congruence that copes with the null next-symbol probabilities that arise when the output of a language model is constrained by some means during text generation. We develop an algorithm for efficiently learning the quotient with respect to this congruence and evaluate it on case studies for analyzing the statistical properties of LLMs.


1 Introduction

Many works have studied neural language models, such as Recurrent Neural Networks (RNN) and Transformers, through the analysis of surrogate automata of different sorts obtained from them in a variety of ways, with the purpose of verifying or explaining their behavior (e.g. [12, 13, 4, 6, 8]). A few have proposed to compose neural language models with automata or regular expressions in order to verify properties on-the-fly while learning ([7]), assess the existence of memorization, bias, or toxicity ([5]), and guide text generation ([14]).


In this paper, we first study theoretical questions that arise when applying this last approach in the context of active learning of probabilistic deterministic finite automata (PDFA) ([11]). In Sec. 2, we address the question of dealing with null next-symbol probabilities that appear when constraining the output of a language model by composing it with an automaton and/or a sampling strategy, such as taking the top-k most likely symbols. We do this by defining an appropriate congruence that induces a quotient PDFA without 0-probability transitions. In Sec. 3, we adapt the learning algorithm of [6] to efficiently learn the quotient PDFA. In Sec. 4, we discuss issues that arise when analyzing real large language models, in particular the role of tokenizers, and apply the algorithm to problems discussed in [5, 14] when generating text with GPT2. Experimental results show the value of our approach.
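To make the source of these null probabilities concrete, here is a minimal NumPy sketch of a top-k constraint. It is our illustration, not code from the paper; the function name and the toy distribution are assumptions. Every symbol outside the k most likely ones receives probability exactly 0, which is precisely what the congruence of Sec. 2 has to handle.

```python
import numpy as np

def top_k_constrain(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k most likely next symbols and renormalize.

    Every symbol outside the top k gets probability exactly 0: a null
    next-symbol probability of the kind discussed in Sec. 2.
    """
    constrained = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]            # indices of the k most likely symbols
    constrained[top] = probs[top]
    return constrained / constrained.sum()  # renormalize over the surviving support

# Toy next-symbol distribution over {a, b, c, $}
p = np.array([0.5, 0.3, 0.15, 0.05])
print(top_k_constrain(p, 2))                # [0.625 0.375 0.    0.   ]
```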


2 Language models

Let Σ be a finite set of symbols, Σ∗ the set of finite strings, λ ∈ Σ∗ the empty string, and Σ$ ≜ Σ ∪ {$}, where $ ∉ Σ is a special symbol used to denote termination. The probability simplex over Σ$ is ∆(Σ$) ≜ {ρ : Σ$ → ℝ+ | ∑σ∈Σ$ ρ(σ) = 1}. The support of ρ ∈ ∆(Σ$) is supp(ρ) ≜ {σ ∈ Σ$ | ρ(σ) > 0}. A language model is a total function L : Σ∗ → ∆(Σ$).
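As a minimal illustration of this definition (our sketch, not an artifact of the paper; the interface name next_symbol_distribution is an assumption), a language model can be viewed as an object that returns a distribution over Σ ∪ {$} for every prefix:

```python
from typing import Dict, Protocol, Set

END = "$"  # termination symbol; not a member of the alphabet Σ

class LanguageModel(Protocol):
    """A language model L : Σ* -> Δ(Σ$): a total function mapping every
    finite string to a probability distribution over Σ ∪ {$}."""

    def next_symbol_distribution(self, prefix: str) -> Dict[str, float]:
        """Return ρ = L(prefix), with probabilities summing to 1 over Σ$."""
        ...

def support(rho: Dict[str, float]) -> Set[str]:
    """supp(ρ): the symbols of Σ$ with strictly positive probability."""
    return {symbol for symbol, prob in rho.items() if prob > 0}
```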


Language models can be expressed in different ways, e.g., RNN, Transformers, and PDFA. Following [6], we define a PDFA A over Σ as a tuple (Q, qin, π, τ), where Q is a finite set of states, qin ∈ Q is the initial state, π : Q → ∆(Σ$), and τ : Q × Σ → Q. Both π and τ are total functions. We define τ∗ and π∗ as follows: τ∗(q, λ) ≜ q, τ∗(q, σu) ≜ τ∗(τ(q, σ), u), and π∗(q, u) ≜ π(τ∗(q, u)). We omit the state when it is qin and write τ∗(u) and π∗(u).
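A minimal Python sketch of this definition, assuming dictionary-encoded π and τ (our illustration, not the paper's implementation):

```python
from typing import Dict, Hashable, Tuple

class PDFA:
    """A PDFA (Q, q_in, π, τ) over an alphabet Σ, as defined above.

    pi[q] is the next-symbol distribution π(q) over Σ ∪ {"$"}, and
    tau[(q, σ)] is the successor state τ(q, σ); both are assumed total.
    """

    def __init__(self, q_in: Hashable,
                 pi: Dict[Hashable, Dict[str, float]],
                 tau: Dict[Tuple[Hashable, str], Hashable]):
        self.q_in = q_in
        self.pi = pi
        self.tau = tau

    def tau_star(self, u: str) -> Hashable:
        """τ*(u): the state reached from q_in after reading u symbol by symbol."""
        q = self.q_in
        for symbol in u:
            q = self.tau[(q, symbol)]
        return q

    def pi_star(self, u: str) -> Dict[str, float]:
        """π*(u) = π(τ*(u)): the next-symbol distribution after reading u."""
        return self.pi[self.tau_star(u)]
```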


A defines the language model such that A(u) ≜ π∗(u). Fig. 1 gives examples of PDFA. The number below a state q is its termination probability π(q)($), and the number associated with an outgoing transition labeled σ is π(q)(σ).
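Continuing the sketch above with a toy two-state PDFA (purely hypothetical, not the PDFA of Fig. 1), A(u) is evaluated by following τ from the initial state and reading off π at the state reached:

```python
# A toy two-state PDFA over Σ = {a, b}; illustrative only, not the PDFA of Fig. 1.
pi = {
    "q0": {"a": 0.6, "b": 0.3, "$": 0.1},
    "q1": {"a": 0.2, "b": 0.2, "$": 0.6},
}
tau = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
}

A = PDFA("q0", pi, tau)
print(A.pi_star("ab"))   # A("ab") = π(τ*("ab")) = π("q0") = {'a': 0.6, 'b': 0.3, '$': 0.1}
```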
Congruences. Following [1, 11], we consider the equivalence relation ≡ on Σ∗, which is a congruence with respect to concatenating a symbol:
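As a hedged reconstruction (an assumption on our part, not a quotation of the paper), the congruences used in this line of work typically identify strings whose continuations behave identically under the model, along the lines of

u ≡ v  iff  L(uw) = L(vw) for every w ∈ Σ∗,

so that equivalent strings induce the same next-symbol distribution after any common suffix; the exact formulation used in the paper may differ.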
This paper is available on arxiv under the CC BY-SA 4.0 DEED (Attribution-ShareAlike 4.0 International) license.

