A Faster Path to Smarter AI: The New Anc-VI Method

Written by anchoring | Published 2025/01/14
Tech Story Tags: reinforcement-learning | dynamic-programming | nesterov-acceleration | machine-learning-optimization | value-iteration | value-iteration-convergence | bellman-error | accelerated-value-iteration

TLDR: Anc-VI is a new accelerated value iteration method that improves the convergence rate of reinforcement learning models, achieving faster Bellman-error reduction than standard VI.

Authors:

(1) Jongmin Lee, Department of Mathematical Sciences, Seoul National University;

(2) Ernest K. Ryu, Department of Mathematical Sciences, Seoul National University and Interdisciplinary Program in Artificial Intelligence, Seoul National University.

Abstract and 1 Introduction

1.1 Notations and preliminaries

1.2 Prior works

2 Anchored Value Iteration

2.1 Accelerated rate for Bellman consistency operator

2.2 Accelerated rate for Bellman optimality operator

3 Convergence when γ = 1

4 Complexity lower bound

5 Approximate Anchored Value Iteration

6 Gauss–Seidel Anchored Value Iteration

7 Conclusion, Acknowledgments and Disclosure of Funding and References

A Preliminaries

B Omitted proofs in Section 2

C Omitted proofs in Section 3

D Omitted proofs in Section 4

E Omitted proofs in Section 5

F Omitted proofs in Section 6

G Broader Impacts

H Limitations

Abstract

Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a O(γ^k)-rate, where γ is the discount factor. Surprisingly, however, the optimal rate in terms of Bellman error for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an anchoring mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits a O(1/k)-rate for γ ≈ 1 or even γ = 1, while standard VI has rate O(1) for γ ≥ 1 − 1/k, where k is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of 4, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss–Seidel VI setups as well.

1 Introduction

Value Iteration (VI) is foundational to the theory and practice of modern dynamic programming (DP) and reinforcement learning (RL). It is well known that when a discount factor γ < 1 is used, (exact) VI is a contractive iteration in the ∥·∥∞-norm and therefore converges. The progress of VI is measured by the Bellman error in practice (as the distance to the fixed point is not computable), and much prior work has been dedicated to analyzing the rates of convergence of VI and its variants.
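As a concrete sketch of the iterations discussed here: standard VI repeatedly applies the Bellman optimality operator, while an anchored variant mixes each Bellman update back toward the initial iterate. The anchor weight 1/(k+2) below is the classical Halpern-style choice, used purely for illustration; the paper's exact Anc-VI coefficients are given in Section 2, and the MDP helper names here are our own.

```python
import numpy as np

def bellman_optimality(V, P, R, gamma):
    """Bellman optimality operator: (TV)(s) = max_a [R(s,a) + gamma * E_{s'}[V(s')]].

    P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
    """
    return np.max(R + gamma * np.einsum("sap,p->sa", P, V), axis=1)

def standard_vi(P, R, gamma, iters):
    # Plain VI: V_{k+1} = T(V_k), contractive in the sup-norm for gamma < 1.
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = bellman_optimality(V, P, R, gamma)
    return V

def anchored_vi(P, R, gamma, iters):
    # Anchored VI sketch: V_{k+1} = beta_k * V_0 + (1 - beta_k) * T(V_k).
    # beta_k = 1/(k+2) is an illustrative Halpern-style schedule, not
    # necessarily the paper's exact choice of anchor coefficients.
    V0 = np.zeros(R.shape[0])
    V = V0.copy()
    for k in range(iters):
        beta = 1.0 / (k + 2)
        V = beta * V0 + (1.0 - beta) * bellman_optimality(V, P, R, gamma)
    return V

def bellman_error(V, P, R, gamma):
    # Sup-norm Bellman error: the practical progress measure for VI.
    return np.max(np.abs(bellman_optimality(V, P, R, gamma) - V))
```

On a toy MDP, both iterations drive the sup-norm Bellman error toward zero; the paper's contribution concerns the rate at which this happens as γ approaches (or equals) 1.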

37th Conference on Neural Information Processing Systems (NeurIPS 2023).

This paper is available on arXiv under the CC BY 4.0 DEED license.
