Anc-VI Sets New Standards in Speed for Bellman Consistency in Reinforcement Learning

by Anchoring, January 14th, 2025

Too Long; Didn't Read

Anc-VI accelerates Bellman consistency, achieving faster convergence rates for reinforcement learning, particularly when the discount factor is close to 1, outpacing standard value iteration.

Authors:

(1) Jongmin Lee, Department of Mathematical Science, Seoul National University;

(2) Ernest K. Ryu, Department of Mathematical Science, Seoul National University and Interdisciplinary Program in Artificial Intelligence, Seoul National University.

Abstract and 1 Introduction

1.1 Notations and preliminaries

1.2 Prior works

2 Anchored Value Iteration

2.1 Accelerated rate for Bellman consistency operator

2.2 Accelerated rate for Bellman optimality operator

3 Convergence when γ = 1

4 Complexity lower bound

5 Approximate Anchored Value Iteration

6 Gauss–Seidel Anchored Value Iteration

7 Conclusion, Acknowledgments and Disclosure of Funding and References

A Preliminaries

B Omitted proofs in Section 2

C Omitted proofs in Section 3

D Omitted proofs in Section 4

E Omitted proofs in Section 5

F Omitted proofs in Section 6

G Broader Impacts

H Limitations

2.1 Accelerated rate for Bellman consistency operator

First, for general state-action spaces, we present the accelerated convergence rate of Anc-VI for the Bellman consistency operator.
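To make the setup concrete, the following is a minimal sketch of Anc-VI for the Bellman consistency operator T^π(V) = r + γPV on a small tabular Markov reward process. The anchoring step mixes each iterate back toward the starting point V^0; the schedule β_k = 1/(k+1) used here is the classical Halpern-style choice and is an illustrative assumption — the paper derives its own γ-dependent schedule and rates. All names (`bellman_consistency`, `anc_vi`) and the toy problem are hypothetical.

```python
import numpy as np

def bellman_consistency(V, r, P, gamma):
    """Bellman consistency operator T^pi(V) = r + gamma * P @ V."""
    return r + gamma * P @ V

def anc_vi(r, P, gamma, V0, num_iters):
    """Anchored VI sketch: V_k = beta_k * V0 + (1 - beta_k) * T^pi(V_{k-1}).

    beta_k = 1/(k+1) is an assumed illustrative schedule, not the
    gamma-dependent one from the paper.
    """
    V = V0.copy()
    for k in range(1, num_iters + 1):
        beta = 1.0 / (k + 1)  # anchor weight shrinks as iterations proceed
        V = beta * V0 + (1.0 - beta) * bellman_consistency(V, r, P, gamma)
    return V

# Tiny 3-state example with a random row-stochastic transition matrix.
rng = np.random.default_rng(0)
P = rng.random((3, 3))
P /= P.sum(axis=1, keepdims=True)
r = np.ones(3)
gamma = 0.5

# Exact fixed point V* = (I - gamma * P)^{-1} r for comparison.
V_star = np.linalg.solve(np.eye(3) - gamma * P, r)

V = anc_vi(r, P, gamma, V0=np.zeros(3), num_iters=200)
print(np.max(np.abs(V - V_star)))  # small residual distance to V*
```

The anchor term is what distinguishes Anc-VI from standard value iteration V_k = T^π(V_{k-1}); the paper's analysis shows that, with the right coefficients, this modification accelerates the decay of the Bellman error, especially as γ approaches 1.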




This paper is available on arxiv under CC BY 4.0 DEED license.


[1] Arguably, T^π is affine, not linear, but we follow the convention of [69] and say T^π is linear.