Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases
arXiv:2510.23914v2 Announce Type: replace
Abstract: While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with its empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate $\gamma$, while in the average-reward case recent work suggests that only sublinear convergence can be expected. In practice, however, VI often converges significantly faster. In this work, we show through a unified geometry-based analysis that, under the assumption of a unique, unichain optimal policy, (i) convergence is geometric in both the discounted- and average-reward cases, and (ii) the convergence rate is faster than previous analyses suggest.
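For context, here is a minimal sketch of tabular discounted VI on a small synthetic MDP (the arrays `P` and `R`, the sizes, and the helper `value_iteration` are illustrative assumptions, not taken from the paper). The Bellman optimality operator is a $\gamma$-contraction in the sup norm, which yields the classical worst-case bound $\|V_k - V^*\|_\infty \le \gamma^k \|V_0 - V^*\|_\infty$; the paper's point is that the empirically observed rate is often faster than this $\gamma$.

```python
import numpy as np

# Hypothetical tabular MDP with S states and A actions:
# P[a, s, s'] = transition probability, R[a, s] = expected reward.
S, A, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(A, S))  # each P[a, s, :] sums to 1
R = rng.uniform(size=(A, S))

def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Discounted value iteration: repeatedly apply the Bellman
    optimality operator, which contracts with factor gamma in the
    sup norm, so the error to V* shrinks at least geometrically."""
    V = np.zeros(P.shape[1])
    for k in range(max_iter):
        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)  # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k + 1
        V = V_new
    return V, max_iter

V_star, iters = value_iteration(P, R, gamma)
print(f"converged in {iters} iterations")
```

Running a sketch like this and plotting $\log \|V_k - V^*\|_\infty$ against $k$ typically shows a slope steeper than $\log \gamma$ once the greedy policy stabilizes, which is the kind of gap between theory and practice the abstract refers to.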