What is Value Iteration?

April 13, 2025

In classical dynamic programming (DP) for reinforcement learning on a finite (discrete) Markov decision process (MDP) with a known model, value iteration is a method for computing the optimal value function.
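The update applied at each iteration can be written in the notation of Sutton and Barto, where $p(s', r \mid s, a)$ is the MDP's transition model, $\gamma$ is the discount factor, and $\mathcal{T}$ (a label chosen here for convenience) denotes the Bellman optimality operator:

$$
V_{k+1}(s) = (\mathcal{T} V_k)(s) = \max_{a \in \mathcal{A}} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V_k(s')\bigr] \quad \text{for all } s \in \mathcal{S}
$$

Once the values have converged, an optimal policy is obtained by acting greedily with respect to them:

$$
\pi^*(s) = \arg\max_{a \in \mathcal{A}} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V^*(s')\bigr]
$$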

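Starting from an arbitrary initial value function and repeatedly applying the Bellman optimality operator, the iterates converge to the optimal value function $V^*$ provided that $0 < \gamma < 1$. This holds because the operator is a $\gamma$-contraction in the max norm: by the Banach fixed-point theorem it has a unique fixed point, which is $V^*$, and the distance to $V^*$ shrinks by at least a factor of $\gamma$ with every iteration.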
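The procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: the dictionary-based MDP format, the function names `value_iteration` and `bellman_backup`, the stopping threshold `theta`, and the two-state toy MDP are all hypothetical choices made for this example. The loop stops once the largest change across states falls below `theta`, then extracts a greedy policy.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP format (an assumption for this sketch):
# mdp[s][a] is a list of (probability, next_state, reward) triples,
# i.e. an explicit listing of p(s', r | s, a).
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def bellman_backup(mdp: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step return of taking action a in state s, then valuing s' with V."""
    return sum(p * (r + gamma * V[s_next]) for p, s_next, r in mdp[s][a])


def value_iteration(mdp: MDP, gamma: float = 0.9, theta: float = 1e-8,
                    max_iters: int = 10_000) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Apply the Bellman optimality operator until the max-norm change drops below theta."""
    assert 0 < gamma < 1, "the convergence guarantee assumes 0 < gamma < 1"
    V = {s: 0.0 for s in mdp}  # arbitrary initial value function
    for _ in range(max_iters):
        # Synchronous update: every new value is computed from the previous V.
        V_new = {s: max(bellman_backup(mdp, V, s, a, gamma) for a in mdp[s]) for s in mdp}
        delta = max(abs(V_new[s] - V[s]) for s in mdp)
        V = V_new
        if delta < theta:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(mdp[s], key=lambda a: bellman_backup(mdp, V, s, a, gamma)) for s in mdp}
    return V, policy


if __name__ == "__main__":
    # Hypothetical toy MDP: in state 0, action 1 moves to state 1 with reward 1;
    # in state 1, action 0 stays there with reward 2 forever.
    toy_mdp: MDP = {
        0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
        1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
    }
    V, policy = value_iteration(toy_mdp, gamma=0.9)
    # V is approximately {0: 19.0, 1: 20.0}; policy is {0: 1, 1: 0}
    print(V, policy)
```

With $\gamma = 0.9$, staying in state 1 is worth $2 / (1 - 0.9) = 20$, and moving there from state 0 is worth $1 + 0.9 \cdot 20 = 19$, which is what the iteration converges to. The sketch uses synchronous sweeps so that each iteration is exactly one application of the operator; an in-place variant that overwrites values during a sweep also converges and often needs fewer sweeps.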

Although tabular value iteration cannot be applied directly to real-world problems with large or continuous state spaces, since it stores and sweeps over a value for every state and requires a known transition model, it provides the foundation for modern deep reinforcement learning methods such as Deep Q-Networks (DQN).
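To make that connection concrete: DQN trains its Q-network toward a target of the form $r + \gamma \max_{a'} Q(s', a')$, which replaces the expectation over $p(s', r \mid s, a)$ in the Bellman optimality backup with a single sampled transition. The plain-Python sketch below computes such targets for a small batch. The function name `bellman_targets`, its argument layout, and the `dones` flag for terminal transitions are illustrative assumptions, and it deliberately omits the neural network, replay buffer, and target network that a full DQN uses.

```python
from typing import List, Sequence


def bellman_targets(rewards: Sequence[float],
                    next_q_values: Sequence[Sequence[float]],
                    dones: Sequence[bool],
                    gamma: float = 0.99) -> List[float]:
    """Sampled Bellman optimality targets y = r + gamma * max_a' Q(s', a').

    next_q_values[i] holds the Q-estimates for every action in the i-th next state;
    dones[i] marks terminal transitions, for which no bootstrap term is added.
    """
    return [r + (0.0 if done else gamma * max(q_next))
            for r, q_next, done in zip(rewards, next_q_values, dones)]


if __name__ == "__main__":
    # Two hypothetical transitions; the second ends the episode.
    targets = bellman_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5)
    print(targets)  # [2.0, 0.0]
```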

Reference

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2018.

Joonkyu Min
