What is Value Iteration?
April 13, 2025
In classical dynamic programming (DP) for reinforcement learning, value iteration is a method for computing the optimal value function of a discrete Markov decision process (MDP).
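To make the update concrete, one sweep of value iteration can be written as below. The notation is an assumption, since this post does not fix any symbols: $\mathcal{S}$ and $\mathcal{A}$ are the state and action sets, $P(s' \mid s, a)$ is the transition probability, $R(s, a)$ is the expected immediate reward, and $\gamma$ is the discount factor.

$$V_{k+1}(s) = \max_{a \in \mathcal{A}} \left[ R(s, a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a) \, V_k(s') \right] \quad \text{for all } s \in \mathcal{S}$$

The right-hand side is the Bellman optimality operator applied to $V_k$; the optimal value function $V^*$ is its fixed point.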
Provided the discount factor satisfies $0 < \gamma < 1$, the Bellman optimality operator is a contraction, so applying it repeatedly to any initial value function produces a sequence that converges to the true optimal value function $V^*$.
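The iteration can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `value_iteration`, the array layout (`P` of shape states x actions x states, `R` of shape states x actions), the stopping tolerance, and the two-state toy MDP are all assumptions made up for this example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration (illustrative sketch).

    P: array of shape (S, A, S), P[s, a, s2] = probability of s -> s2 under a.
    R: array of shape (S, A), expected immediate reward for taking a in s.
    Returns the approximate optimal values and a greedy policy.
    """
    V = np.zeros(P.shape[0])  # any initial guess works when 0 < gamma < 1
    for _ in range(max_iters):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * E[V(s2)]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        # Stop once a full sweep changes no state value by more than tol
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    policy = (R + gamma * (P @ V)).argmax(axis=1)
    return V, policy


# Hypothetical two-state, two-action MDP used only for illustration.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move back to state 0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth $2 / (1 - 0.9) = 20$, and moving there from state 0 is worth $1 + 0.9 \cdot 20 = 19$, so the returned values should be close to those numbers.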
Although value iteration cannot be applied directly to real-world problems with large or continuous state spaces, because each sweep enumerates every state and relies on a known model of the MDP, it provides the conceptual foundation for modern deep reinforcement learning methods such as Deep Q-Networks (DQN).
Reference
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2018.