What is a value function?
April 11, 2025
In RL, we try to maximize the discounted sum of rewards collected over the long run, called the return.
\[G^\pi_t = \sum^\infty_{i=0}\gamma^i r_{t+i+1} \le \frac{\sup r}{1-\gamma}\]
The bound follows from the geometric series $\sum^\infty_{i=0}\gamma^i = \frac{1}{1-\gamma}$ for $\gamma \in [0,1)$, assuming rewards are bounded above by $\sup r$. The value of a state in an MDP is the expected return starting from that state, following a given policy.
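As a quick sanity check on the definition, the return of a finite trajectory can be accumulated backwards via $G_t = r_{t+1} + \gamma G_{t+1}$. Below is a minimal Python sketch; the reward values and discount factor are illustrative assumptions.

```python
# Minimal sketch: compute the discounted return G_t of a finite trajectory.
# The reward list and gamma below are illustrative assumptions.
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma^i * r_{t+i+1} over a finite reward sequence."""
    g = 0.0
    for r in reversed(rewards):  # backward recursion: G_t = r_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, -1.0, 2.0]))  # 1 + 0 - 0.99^2 + 2 * 0.99^3
```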
There are typically two kinds of value functions: the state value function $V$ and the state-action value function $Q$.
- State value function
The state value function is a mapping from a state $s$ to the expected return starting from $s$ and following a particular policy $\pi$.
\[V^\pi(s)=\mathbb{E}^\pi[G_t|s_{t}=s]\]
- State-action value function
The state-action value function, or Q-function, is a mapping from a state $s$ and an action $a$ to the expected return starting from $s$, taking $a$, and following a particular policy $\pi$ thereafter (a Monte Carlo estimate of both functions is sketched after these definitions).
\[Q^\pi(s,a)=\mathbb{E}^\pi[G_t|s_{t}=s,a_{t}=a]\]
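Both definitions suggest a straightforward Monte Carlo estimator: roll out trajectories under $\pi$ and average the sampled returns. Below is a minimal sketch on a hypothetical 2-state, 2-action MDP; the transition tensor `P`, reward table `R`, and policy `pi` are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: Monte Carlo estimation of V^pi(s) and Q^pi(s, a) on a
# hypothetical 2-state, 2-action MDP (P, R, and pi are assumptions).
rng = np.random.default_rng(0)
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s'] = p(s' | s, a)
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                  # R[s, a] = r(s, a)
              [0.0, 2.0]])
pi = np.array([[0.5, 0.5],                 # pi[s, a] = pi(a | s)
               [0.2, 0.8]])

def rollout_return(s, a=None, horizon=200):
    """Sample one truncated discounted return from s (optionally forcing the first action)."""
    g, discount = 0.0, 1.0
    for _ in range(horizon):
        if a is None:
            a = rng.choice(2, p=pi[s])
        g += discount * R[s, a]
        s = rng.choice(2, p=P[s, a])
        discount *= gamma
        a = None                           # after the first step, always follow pi
    return g

V_est = np.mean([rollout_return(s=0) for _ in range(2000)])
Q_est = np.mean([rollout_return(s=0, a=1) for _ in range(2000)])
print(f"V^pi(0) ~= {V_est:.3f}, Q^pi(0, 1) ~= {Q_est:.3f}")
```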
Due to the Markov property, the value functions satisfy the following one-step recursions.
\[\begin{align} V^\pi(s)&=\mathbb{E}_{a\sim \pi(\cdot|s),\, s'\sim p(\cdot|s,a)}[r(s,a)+\gamma V^\pi(s')] \\ Q^\pi(s,a)&=\mathbb{E}_{s'\sim p(\cdot|s,a),\, a'\sim \pi(\cdot|s')}[r(s,a)+\gamma Q^\pi(s',a')] \end{align}\]
These are the famous Bellman equations.
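In the tabular case the Bellman equation for $V^\pi$ is linear in $V^\pi$, so policy evaluation reduces to solving $(I - \gamma P^\pi)V^\pi = r^\pi$, where $P^\pi$ and $r^\pi$ are the state-transition matrix and expected one-step reward under $\pi$. Below is a minimal sketch on the same hypothetical MDP as above, ending with a check of the one-step recursion.

```python
import numpy as np

# Minimal sketch: exact policy evaluation via the Bellman equation,
# V^pi = (I - gamma * P_pi)^{-1} r_pi, on the hypothetical MDP from above.
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s'] = p(s' | s, a)
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                  # R[s, a] = r(s, a)
              [0.0, 2.0]])
pi = np.array([[0.5, 0.5],                 # pi[s, a] = pi(a | s)
               [0.2, 0.8]])

P_pi = np.einsum('sa,sat->st', pi, P)      # state-to-state transitions under pi
r_pi = np.einsum('sa,sa->s', pi, R)        # expected one-step reward under pi
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

Q = R + gamma * P @ V                      # Q^pi(s,a) = r(s,a) + gamma * E[V^pi(s')]
print("V^pi =", V)
print("Q^pi =", Q)
# Consistency check of the recursion: V^pi(s) = E_{a~pi}[Q^pi(s, a)]
assert np.allclose(V, np.einsum('sa,sa->s', pi, Q))
```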
Reference
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2018.