Joonkyu Min

I am an undergraduate student at Seoul National University, majoring in Electrical and Computer Engineering. I am currently an intern at CLVR Lab, where I work on robot safety under the supervision of Joseph J. Lim.

Email  /  GitHub  /  LinkedIn


What is a value function?

April 11, 2025

In RL, we try to maximize the return, the discounted sum of rewards collected over the long run. For bounded rewards, the return itself is bounded:

\[G^\pi_t = \sum^\infty_{i=0}\gamma^i r_{t+i+1} \le \frac{\sup r}{1-\gamma}\]
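
As a quick illustration, here is a minimal Python sketch that computes the discounted return for a hypothetical finite reward sequence and checks it against the $\sup r/(1-\gamma)$ bound (the rewards and $\gamma$ below are made-up values, not from the post).

```python
import numpy as np

# Hypothetical reward sequence r_{t+1}, r_{t+2}, ... (truncated to 5 steps)
rewards = np.array([1.0, 0.5, -0.2, 1.0, 0.3])
gamma = 0.9

# G_t = sum_i gamma^i * r_{t+i+1}
discounts = gamma ** np.arange(len(rewards))
G = np.sum(discounts * rewards)

# For bounded rewards, the infinite-horizon return is at most sup(r) / (1 - gamma)
bound = rewards.max() / (1 - gamma)
print(f"return G = {G:.3f}, bound = {bound:.3f}")
```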

In an MDP, the value of a state is the expected return starting from that state, following a given policy.

There are two commonly used value functions: the state value function $V$ and the state-action value function $Q$.

  • State value function

The state value function is a mapping from a state $s$ to the expected return starting from $s$, following a particular policy $\pi$.

\[V^\pi(s)=\mathbb{E}^\pi[G_t|s_{t}=s]\]
  • State-action value function

The state-action value function, or Q-function, is a mapping from a state $s$ and an action $a$ to the expected return starting from $s$ after taking $a$, following a particular policy $\pi$. A Monte Carlo sketch of both definitions follows the equation below.

\[Q^\pi(s,a)=\mathbb{E}^\pi[G_t|s_{t}=s,a_{t}=a]\]
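
To make the two definitions concrete, here is a minimal Monte Carlo sketch on a hypothetical two-state, two-action MDP (the transition matrix `P`, reward table `R`, and uniform policy `pi` below are made up for illustration). It estimates $V^\pi(s)$ and $Q^\pi(s,a)$ by averaging truncated-horizon sample returns, which approximates the infinite-horizon expectation since $\gamma^H$ becomes negligible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: P[s, a] is the next-state distribution, R[s, a] the reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
pi = np.array([[0.5, 0.5],   # pi(a|s): uniform policy
               [0.5, 0.5]])
gamma, horizon, n_rollouts = 0.9, 100, 1000

def rollout_return(s, a=None):
    """Sample one return G_t from state s (optionally forcing the first action a)."""
    G, discount = 0.0, 1.0
    for t in range(horizon):
        if a is None or t > 0:            # first action is forced only for Q-estimation
            a = rng.choice(2, p=pi[s])
        G += discount * R[s, a]
        discount *= gamma
        s = rng.choice(2, p=P[s, a])
    return G

# V^pi(s) = E[G_t | s_t = s];  Q^pi(s, a) = E[G_t | s_t = s, a_t = a]
V0  = np.mean([rollout_return(0)      for _ in range(n_rollouts)])
Q01 = np.mean([rollout_return(0, a=1) for _ in range(n_rollouts)])
print(f"V(0) ~ {V0:.2f}, Q(0, 1) ~ {Q01:.2f}")
```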

Due to the Markov property, the value functions satisfy the following one-step recursion.

\[\begin{align} V^\pi(s)&=\mathbb{E}_{a\sim \pi(\cdot|s),\, s'\sim p(\cdot|s,a)}[r(s,a)+\gamma V^\pi(s')] \\ Q^\pi(s,a)&=\mathbb{E}_{s'\sim p(\cdot|s,a),\, a'\sim \pi(\cdot|s')}[r(s,a)+\gamma Q^\pi(s',a')] \end{align}\]

These one-step recursions are precisely the famous Bellman equations.
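
The Bellman equation also gives a way to compute $V^\pi$ exactly when the MDP is known: repeatedly applying the one-step backup converges, since the backup is a $\gamma$-contraction. Below is a minimal sketch of iterative policy evaluation on the same hypothetical toy MDP as above (again, `P`, `R`, and `pi` are made-up values).

```python
import numpy as np

# Same hypothetical two-state, two-action MDP as in the Monte Carlo sketch.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
pi = np.array([[0.5, 0.5],
               [0.5, 0.5]])
gamma = 0.9

# Iterative policy evaluation: repeatedly apply the Bellman backup
# V(s) <- sum_a pi(a|s) [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V            # Q[s, a] = R[s, a] + gamma * E_{s'}[V(s')]
    V_new = np.sum(pi * Q, axis=1)   # average over the policy's actions
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-8:                 # fixed point of the backup is V^pi
        break
print("V^pi =", V)
```

The Monte Carlo estimates above should agree with this fixed point up to sampling and truncation error.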


Reference

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2018.


Design and source code from Jon Barron's website