A class of procedures to compute the optimal value function in a Markovian decision problem
Löbus, J.U.; Sektion Mathematik, Friedrich-Schiller-Universität Jena
Journal:
Optimization
Date:
1986
Abstract:
A class of iteration methods is introduced to find the optimal value function υ* ∈ R^m in a Markovian decision problem with a known optimal stationary policy, represented by an (m, m) transition matrix P^δ and a reward vector γ^δ ∈ R^m. The Q-iteration for υ*, given P^δ and γ^δ, is defined in terms of an (m, m) parameter matrix Q with (I - Q) nonsingular. Well-known methods are obtained for special forms of Q. An estimate characterizing the speed of convergence of the Q-iteration is given.
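
The abstract does not state the update rule of the Q-iteration, so the following is only an illustrative sketch under explicit assumptions, not necessarily the author's method. It assumes that υ* (written `v` below) solves v = γ^δ + P^δ v with P^δ substochastic (for example, a discount factor already absorbed into it), and that the iteration has the splitting form (I - Q) v_{k+1} = (P^δ - Q) v_k + γ^δ, whose fixed point is that same equation. The names `solve` and `q_iteration` and the numerical data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-14:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def q_iteration(P, g, Q, tol=1e-12, max_iter=10_000):
    """Assumed splitting form: (I - Q) v_new = (P - Q) v + g, started at v = 0."""
    m = len(g)
    i_minus_q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(m)]
                 for i in range(m)]
    v = [0.0] * m
    for _ in range(max_iter):
        rhs = [g[i] + sum((P[i][j] - Q[i][j]) * v[j] for j in range(m))
               for i in range(m)]
        v_new = solve(i_minus_q, rhs)
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    return v


# Hypothetical data: a stochastic matrix scaled by 0.9, so row sums are 0.9.
P = [[0.45, 0.45],
     [0.18, 0.72]]
g = [1.0, 2.0]

# Reference solution of (I - P) v = g.
i_minus_p = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(2)]
             for i in range(2)]
v_exact = solve(i_minus_p, g)

# Q = 0: plain successive approximation v_new = P v + g.
v_plain = q_iteration(P, g, [[0.0, 0.0], [0.0, 0.0]])

# Q = strictly lower-triangular part of P: a Gauss-Seidel-type sweep.
v_gs = q_iteration(P, g, [[0.0, 0.0], [P[1][0], 0.0]])
```

Under this assumed form, the choice of Q changes only the path to the solution, not the limit: both runs above approach the same vector as the direct solve.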