Author |
Löbus, J.U. |
Date issued |
1986 |
dc.description |
A class of iteration methods is introduced for computing the optimal value function $\upsilon^{*} \in \mathbb{R}^{m}$ in a Markovian decision problem with a known optimal stationary policy, represented by an $(m, m)$ transition matrix $P^{\delta}$ and a reward vector $\gamma^{\delta} \in \mathbb{R}^{m}$. For an $(m, m)$ parameter matrix $Q$ with $(I - Q)$ nonsingular, the Q-iteration (for $\upsilon^{*}$, given $P^{\delta}$ and $\gamma^{\delta}$) is described. Well-known methods are obtained for special forms of $Q$. An estimate characterizing the speed of convergence of the Q-iteration is given. |
Format |
application.pdf |
Publisher |
Akademie-Verlag |
Copyright |
Copyright Taylor and Francis Group, LLC |
Subject |
Markovian decision problem |
Subject |
optimal value function |
Subject |
iteration methods |
Subject |
Primary: 90 C 40 |
Subject |
Secondary: 49 C 20 |
Title |
A class of procedures to compute the optimal value function in a Markovian decision problem |
Type |
research-article |
DOI |
10.1080/02331938608843148 |
Electronic ISSN |
1029-4945 |
Print ISSN |
0233-1934 |
Journal |
Optimization |
Volume |
17 |
First page |
399 |
Last page |
409 |
Affiliation |
Löbus, J.U.; Sektion Mathematik, Friedrich-Schiller-Universität Jena |
Issue |
3 |
Bibliographic reference |
Bartmann, D. Acceleration of the Method of Successive Approximations in Dynamic Programming, Technical University Munich. Preprint. |
Bibliographic reference |
Hinderer, K. 1970. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, Berlin: Springer. |
Bibliographic reference |
Van Nunen, J.A.E.E. 1976. Contracting Markov Decision Processes, Amsterdam: Mathematisch Centrum. |
Bibliographic reference |
Varga, R.S. 1962. Matrix Iterative Analysis, Prentice-Hall. |