| Author | Löbus, J.U. |
| Date issued | 1986 |
| dc.description | A class of iteration methods is introduced for computing the optimal value function υ<sup>*</sup> ∈ R<sup>m</sup> in a Markovian decision problem with a known optimal stationary policy, represented by an (m, m) transition matrix P<sup>δ</sup> and a reward vector γ<sup>δ</sup> ∈ R<sup>m</sup>. The Q-iteration for υ<sup>*</sup>, given P<sup>δ</sup> and γ<sup>δ</sup>, is described; it depends on an (m, m) parameter matrix Q with (I - Q) nonsingular. Well-known methods are obtained for special forms of Q. An estimate characterizing the speed of convergence of the Q-iteration is given. |
| Format | application.pdf |
| Publisher | Akademie-Verlag |
| Copyright | Copyright Taylor and Francis Group, LLC |
| Subject | Markovian decision problem |
| Subject | optimal value function |
| Subject | iteration methods |
| Subject | Primary: 90 C 40 |
| Subject | Secondary: 49 C 20 |
| Title | A class of procedures to compute the optimal value function in a Markovian decision problem |
| Type | research-article |
| DOI | 10.1080/02331938608843148 |
| Electronic ISSN | 1029-4945 |
| Print ISSN | 0233-1934 |
| Journal | Optimization |
| Volume | 17 |
| First page | 399 |
| Last page | 409 |
| Affiliation | Löbus, J.U.; Sektion Mathematik, Friedrich-Schiller-Universität Jena |
| Issue | 3 |
| Bibliographic reference | Bartmann, D. Acceleration of the Method of Successive Approximations in Dynamic Programming, Technical University Munich. Preprint |
| Bibliographic reference | Hinderer, K. 1970. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, Berlin: Springer. |
| Bibliographic reference | Van Nunen, J.A.E.E. 1976. Contracting Markov Decision Processes, Amsterdam: Mathematisch Centrum. |
| Bibliographic reference | Varga, R.S. 1962. Matrix Iterative Analysis, Prentice-Hall. |
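
The description above does not state the Q-iteration itself, so the following is only an illustrative sketch under a stated assumption, not the paper's definition. It assumes the iteration has the matrix-splitting form (I - Q) v<sub>k+1</sub> = (P - Q) v<sub>k</sub> + γ, whose fixed point satisfies v = P v + γ, and it assumes any discount factor is already absorbed into P so that the spectral radius of P is below 1. The names `solve` and `q_iteration`, the tolerance, and the example data are hypothetical. Under this assumption, Q = 0 reduces to successive approximation and Q equal to the strictly lower triangular part of P gives a Gauss-Seidel-type sweep.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def q_iteration(P, gamma, Q, tol=1e-12, max_iter=10_000):
    """Iterate (I - Q) v_new = (P - Q) v + gamma (ASSUMED form, see lead-in).

    Returns the approximate fixed point and the number of iterations used.
    """
    m = len(gamma)
    i_minus_q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(m)]
                 for i in range(m)]
    v = [0.0] * m
    for k in range(1, max_iter + 1):
        rhs = [gamma[i] + sum((P[i][j] - Q[i][j]) * v[j] for j in range(m))
               for i in range(m)]
        v_new = solve(i_minus_q, rhs)
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new, k
        v = v_new
    raise RuntimeError("Q-iteration did not converge within max_iter")


# Hypothetical data: a 2-state stochastic matrix scaled by a discount of 0.9.
P = [[0.45, 0.45], [0.18, 0.72]]
gamma = [1.0, 2.0]

Q_zero = [[0.0, 0.0], [0.0, 0.0]]     # successive approximation
Q_lower = [[0.0, 0.0], [0.18, 0.0]]   # strictly lower triangular part of P

for name, Q in (("Q = 0", Q_zero), ("Q = lower(P)", Q_lower)):
    v, k = q_iteration(P, gamma, Q)
    print(name, v, "iterations:", k)
```

Both choices of Q target the same fixed point; only the number of iterations differs, which is the kind of effect a convergence-speed estimate would quantify.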