
Author Löbus, J.U.
Issue date 1986
dc.description A class of iteration methods is introduced to find the optimal value function υ* ∈ R^m in a Markovian decision problem with a known optimal stationary policy, represented by an (m, m) transition matrix P^δ and a reward vector γ^δ ∈ R^m. Depending on an (m, m) parameter matrix Q with (I - Q) nonsingular, the Q-iteration (for υ*, when P^δ and γ^δ are given) is explained. Well-known methods are obtained for special forms of Q. An estimate characterizing the speed of convergence of the Q-iteration is given. [An illustrative sketch of such a Q-iteration follows the record below.]
Format application/pdf
Publisher Akademie-Verlag
Copyright Taylor and Francis Group, LLC
Subject Markovian decision problem
Subject optimal value function
Subject iteration methods
Subject Primary: 90 C 40
Subject Secondary: 49 C 20
Title A class of procedures to compute the optimal value function in a Markovian decision problem
Type research-article
DOI 10.1080/02331938608843148
Electronic ISSN 1029-4945
Print ISSN 0233-1934
Journal Optimization
Volume 17
First page 399
Last page 409
Affiliation Löbus, J.U.; Sektion Mathematik, Friedrich-Schiller-Universität Jena
Issue 3
Reference Bartmann, D. Acceleration of the Method of Successive Approximations in Dynamic Programming, Technical University Munich. Preprint
Reference Hinderer, K. 1970. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, Berlin: Springer.
Reference Van Nunen, J.A.E.E. 1976. Contracting Markov Decision Processes, Amsterdam: Mathematisch Centrum.
Reference Varga, R.S. 1962. Matrix Iterative Analysis, Prentice-Hall.
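
Illustrative sketch. The abstract describes the Q-iteration only in outline, so the following is a minimal sketch, not the paper's actual scheme: it assumes the standard discounted policy-evaluation equation υ* = γ^δ + β P^δ υ*, i.e. (I - β P^δ) υ* = γ^δ, and a classical matrix-splitting recursion v_{k+1} = (I - Q)^{-1}((β P^δ - Q) v_k + γ^δ), in the spirit of the cited Varga (1962). The names q_iteration, P_delta, gamma_delta, beta and the sample data are hypothetical and not taken from the record.

import numpy as np

def q_iteration(P_delta, gamma_delta, beta, Q, v0=None, tol=1e-10, max_iter=10000):
    # Splitting-type iteration (assumed form, not necessarily the paper's):
    # v_{k+1} = (I - Q)^{-1} ((beta * P_delta - Q) v_k + gamma_delta)
    m = P_delta.shape[0]
    M = np.linalg.inv(np.eye(m) - Q)      # (I - Q) is assumed nonsingular
    v = np.zeros(m) if v0 is None else np.asarray(v0, dtype=float).copy()
    for _ in range(max_iter):
        v_next = M @ ((beta * P_delta - Q) @ v + gamma_delta)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v

if __name__ == "__main__":
    P = np.array([[0.6, 0.4], [0.3, 0.7]])   # hypothetical 2-state transition matrix P^delta
    g = np.array([1.0, 2.0])                 # hypothetical reward vector gamma^delta
    beta = 0.9                               # hypothetical discount factor
    Q_gs = np.tril(beta * P, k=-1)           # strictly lower part: a Gauss-Seidel-type splitting
    v = q_iteration(P, g, beta, Q_gs)
    print(v)
    print(np.linalg.solve(np.eye(2) - beta * P, g))  # direct solve for comparison

With Q = 0 the recursion reduces to plain successive approximation for the fixed policy, while triangular choices of Q give Gauss-Seidel-type sweeps; this is presumably the sense in which the abstract says well-known methods arise for special forms of Q.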
