Efficient Learning and Planning Within the Dyna Framework

Jing Peng; Ronald J. Williams


Author	Jing Peng
Author	Williams, Ronald J.
Date issued	1993
dc.description	Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks.
Publisher	Sage Publications
Subject	reinforcement learning
Subject	dynamic programming
Subject	sequential decision problems
Title	Efficient Learning and Planning Within the Dyna Framework
Type	Journal Article
DOI	10.1177/105971239300100403
Print ISSN	1059-7123
Journal	Adaptive Behavior
Volume	1
First page	437
Last page	454
Affiliation	Jing Peng, Northeastern University
Affiliation	Williams, Ronald J., Northeastern University
Issue	4
Reference	Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS Technical Report No. 91-57). Amherst, MA: Department of Computer Science, University of Massachusetts.
Reference	Bertsekas, D.P. (1987). Dynamic programming: Deterministic and stochastic models. Englewood Cliffs, NJ: Prentice Hall.
Reference	Bertsekas, D.P., & Tsitsiklis, J.N. (1989). Parallel and distributed computation: Numerical methods. Englewood Cliffs, NJ: Prentice Hall.
Reference	Holland, J.H. (1986). Escaping brittleness: The possibility of general-purpose learning algorithms applied to rule-based systems. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.
Reference	Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Reference	Moore, A.W., & Atkeson, C.G. (1993). Memory-based reinforcement learning: Efficient computation with prioritized sweeping. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5. San Mateo, CA: Morgan Kaufmann.
Reference	Nilsson, N.J. (1980). Principles of artificial intelligence. San Mateo, CA: Morgan Kaufmann.
Reference	Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229. (Reprinted in E. A. Feigenbaum & J. Feldman [Eds.] [1963], Computers and thought. New York: McGraw-Hill.)
Reference	Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
Reference	Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
Reference	Sutton, R.S. (1991). Planning by incremental dynamic programming. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Reference	Sutton, R.S., Barto, A.G., & Williams, R.J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12, 19-22.
Reference	Tesauro, G. (1992). Practical issues in temporal difference learning. Advances in Neural Information Processing Systems, 4, 259-266.
Reference	Watkins, C.J.C.H. (1989). Learning from delayed rewards. Unpublished doctoral dissertation, Cambridge University, Cambridge, England.
Reference	Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292.
Reference	Williams, R.J., & Baird, L.C., III (1990). A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming. Proceedings of the Sixth Yale Workshop on Adaptive and Learning Systems. New Haven, CT: Yale University Center for Systems Science.
