
Author Jing Peng
Author Williams, Ronald J.
Date issued 1993
dc.description Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks.
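For readers who want a concrete picture of the framework the abstract refers to, the following is a minimal tabular Dyna-Q sketch (illustrative only, not code from the article; the env object with reset, step, and actions methods is an assumed interface). It shows the basic Dyna loop of Sutton (1990): each real transition drives a one-step Q-learning update, is stored in a learned model, and is followed by several simulated planning updates replayed from that model.

import random
from collections import defaultdict

def dyna_q(env, episodes=50, n_planning=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    # Tabular Dyna-Q sketch (after Sutton, 1990): direct RL from real
    # experience plus n_planning simulated updates from a learned model.
    # The env interface (reset/step/actions) is an assumption, not from the article.
    Q = defaultdict(float)   # Q[(state, action)] -> estimated return
    model = {}               # model[(state, action)] -> (reward, next_state, done)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            actions = env.actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            reward, next_state, done = env.step(state, action)
            # direct one-step Q-learning update from the real transition
            best = 0.0 if done else max(Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best - Q[(state, action)])
            # model learning: remember the observed transition
            model[(state, action)] = (reward, next_state, done)
            # planning: replay randomly chosen remembered transitions
            for _ in range(n_planning):
                (s, a), (r, s2, d) = random.choice(list(model.items()))
                b = 0.0 if d else max(Q[(s2, a2)] for a2 in env.actions(s2))
                Q[(s, a)] += alpha * (r + gamma * b - Q[(s, a)])
            state = next_state
    return Q

The strategies examined in the article are aimed at making these planning updates more computationally efficient than the uniform random replay shown here.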
Publisher Sage Publications
Subject reinforcement learning
Subject dynamic programming
Subject sequential decision problems
Title Efficient Learning and Planning Within the Dyna Framework
Type Journal Article
DOI 10.1177/105971239300100403
Print ISSN 1059-7123
Journal Adaptive Behavior
Volume 1
First page 437
Last page 454
Affiliation Jing Peng, Northeastern University
Affiliation Williams, Ronald J., Northeastern University
Issue 4
Bibliographic reference Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS Technical Report No. 91-57). Amherst, MA: Department of Computer Science, University of Massachusetts.
Bibliographic reference Bertsekas, D.P. (1987). Dynamic programming: Deterministic and stochastic models. Englewood Cliffs, NJ: Prentice Hall.
Bibliographic reference Bertsekas, D.P., & Tsitsiklis, J.N. (1989). Parallel and distributed computation: Numerical methods. Englewood Cliffs, NJ: Prentice Hall.
Bibliographic reference Holland, J.H. (1986). Escaping brittleness: The possibility of general-purpose learning algorithms applied to rule-based systems. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.
Bibliographic reference Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Bibliographic reference Moore, A.W., & Atkeson, C.G. (1993). Memory-based reinforcement learning: Efficient computation with prioritized sweeping. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5. San Mateo, CA: Morgan Kaufmann.
Bibliographic reference Nilsson, N.J. (1980). Principles of artificial intelligence. San Mateo, CA: Morgan Kaufmann.
Bibliographic reference Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229. (Reprinted in E. A. Feigenbaum & J. Feldman [Eds.] [1963], Computers and thought. New York: McGraw-Hill.)
Bibliographic reference Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
Bibliographic reference Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
Bibliographic reference Sutton, R.S. (1991). Planning by incremental dynamic programming. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Bibliographic reference Sutton, R.S., Barto, A.G., & Williams, R.J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12, 19-22.
Bibliographic reference Tesauro, G. (1992). Practical issues in temporal difference learning. Advances in Neural Information Processing Systems, 4, 259-266.
Bibliographic reference Watkins, C.J.C.H. (1989). Learning from delayed rewards. Unpublished doctoral dissertation, Cambridge University, Cambridge, England.
Bibliographic reference Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292.
Bibliographic reference Williams, R.J., & Baird, L.C., III (1990). A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming. Proceedings of the Sixth Yale Workshop on Adaptive and Learning Systems. New Haven, CT: Yale University Center for Systems Science.