Author |
Jing Peng |
Author |
Williams, Ronald, J. |
Date issued |
1993 |
dc.description |
Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks. |
Publisher |
Sage Publications |
Subject |
reinforcement learning |
Subject |
dynamic programming |
Subject |
sequential decision problems |
Title |
Efficient Learning and Planning Within the Dyna Framework |
Type |
Journal Article |
DOI |
10.1177/105971239300100403 |
Print ISSN |
1059-7123 |
Journal |
Adaptive Behavior |
Volume |
1 |
First page |
437 |
Last page |
454 |
Affiliation |
Jing Peng, Northeastern University |
Affiliation |
Williams, Ronald, J., Northeastern University |
Issue |
4 |
Bibliographic reference |
Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS Technical Report No. 91-57). Amherst, MA: Department of Computer Science, University of Massachusetts. |
Bibliographic reference |
Bertsekas, D.P. (1987). Dynamic programming: Deterministic and stochastic models. Englewood Cliffs, NJ: Prentice Hall. |
Bibliographic reference |
Bertsekas, D.P., & Tsitsiklis, J.N. (1989). Parallel and distributed computation: Numerical methods. Englewood Cliffs, NJ: Prentice Hall. |
Bibliographic reference |
Holland, J.H. (1986). Escaping brittleness: The possibility of general-purpose learning algorithms applied to rule-based systems. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann. |
Bibliographic reference |
Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann. |
Bibliographic reference |
Moore, A.W., & Atkeson, C.G. (1993). Memory-based reinforcement learning: Efficient computation with prioritized sweeping. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5. San Mateo, CA: Morgan Kaufmann. |
Bibliographic reference |
Nilsson, N.J. (1980). Principles of artificial intelligence. San Mateo, CA: Morgan Kaufmann. |
Bibliographic reference |
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229. (Reprinted in E. A. Feigenbaum & J. Feldman [Eds.] [1963], Computers and thought. New York: McGraw-Hill.) |
Bibliographic reference |
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44. |
Bibliographic reference |
Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann. |
Bibliographic reference |
Sutton, R.S. (1991). Planning by incremental dynamic programming. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann. |
Bibliographic reference |
Sutton, R.S., Barto, A.G., & Williams, R.J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12, 19-22. |
Bibliographic reference |
Tesauro, G. (1992). Practical issues in temporal difference learning. Advances in Neural Information Processing Systems, 4, 259-266. |
Bibliographic reference |
Watkins, C.J.C.H. (1989). Learning from delayed rewards. Unpublished doctoral dissertation, Cambridge University, Cambridge, England. |
Bibliographic reference |
Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292. |
Bibliographic reference |
Williams, R.J., & Baird, L.C., III (1990). A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming. Proceedings of the Sixth Yale Workshop on Adaptive and Learning Systems. New Haven, CT: Yale University Center for Systems Science. |