A Hierarchical Network of Provably Optimal Learning Control Systems: Extensions of the Associative Control Process (ACP) Network
Baird, Leemon C.; Klopf, A. Harry
Journal:
Adaptive Behavior
Date:
1993-01-01
Abstract:
An associative control process (ACP) network is a learning control system that
can reproduce a variety of animal learning results from classical and
instrumental conditioning experiments (Klopf, Morgan, & Weaver, 1993; see
also the article "A Hierarchical Network of Control Systems that Learn"). The
ACP networks proposed and tested by Klopf, Morgan, and Weaver are not
guaranteed, however, to learn optimal policies for maximizing reinforcement.
Optimal behavior is guaranteed for a reinforcement learning system such as
Q-learning (Watkins, 1989), but simple Q-learning is incapable of reproducing
the animal learning results that ACP networks reproduce. We propose two new
models that reproduce the animal learning results and are provably optimal.
The first model, the modified ACP network, embodies the smallest number of
changes necessary to the ACP network to guarantee that optimal policies will be
learned while still reproducing the animal learning results. The second model,
the single-layer ACP network, embodies the smallest number of changes
necessary to Q-learning to guarantee that it reproduces the animal learning
results while still learning optimal policies. We also propose a hierarchical
network architecture within which several reinforcement learning systems (e.g.,
Q-learning systems, single-layer ACP networks, or any other learning
controller) can be combined in a hierarchy. We implement the hierarchical
network architecture by combining four of the single-layer ACP networks to
form a controller for a standard inverted pendulum dynamic control problem.
The hierarchical controller is shown to learn more reliably and more than an
order of magnitude faster than either the single-layer ACP network or the Barto,
Sutton, and Anderson (1983) learning controller for the benchmark problem.