Relational knowledge and representation for reinforcement learning
Abstract
In reinforcement learning, an agent interacts with its environment, learns from feedback about the quality of its actions, and improves its behaviour, or policy, so as to maximise its expected utility. Learning efficiently in large-scale problems is a major challenge. In problems with first-order structure, state aggregation is possible, allowing the agent to learn in an abstraction of the original problem that is of considerably smaller scale. One approach is to learn the Q-values of actions, approximated by a relational function approximator; this is the basis of relational reinforcement learning (RRL).
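For context, the value update underlying this approach is the standard Q-learning rule (a well-known result, not specific to this thesis): with learning rate \(\alpha\) and discount factor \(\gamma\), after taking action \(a\) in state \(s\), receiving reward \(r\), and reaching state \(s'\),

\[ Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]. \]

In RRL, \(Q\) is represented by a relational function approximator over abstract states rather than by a table of ground states.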
We abstract the state with first-order features that consist only of variables, thereby aggregating similar states from all problems of the same domain into abstract states.
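As an illustration of variable-only abstraction, the following minimal Python sketch maps ground states to abstract states by replacing object constants with canonical variables. The blocks-world predicates and the canonicalisation scheme are hypothetical stand-ins, not the thesis's actual feature language.

```python
# Minimal sketch of variable-only first-order abstraction (hypothetical
# blocks-world encoding; the thesis's feature language may differ).

def abstract_state(ground_state):
    """Replace object constants with canonical variables so that
    structurally identical states collapse to one abstract state."""
    var_names = {}
    abstract_atoms = []
    for pred, args in sorted(ground_state):
        abstract_args = []
        for obj in args:
            if obj not in var_names:
                var_names[obj] = f"X{len(var_names)}"  # fresh variable
            abstract_args.append(var_names[obj])
        abstract_atoms.append((pred, tuple(abstract_args)))
    return frozenset(abstract_atoms)

# Two ground states that differ only in object names...
s1 = {("on", ("a", "b")), ("clear", ("a",))}
s2 = {("on", ("c", "d")), ("clear", ("c",))}
# ...map to the same abstract state, so their Q-values are aggregated.
assert abstract_state(s1) == abstract_state(s2)
```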
We study the limitations of RRL due to this abstraction and introduce the concepts of consistent abstraction, subsumption of problems, and abstract-equivalent problems. We propose three methods to overcome these limitations, extending the types of problems our RRL method can solve.
Next, to further improve learning efficiency, we propose to learn different types of generalised knowledge. The policy is influenced by directed exploration based on multiple types of intrinsic rewards, and it avoids previously encountered dead ends.
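A minimal sketch of how such directed exploration is often wired in, assuming a count-based novelty bonus and a set of known dead-end states; the bonus form, the `successor` model, and all names are illustrative, not the thesis's exact formulation.

```python
import math
import random
from collections import defaultdict

visit_counts = defaultdict(int)   # novelty statistics per abstract state
dead_ends = set()                 # abstract states known to be dead ends

def shaped_reward(extrinsic, next_state):
    """Add a count-based novelty bonus (one illustrative kind of
    intrinsic reward) to the environment's extrinsic reward."""
    visit_counts[next_state] += 1
    return extrinsic + 1.0 / math.sqrt(visit_counts[next_state])

def select_action(state, actions, q_values, successor, epsilon=0.1):
    """Epsilon-greedy choice that never enters a known dead end."""
    safe = [a for a in actions if successor(state, a) not in dead_ends]
    if not safe:
        safe = actions  # fall back if every action looks like a dead end
    if random.random() < epsilon:
        return random.choice(safe)
    return max(safe, key=lambda a: q_values.get((state, a), 0.0))
```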
In addition, we incorporate model-based techniques to provide better-quality estimates of the Q-values.
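One standard way to combine a learned model with Q-value estimation is Dyna-style planning, shown here as an illustrative stand-in; the thesis's model-based machinery may differ.

```python
import random

def dyna_updates(q, model, alpha=0.1, gamma=0.9, n_updates=20):
    """Refine Q-value estimates with simulated experience drawn from a
    learned model: model[(s, a)] = (r, s_next).  Here q should be a
    defaultdict(float) so unseen state-action pairs start at zero."""
    transitions = list(model.items())
    for _ in range(n_updates):
        (s, a), (r, s_next) = random.choice(transitions)
        best_next = max((v for (st, _), v in q.items() if st == s_next),
                        default=0.0)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
```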
Transfer learning is possible by directly leveraging the generalised knowledge to accelerate learning in a new problem.
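Because the learned knowledge is expressed over abstract (variable-only) states, one simple form of transfer is to seed the learner for a new problem with previously learned abstract Q-values; a minimal sketch under that assumption (the thesis also transfers other generalised knowledge):

```python
from collections import defaultdict

def warm_start(source_q):
    """Initialise a new problem's Q-table from abstract Q-values learned
    on earlier problems of the same domain; abstract states carry over
    directly because they contain no problem-specific constants."""
    q = defaultdict(float)
    q.update(source_q)
    return q
```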
Lastly, we introduce a new class of problems that considers dynamic objects and time-bounded goals. We discuss the complications these bring to RRL and present solutions. We also propose a framework for multi-agent coordination that achieves joint goals, represented as time-bounded goals, by decomposing a multi-agent problem into single-agent problems.
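A schematic of the decomposition idea, assuming a joint goal expressed as per-agent time-bounded subgoals; the data structure and assignment scheme are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class TimeBoundedGoal:
    condition: str   # e.g. a first-order goal formula such as "on(X, Y)"
    deadline: int    # number of steps within which it must hold

def decompose(joint_goal, agents):
    """Split a joint goal into one single-agent problem per agent.
    Each agent receives one subgoal here; a real decomposition must
    also handle shared resources and interacting deadlines."""
    return {agent: TimeBoundedGoal(*subgoal)
            for agent, subgoal in zip(agents, joint_goal)}

# Example: two agents, each with its own deadline-constrained subgoal.
plan = decompose([("on(X, Y)", 10), ("clear(Z)", 15)], ["agent1", "agent2"])
```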
We evaluate our work empirically in six domains, demonstrating its efficacy in solving large-scale problems and in transfer learning.