Relational knowledge and representation for reinforcement learning
Ng Jun Hao, Alvin
In reinforcement learning, an agent interacts with the environment, learns from feedback about the quality of its actions, and improves its behaviour, or policy, in order to maximise its expected utility. Learning efficiently in large-scale problems is a major challenge. In problems with a first-order structure, states can be aggregated, allowing the agent to learn in an abstraction of the original problem that is considerably smaller. One approach is to approximate the Q-values of actions with a relational function approximator; this is the basis of relational reinforcement learning (RRL). We abstract the state with first-order features that contain only variables, thereby aggregating similar states from all problems of the same domain into abstract states. We study the limitations this abstraction imposes on RRL and introduce the concepts of consistent abstraction, subsumption of problems, and abstract-equivalent problems. We propose three methods to overcome these limitations, extending the types of problems our RRL method can solve. Next, to further improve learning efficiency, we propose learning different types of generalised knowledge. The policy is guided by directed exploration based on multiple types of intrinsic rewards, and it avoids previously encountered dead ends. In addition, we incorporate model-based techniques to provide higher-quality estimates of the Q-values. Transfer learning is made possible by directly leveraging the generalised knowledge to accelerate learning in a new problem. Lastly, we introduce a new class of problems involving dynamic objects and time-bounded goals. We discuss the complications these bring to RRL and present some solutions. We also propose a framework for multi-agent coordination that achieves joint goals, expressed as time-bounded goals, by decomposing a multi-agent problem into single-agent problems.
We evaluate our work empirically in six domains to demonstrate its efficacy in solving large-scale problems and in transfer learning.
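The abstract does not specify the thesis's feature language or learner. The sketch below is therefore only an illustration of the core idea: replacing object names with variables so that similar ground states collapse into one abstract state, on which Q-values are learned and shared. It makes several assumptions. The blocks-world facts, the functions `abstract_state`, `abstract_action`, and `q_update`, and the parameters `alpha` and `gamma` are all hypothetical. A plain lookup table also stands in for the relational function approximator described above.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, Tuple

# Hypothetical encoding: a ground fact such as ("on", "a", "b").
Fact = Tuple[str, ...]


def abstract_state(state: FrozenSet[Fact], action: Fact) -> FrozenSet[Fact]:
    """Replace object names with variables (hypothetical abstraction).

    Objects that appear in the action become X0, X1, ...; every other object
    becomes the anonymous variable "_". The resulting features contain only
    variables, so states that differ only in object identities coincide.
    """
    binding = {obj: f"X{i}" for i, obj in enumerate(action[1:])}
    return frozenset(
        (pred, *(binding.get(arg, "_") for arg in args))
        for pred, *args in state
    )


def abstract_action(action: Fact) -> Fact:
    """Keep the action name and replace its arguments with X0, X1, ..."""
    return (action[0],) + tuple(f"X{i}" for i in range(len(action) - 1))


# Lookup table over (abstract state, abstract action); stands in for a
# relational function approximator in this sketch.
Q = defaultdict(float)


def q_update(state: FrozenSet[Fact], action: Fact, reward: float,
             next_state: FrozenSet[Fact], next_actions: Iterable[Fact],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard one-step Q-learning update applied to abstract states."""
    key = (abstract_state(state, action), abstract_action(action))
    # Value of the best action in the next state; 0.0 if terminal (no actions).
    best_next = max(
        (Q[(abstract_state(next_state, a), abstract_action(a))]
         for a in next_actions),
        default=0.0,
    )
    Q[key] += alpha * (reward + gamma * best_next - Q[key])
    return Q[key]
```

In the usage sketch below, two ground states that differ only in block names share one Q-value. A single update therefore generalises to both, which is the kind of aggregation the abstract relies on for learning across problems of the same domain:

```python
s1 = frozenset({("on", "a", "b"), ("clear", "a"), ("ontable", "b")})
s2 = frozenset({("on", "c", "d"), ("clear", "c"), ("ontable", "d")})
q_update(s1, ("movetotable", "a"), 1.0, frozenset(), [])
print(Q[(abstract_state(s2, ("movetotable", "c")), ("movetotable", "X0"))])
```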