Deep Reinforcement Learning (DRL) agents have shown impressive ability in mastering computer games, but notoriously take a long time to learn. As an agent progresses through a game, it will often encounter new states containing previously unencountered game entities, e.g., new enemies. In such situations, DRL agents typically struggle to generalise their prior knowledge to the new entities, owing to differences in state and object representations. In particular, even when new entities behave similarly to previously encountered ones, if they appear to be different, DRL agents can take a long time to adapt. Policy transfer learning offers a promising approach for allowing DRL agents to adapt their knowledge; however, establishing the connection between the newly presented states (the target task) and previously encountered ones (the source task) requires guidance from a domain expert. This guidance, an externally constructed mapping of state-action pairings, must be continually maintained in response to new game-entity encounters.
This thesis proposes an alternative approach, where policy transfer is accomplished by leveraging an intermediate state transformation, removing the need for manual mapping. Each entity is mapped to a unique entity ID, and when a new game entity is encountered, a “substitution agent” strives to learn a mapping between the new entity ID and a previously encountered one. For example, if the new entity is a type of enemy, the substitution agent will ideally learn to map the new ID to a previously encountered enemy’s ID, rather than, say, the ID of a powerup item. Experimental results show that this approach is effective, allowing for rapid improvement of end-of-episode scores when encountering new entity representations in the game Infinite Mario.
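The core transformation described above can be illustrated with a minimal sketch. All names here are hypothetical (the thesis does not specify this interface); the point is only that, once the substitution agent has learned a mapping between entity IDs, applying it to the observation lets a pretrained source policy act on familiar representations:

```python
# Illustrative sketch only: the actual substitution agent in the
# thesis learns this mapping through reinforcement learning.

def substitute_ids(state, mapping):
    """Replace new entity IDs with previously encountered ones,
    leaving already-known IDs untouched."""
    return [mapping.get(entity_id, entity_id) for entity_id in state]

# Hypothetical example: ID 7 denotes a newly encountered enemy type,
# and the substitution agent has learned to map it to ID 2, a
# previously encountered enemy.
learned_mapping = {7: 2}

observation = [0, 7, 3]  # observation containing the new entity
transformed = substitute_ids(observation, learned_mapping)
print(transformed)  # [0, 2, 3] — now expressed in familiar IDs
```

The source policy is then queried with the transformed observation instead of the raw one, so no manually maintained state-action mapping is required.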