Reinforcement Learning (RL) is a machine learning paradigm behind many successes in games, robotics and control
applications. RL agents improve through trial and error and therefore undergo a learning phase during which they perform suboptimally. Considerable research effort has gone into optimising behaviour during this phase, both to shorten it and to maximise performance after learning. We introduce a novel algorithm that
extracts useful information from expert demonstrations (traces of interactions with the target environment) and uses it to improve performance. The algorithm detects unexpected decisions made
by the expert and infers the goal the expert was pursuing. The inferred goals are then used to bias the agent's decisions during learning. Our experiments in the video game Pac-Man provide statistically significant evidence
that our method can improve final performance compared to a
state-of-the-art approach.
Funding
Enhancing the Australian theme park experience by harnessing virtual-physical play