Reinforcement learning (RL) is about training agents to complete tasks. We typically think of this as being able to accomplish some goal. Take, for example, a robot we might want to train to open a door. Reinforcement learning can be used as a framework for teaching the robot to open the door by allowing it to learn from trial and error. But what if we are interested in having our agent solve not just one goal, but a set that might vary over time?
In this article, and the accompanying notebook available on GitHub, I am going to introduce and walk through both the traditional reinforcement learning paradigm in machine learning as well as a new and emerging paradigm for extending reinforcement learning to allow for complex goals that vary over time.
I will start by demonstrating how to build a simple Q-learning agent that is guided by a single reward signal to navigate an environment and make deliveries. I will then demonstrate how this simple formulation becomes problematic for more complex behavior we might envision. To allow for greater flexibility, I will then describe how to build a class of reinforcement learning agents, which can optimize for various goals called “direct feature prediction” (DFP). All the code is available in TensorFlow in this accompanying iPython Jupyter Notebook.