Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of “how” to learn options is increasingly well understood, the question of “what” good options should be has remained elusive. We formulate our answer to what “good” options should be in the bounded rationality framework (Simon, 1957) through the notion of deliberation cost. We then derive practical gradient-based learning algorithms to implement this objective. Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability.