August 25, 2022




Reinforcement learning (RL) provides a framework for finding the optimal behavior (i.e., an optimal policy) of intelligent agents in an environment they interact with, guiding them via a reward signal that measures their performance. With inverse reinforcement learning (IRL), we instead try to recover the reward function (and, through it, the policy) of an expert from its observed behavior, in essence reusing the expert's domain knowledge to improve our own agents.

Maximum entropy IRL is a comparatively simple but clever method for solving the general IRL problem in discrete Markov decision processes. In this blog post, I will lay out the theoretical foundation of this approach, including the principle of maximum entropy, and then derive the maximum entropy IRL algorithm.