Institute of Information Theory and Automation


AS Seminar: Data-Driven Fully Probabilistic Design

2022-12-05 11:00

In imitation learning (IL), an agent learns an optimal policy from expert demonstrations. Classical approaches to IL include behavioral cloning, which learns a policy via supervised learning, and inverse reinforcement learning (IRL). Applying the fully probabilistic design (FPD) formalism, we propose a new general approach for finding a stochastic policy from demonstrations. The approach infers a policy directly from data, without interacting with the expert or using any reinforcement signal, and the expert’s actions need not, in general, be optimal. The proposed approach learns an optimal policy by minimising the Kullback-Leibler divergence between the probabilistic description of the actual agent-environment behaviour and a distribution describing the targeted behaviour. We demonstrate the approach on simulated examples and show that the learned policy (i) converges to the optimal policy obtained by FPD and (ii) outperforms the optimal FPD policy whenever mis-modelling is present.
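The abstract does not spell out the data-driven algorithm itself, so the sketch below only illustrates the standard, model-based FPD solution that the learned policy is compared against. Assuming discrete states and actions, a finite horizon, and strictly positive probability tables, the policy minimising the Kullback-Leibler divergence between the closed-loop trajectory distribution and the ideal (targeted) one is the ideal policy reweighted by exp(-ω) and normalised by γ, computed by backward recursion. The function names, array shapes, and the random example data are hypothetical illustrations, not taken from the talk, and nothing here learns from demonstrations.

```python
import numpy as np

# Hypothetical shapes (illustrative assumption):
#   p[a, x, x2], p_ideal[a, x, x2] : modelled / ideal transition pdfs f(x2 | a, x)
#   pi_ideal[x, a]                  : ideal policy f(a | x)
# All entries are assumed strictly positive so every log is finite.

def fpd_policy(p, p_ideal, pi_ideal, horizon):
    """Finite-horizon FPD by backward recursion; returns (policies, log_gamma_0)."""
    n_x = p.shape[1]
    log_gamma = np.zeros(n_x)  # gamma = 1 beyond the horizon
    policies = []
    for _ in range(horizon):
        # omega[a, x] = sum_x2 p * ln(p / (gamma(x2) * p_ideal))
        omega = np.sum(p * (np.log(p) - np.log(p_ideal) - log_gamma[None, None, :]), axis=2)
        # unnormalised optimal policy: pi_ideal * exp(-omega), shape (n_x, n_a)
        log_w = np.log(pi_ideal) - omega.T
        log_gamma = np.logaddexp.reduce(log_w, axis=1)  # ln gamma(x)
        policies.append(np.exp(log_w - log_gamma[:, None]))
    policies.reverse()  # policies[t] is the policy used at step t
    return policies, log_gamma


def trajectory_kl(policies, p, p_ideal, pi_ideal):
    """KL divergence of the closed-loop trajectory pdf from the ideal one, per initial state."""
    value = np.zeros(p.shape[1])
    for pi in reversed(policies):
        # expected transition log-ratio plus cost-to-go for each (x, a) pair
        q = np.sum(p * (np.log(p) - np.log(p_ideal) + value[None, None, :]), axis=2).T
        value = np.sum(pi * (np.log(pi) - np.log(pi_ideal) + q), axis=1)
    return value


rng = np.random.default_rng(0)
n_x, n_a, T = 3, 2, 4

def random_pdf(shape):
    """Random strictly positive conditional pdf, normalised over the last axis."""
    w = rng.random(shape) + 0.1
    return w / w.sum(axis=-1, keepdims=True)

p = random_pdf((n_a, n_x, n_x))
p_ideal = random_pdf((n_a, n_x, n_x))
pi_ideal = random_pdf((n_x, n_a))

policies, log_gamma = fpd_policy(p, p_ideal, pi_ideal, T)
# The attained divergence should equal -ln gamma_0(x) for every initial state x.
print(trajectory_kl(policies, p, p_ideal, pi_ideal), -log_gamma)
```

Working in log space (`np.logaddexp.reduce`) keeps the normaliser γ numerically stable over long horizons. Any other policy sequence passed to `trajectory_kl` should give a divergence no smaller than `-log_gamma`, which is the sense in which the FPD policy is optimal.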

2022-12-02 10:49