
AS Seminar: Data-Driven Fully Probabilistic Design


The seminar will be held on Monday, 5 December 2022, at 11:00 in the AS meeting room 474. The speaker will be Siavash Fakhimi Derakhshan.

In imitation learning (IL), an agent learns an optimal policy from expert demonstrations. Classical approaches to IL include behavioral cloning, which learns a policy via supervised learning, and inverse reinforcement learning (IRL). Applying the fully probabilistic design (FPD) formalism, we propose a new general approach for finding a stochastic policy from demonstrations. The approach infers a policy directly from data, without interaction with the expert and without any reinforcement signal. The expert's actions generally need not be optimal. The proposed approach learns an optimal policy by minimising the Kullback–Leibler divergence between the probabilistic description of the actual agent–environment behaviour and the distribution describing the targeted behaviour. We demonstrate our approach on simulated examples and show that the learned policy: i) converges to the optimal policy obtained by FPD; ii) achieves better performance than the optimal FPD policy whenever mis-modelling is present.
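As a rough illustration of the idea (not the speaker's actual method), the following toy sketch learns a stochastic policy directly from demonstration data in a discrete state–action setting: the empirical conditional distribution of actions given states (with add-one smoothing, a hypothetical modelling choice) minimises the KL divergence from the demonstrated behaviour, and we check how close it ends up to the target policy that generated the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: 3 states, 2 actions; "expert" demonstrations
# are sampled from a known target policy p(a|s).
n_states, n_actions = 3, 2
target_policy = np.array([[0.9, 0.1],
                          [0.2, 0.8],
                          [0.5, 0.5]])

# Sample (state, action) demonstration pairs from the target behaviour.
states = rng.integers(0, n_states, size=5000)
actions = np.array([rng.choice(n_actions, p=target_policy[s]) for s in states])

# Learn a stochastic policy directly from data: smoothed empirical p(a|s).
# No interaction with the expert and no reward signal is used.
counts = np.ones((n_states, n_actions))   # add-one smoothing pseudo-counts
np.add.at(counts, (states, actions), 1)
learned_policy = counts / counts.sum(axis=1, keepdims=True)

def kl(p, q):
    """Row-wise KL divergence D(p || q), one value per state."""
    return np.sum(p * np.log(p / q), axis=1)

print(learned_policy.round(2))
print(kl(target_policy, learned_policy))  # small values: policies nearly match
```

With enough demonstrations, the per-state KL divergence shrinks towards zero, mirroring the convergence claim in the abstract; the seminar's FPD formulation generalises this to the full closed-loop agent–environment distribution.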


Posted by user neuner