AS Seminar: Data-Driven Fully Probabilistic Design

Date:

2022-12-05 11:00

Room:

474

Lecturer:

Dr. Siavash Fakhimi Derakhshan, Ph.D.

Department:

Department of Adaptive Systems

In imitation learning (IL), an agent learns an optimal policy from expert demonstrations. Classical approaches to IL include behavioural cloning, which learns a policy via supervised learning, and inverse reinforcement learning (IRL). Applying the fully probabilistic design (FPD) formalism, we propose a new general approach for finding a stochastic policy from demonstrations. The approach infers a policy directly from data, without interacting with the expert or using any reinforcement signal. The expert's actions need not be optimal. The proposed approach learns an optimal policy by minimising the Kullback-Leibler divergence between the probabilistic description of the actual agent-environment behaviour and a distribution describing the targeted behaviour. We demonstrate our approach on simulated examples and show that the learned policy: i) converges to the optimal policy obtained by FPD; ii) achieves better performance than the optimal FPD policy whenever mis-modelling is present.
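The Kullback-Leibler objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the data-driven method presented in the seminar: it only shows the standard one-step FPD construction for a single state with discrete actions and next states, where the policy minimising the divergence between the joint distribution (model times policy) and the targeted joint distribution (ideal model times ideal policy) is proportional to the ideal policy weighted by exp(-omega(a)), with omega(a) the divergence between the model and the ideal model for action a. All names (`model`, `ideal_model`, `ideal_policy`) and the toy numbers are assumptions made for illustration.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p == 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def one_step_fpd_policy(model, ideal_model, ideal_policy):
    """Policy minimising D(model * policy || ideal_model * ideal_policy).

    model[a, s_next] and ideal_model[a, s_next] hold transition
    probabilities for a single fixed state; ideal_policy[a] is the
    targeted action distribution. All three are hypothetical inputs.
    """
    n_actions = len(ideal_policy)
    # omega[a]: divergence of the real model from the ideal one under action a
    omega = np.array(
        [kl_divergence(model[a], ideal_model[a]) for a in range(n_actions)]
    )
    weights = ideal_policy * np.exp(-omega)
    return weights / weights.sum()


def joint_kl(policy, model, ideal_model, ideal_policy):
    """Divergence between the closed-loop joint and the targeted joint."""
    joint = policy[:, None] * model
    ideal_joint = ideal_policy[:, None] * ideal_model
    return kl_divergence(joint.ravel(), ideal_joint.ravel())


# Toy example (assumed numbers): two actions, two possible next states.
model = np.array([[0.9, 0.1], [0.2, 0.8]])
ideal_model = np.array([[0.8, 0.2], [0.8, 0.2]])
ideal_policy = np.array([0.5, 0.5])

policy = one_step_fpd_policy(model, ideal_model, ideal_policy)
print("FPD policy:", policy)
print("joint KL at FPD policy:", joint_kl(policy, model, ideal_model, ideal_policy))
print("joint KL at ideal policy:", joint_kl(ideal_policy, model, ideal_model, ideal_policy))
```

In this toy setting the first action drives the system closer to the targeted next-state distribution, so the resulting policy puts more probability on it than the uniform ideal policy does.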

2022-12-02 10:49
