Human Action Recognition in Videos

2015-05-20 14:00
Piotr Tadeusz Bilinski
(INRIA, France)
This talk addresses the automatic recognition of human actions in videos, that is, the task of determining which human actions occur in a video. The problem is particularly hard because of the large variations in the visual and motion appearance of people and actions, camera viewpoint changes, moving backgrounds, occlusions, noise, and the sheer volume of video data. First, I will present two local spatio-temporal descriptors for action recognition in videos. The first descriptor is based on a covariance matrix representation and models linear relations between low-level features. The second descriptor is based on Brownian covariance and models all kinds of possible relations between low-level features. Then, I will talk about two higher-level feature representations that go beyond the limitations of local feature encoding techniques. The first representation is based on the idea of relative dense trajectories: I will present an object-centric local feature representation of motion trajectories, which allows a local feature encoding technique to use spatial information. The second representation captures statistics of pairwise co-occurring visual words within multi-scale feature-centric neighborhoods. This representation, based on contextual features, encodes information about the local density of features, local pairwise relations among the features, and the spatio-temporal order among features. Finally, I will show that the proposed techniques obtain better or similar performance compared with the state of the art on a variety of real and challenging human action recognition datasets (Weizmann, KTH, URADL, MSR Daily Activity 3D, UCF50, HMDB51, and CHU Nice Hospital).
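
To illustrate the difference between the two descriptor families mentioned in the abstract, here is a minimal Python sketch. It is not the speaker's implementation: the function names `covariance_descriptor` and `distance_covariance`, the toy data, and the use of the standard sample distance covariance as a stand-in for Brownian covariance are all assumptions made for illustration. The sketch shows that an ordinary covariance is close to zero for a purely quadratic relation, while the distance covariance is not.

```python
import numpy as np


def covariance_descriptor(features):
    """Covariance matrix of low-level features.

    features: array of shape (n_samples, d), one row per sampled point.
    Returns a (d, d) matrix capturing pairwise linear relations.
    (Hypothetical helper, for illustration only.)
    """
    return np.cov(np.asarray(features, dtype=float), rowvar=False)


def _double_centered_distances(v):
    """Pairwise Euclidean distance matrix with row, column and grand means removed."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(v.shape[0], -1)
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_covariance(x, y):
    """Sample distance covariance between two paired feature sequences.

    Used here as a stand-in for Brownian covariance (assumption);
    it responds to nonlinear as well as linear dependence.
    """
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    return float(np.sqrt(max((a * b).mean(), 0.0)))


# Toy data (assumed): y depends on x only through a quadratic relation.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

linear_cov = covariance_descriptor(np.column_stack([x, y]))[0, 1]
nonlinear_dep = distance_covariance(x, y)
print("covariance:", linear_cov)
print("distance covariance:", nonlinear_dep)
```

In a real descriptor, the rows of `features` would be low-level measurements (for example gradients or optical flow) sampled inside a spatio-temporal volume; that choice is outside the scope of this sketch.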
