Title:
Classification and temporal segmentation for human interaction videos using independent subspace analysis features
Speaker:
Ngoc Nguyen (JAIST)
Abstract:
Human interaction recognition has received much interest in the computer vision community because of its scientific importance and its many potential practical applications, such as surveillance and automatic video indexing. Previous approaches rely on spatio-temporal local features (e.g., SIFT), human poses, or human joints to represent human interactions. In this paper, motivated by the recent success of deep learning networks, we introduce a three-layer convolutional network that employs the Independent Subspace Analysis (ISA) algorithm to automatically learn hierarchical invariant features from videos. A pooling layer is also presented to construct effective local features from the learned features. Furthermore, we design a system to classify and temporally segment human interaction videos. We evaluate the performance of our system and the effectiveness of the local features on video sequences from the UT-Interaction dataset, which contain both interacting persons and irrelevant pedestrians in the scenes. The UT-Interaction dataset poses several challenges, including moving backgrounds, cluttered scenes, scale variations, and camera jitter. We also test our system on the Hollywood2 dataset, which has large intra-class variability, multiple persons, rapid scene changes, and unconstrained backgrounds. The encouraging results on the UT-Interaction and Hollywood2 datasets show that our system is capable of recognizing complex activities, especially human interactions in realistic environments.
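As background for the talk, the core ISA idea can be sketched as follows: linear first-layer filters are applied to whitened input patches, and second-layer units pool squared responses within small subspaces, yielding features invariant to variations inside each subspace. The NumPy sketch below is illustrative only; the data sizes, group size, learning rate, and iteration count are assumptions, not values from the paper (which additionally stacks ISA layers convolutionally over video blocks).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 200 whitened patches of dimension 16.
# (In the paper, inputs are flattened video blocks; sizes here are invented.)
X = rng.standard_normal((200, 16))
X -= X.mean(axis=0)

n_features = 8      # first-layer linear filters
group_size = 2      # filters per subspace (second-layer pooling groups)

W = rng.standard_normal((n_features, X.shape[1]))

def symmetric_orthogonalize(W):
    # Project back onto the ISA constraint W W^T = I.
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def isa_activations(W, X):
    # Second-layer units: square root of summed squared
    # first-layer responses within each subspace.
    Z = X @ W.T                                   # (n_samples, n_features)
    groups = (Z**2).reshape(X.shape[0], -1, group_size).sum(axis=2)
    return np.sqrt(groups + 1e-8)

# Gradient descent on the ISA sparsity objective (sum of pooled activations),
# re-orthogonalizing after each step.
lr = 0.01
for _ in range(100):
    Z = X @ W.T
    P = isa_activations(W, X)                     # (n_samples, n_groups)
    # Chain rule through the square root: each filter's gradient is
    # scaled by 1 / (its group's pooled activation).
    scale = np.repeat(1.0 / P, group_size, axis=1)
    grad = (scale * Z).T @ X / X.shape[0]
    W = symmetric_orthogonalize(W - lr * grad)

features = isa_activations(W, X)
print(features.shape)   # (200, 4): one invariant response per subspace
```

Because pooling sums squared responses over a subspace before the square root, the resulting features respond to the energy in each subspace rather than to any single filter's sign or phase, which is the invariance property the abstract refers to.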