Multimodal headpose estimation and applications
Mukherjee, Sankha Subhra
This thesis presents new research into human head pose estimation and its applications in multi-modal data. We develop new methods for head pose estimation that span RGB-D Human Computer Interaction (HCI) data through to distant, "in the wild", surveillance-quality data. We present a state-of-the-art solution to both head detection and head pose estimation through a new end-to-end Convolutional Neural Network architecture that reuses all of the computation for detection and pose estimation. In contrast to prior work, our method successfully spans close-up HCI to low-resolution surveillance data and is cross-modality: it operates on both RGB and RGB-D data. We further address the limited amount of standard data and the varying quality of annotations through semi-supervised learning and novel data augmentation. (This latter contribution also finds application in the domain of life sciences.) We report the highest accuracy by a large margin (a 60% improvement) and demonstrate leading performance on multiple standardized datasets. In HCI we reduce the angular error by 40% relative to previously reported results. Furthermore, by defining a probabilistic spatial gaze model from the head pose, we show applications in understanding human-human and human-scene interaction, and we present state-of-the-art results on the standard interaction datasets. We contribute a new metric that models "social mimicry" through the temporal correlation of the head pose signal, and show it to be valid qualitatively and intuitively. As an application in surveillance, we show that using the robust head pose signal as a prior yields state-of-the-art results in tracking under occlusion with a Kalman filter. This model is named the Intentional Tracker, and it improves visual tracking metrics by up to 15%. We also apply the ALICE loss, which was developed for end-to-end detection and classification, to dense classification of underwater coral reef imagery.
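As a rough illustration of what "temporal correlation of the head pose signal" could mean in practice, the sketch below computes a lagged Pearson correlation between the yaw-angle sequences of two people. It is not the metric defined in the thesis: the function names, the use of yaw alone, the equal-length sequences, and the fixed lag search are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lagged_correlation(yaw_a, yaw_b, max_lag):
    """Hypothetical mimicry score: the lag (in frames) at which person B's yaw
    best correlates with person A's earlier yaw, and the correlation there.
    Assumes both sequences cover the same frames."""
    n = len(yaw_a)
    if n != len(yaw_b):
        raise ValueError("sequences must cover the same frames")
    best_lag, best_r = 0, -1.0
    for lag in range(max_lag + 1):
        if n - lag < 2:
            break  # not enough overlapping samples left
        r = pearson(yaw_a[: n - lag], yaw_b[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Person B repeats person A's head turns two frames later.
yaw_a = [0, 10, 20, 10, 0, -10, -20, -10, 0, 10]
yaw_b = [0, 0] + yaw_a[:-2]
lag, r = best_lagged_correlation(yaw_a, yaw_b, max_lag=4)
print(lag, round(r, 3))
```

A high correlation at a small positive lag would suggest that one person's head movements follow the other's; a real metric would also need to handle pitch and roll, noise, and missing detections.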
The objective of this work is to solve the challenging task of recognizing and segmenting underwater coral imagery in the wild with sparse, point-based ground truth labelling. To achieve this, we propose an integrated classification and segmentation algorithm based on a Fully Convolutional Neural Network (FCNN) and a Fully-Connected Conditional Random Field (CRF). Our contributions lie in four major areas. First, we show that multi-scale, crop-based training is useful for learning the initial weights in the canonical one-class classification problem. Second, we propose a modified ALICE loss for training the FCNN on sparse labels with class imbalance and establish its significance empirically. Third, we show that by artificially enhancing the point labels to small regions based on a class distance transform, we can improve the classification accuracy further. Fourth, we improve the segmentation results using fully connected CRFs with a bilateral message passing prior. We improve upon state-of-the-art results on all publicly available datasets by a significant margin.
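The third contribution, growing sparse point labels into small labelled regions, can be pictured with the minimal sketch below. It uses a brute-force nearest-point Euclidean distance in place of the class distance transform, whose exact form the abstract does not give; the function name, the `radius` parameter, and the tie-breaking rule (the first listed point wins) are assumptions made for illustration.

```python
import math


def expand_point_labels(height, width, points, radius):
    """Grow sparse point annotations into small labelled regions.

    points: list of (row, col, label) annotations.
    Returns a height x width grid in which each pixel within `radius` of its
    nearest annotated point carries that point's label, and every other pixel
    is None (still unlabelled).
    """
    grid = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            best_dist, best_label = None, None
            for pr, pc, label in points:
                dist = math.hypot(r - pr, c - pc)
                # Strict "<" keeps the first listed point on ties.
                if best_dist is None or dist < best_dist:
                    best_dist, best_label = dist, label
            if best_dist is not None and best_dist <= radius:
                grid[r][c] = best_label
    return grid


# Two point labels on a 5 x 5 image, each grown by one pixel.
dense = expand_point_labels(5, 5, [(1, 1, "coral"), (3, 4, "sand")], radius=1)
for row in dense:
    print(["." if cell is None else cell[0] for cell in row])
```

Pixels left as `None` would simply be excluded from the training loss, so the network sees more supervised pixels than the original point annotations provide without labels being invented far from any annotation.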