Deep learning-based person re-identification
In this dissertation I address the problem of person re-identification for wide-area surveillance applications, in the challenging scenario of an unconstrained network of uncalibrated, non-overlapping cameras. Of all the sources of uncertainty affecting pedestrians' appearance, I focus on tackling the changing viewpoint across cameras. This variability causes the representations of different identities to interfere with one another in the feature space, degrading the discrimination capability of the re-identification system. To deal with this problem, I propose two effective methods that rely on the representational power of deep architectures: a newly designed embedding-learning technique, and a pose-aware regulation approach for video sequences enabled by a generative model.

My first method addresses the viewpoint problem in the context of still images, aiming to make the most of a fixed amount of available training data without relying on any side information. I present a novel training loss for convolutional neural networks that achieves better optimization by learning a convenient embedding space. The method targets two aspects: expanding the feature space while simultaneously reducing the intra-class variability of all identities, without increasing training complexity or requiring the support of sample-mining techniques. I illustrate in a demo the beneficial effects of this method, combined with the definition of an ad-hoc novelty threshold, for open-set re-identification.

In my second method, moving towards the requirements of real-world applications, I extend the investigation of the viewpoint factor to the video context, where samples are represented by tracklets. I apply, for the first time, a GAN-based generative model to video sequences in order to complement and pose-align the original incomplete data. To do so, I proceed in two steps.
First, I normalize tracklets with respect to a set of canonical poses, integrating the missing pose/viewpoint information with synthetic GAN-generated images; a weighted fusion scheme then combines the generated information with the original data representation. Second, I perform explicit pose-based alignment of sequence pairs to promote coherent feature matching, mitigating the negative effect of low relative distance between identities. Both approaches compare favourably to the state of the art, showing significant improvement over competing techniques on several popular public datasets.
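To give a concrete flavour of the first method's objective, the sketch below combines a standard softmax cross-entropy with a penalty that pulls each feature toward its class centroid, shrinking intra-class variability. This is a minimal illustration of the general idea only; the function name, the compaction term, and the weighting `lam` are assumptions, not the dissertation's actual loss formulation.

```python
import numpy as np

def embedding_loss(features, logits, labels, lam=0.5):
    """Illustrative re-id loss: cross-entropy + intra-class compaction.

    `features`: (N, D) embedding vectors, `logits`: (N, C) classifier
    outputs, `labels`: (N,) integer identity labels. `lam` weights the
    compaction term. All names here are hypothetical.
    """
    # Numerically stable softmax cross-entropy over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    # Mean squared distance of each feature to its class centroid:
    # the smaller this term, the tighter each identity's cluster.
    classes = np.unique(labels)
    compact = 0.0
    for c in classes:
        members = features[labels == c]
        compact += ((members - members.mean(axis=0)) ** 2).sum(axis=1).mean()
    compact /= len(classes)

    return ce + lam * compact
```

Spreading the features of one identity apart raises the compaction term, so minimizing this loss encourages the tight per-identity clusters that the abstract describes, without any explicit sample mining.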
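The weighted fusion step of the second method can be sketched as follows: for each canonical pose, a missing observation is filled entirely from the GAN-synthesized branch, while an observed pose is blended between the real and generated representations. The blending rule, the NaN convention for missing poses, and the weight `alpha` are illustrative assumptions, not the exact scheme used in the dissertation.

```python
import numpy as np

def fuse_tracklet(original_feats, generated_feats, alpha=0.7):
    """Weighted fusion of observed and GAN-synthesized pose features.

    `original_feats`: (P, D) per-canonical-pose features extracted from
    real tracklet frames, with rows of NaN where a pose was never observed.
    `generated_feats`: (P, D) synthesized features for every canonical pose.
    Returns a complete (P, D) pose-normalized representation.
    """
    # A pose is missing if any of its feature dimensions is NaN.
    missing = np.isnan(original_feats).any(axis=1)

    # Missing poses come entirely from the generated branch; observed
    # poses are a convex blend with weight `alpha` on the real data.
    fused = np.where(
        missing[:, None],
        generated_feats,
        alpha * np.nan_to_num(original_feats) + (1 - alpha) * generated_feats,
    )
    return fused
```

After this fusion, every tracklet exposes the same set of canonical poses, which is what makes the subsequent explicit pose-based alignment of sequence pairs possible.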