|dc.description.abstract||Visual tracking is a key task in applications such as intelligent surveillance, human-computer interaction (HCI), human-robot interaction (HRI), augmented reality (AR), driver assistance systems, and medical applications. In this thesis, we make three novel contributions to target tracking in video sequences.
First, we develop a long-term, model-free single-target tracker that learns discriminative correlation filters and an online classifier, and that can track a target of interest in both sparse and crowded scenes. Specifically, we learn two different correlation filters, one for translation and one for scale, using different visual features. We also include a re-detection module that can re-initialize the tracker after tracking failures caused by long-term occlusions.
Second, we develop a multiple-target, multiple-type filtering algorithm using Random Finite Set (RFS) theory. In particular, we extend the standard Probability Hypothesis Density (PHD) filter to multiple types of targets, each with distinct detection properties, yielding the N-type PHD filter (N ≥ 2), which handles confusions that can occur among target types at the measurement level. This method accounts not only for background false positives (clutter) but also for confusions between target detections, which generally differ in character from background clutter. Then, under the assumptions of Gaussianity and linearity, we extend the Gaussian mixture (GM) implementation of the standard PHD filter to the proposed N-type PHD filter, which we term the N-type GM-PHD filter.
Third, we apply this N-type GM-PHD filter to real video sequences by integrating object detectors' information into the filter, in two scenarios. In the first scenario, a tri-GM-PHD filter is applied to real video sequences containing three target types in the same scene (two football teams and a referee), using separate but confused detections. In the second scenario, we use a dual GM-PHD filter to track pedestrians and vehicles in the same scene while handling the confusions between their detectors. In both cases, Munkres's variant of the Hungarian assignment algorithm is used to associate tracked target identities between frames.
We evaluate these algorithms extensively and find that they outperform the corresponding state-of-the-art approaches by a large margin.||en_US