Target classification in multimodal video
This thesis focuses on enhancing scene segmentation and target recognition methodologies through the mobilisation of contextual information. The algorithms developed to this end utilise multi-modal sensor information collected across varying scenarios, from controlled indoor sequences to challenging rural locations. The sensors are chiefly visible-band colour and long-wave infrared (LWIR) cameras, enabling persistent surveillance across all environments. In the drive to develop effective algorithms towards the outlined goals, key obstacles are identified and examined: recovering background scene structure from foreground object 'clutter'; employing contextual foreground knowledge to circumvent training a classifier when labelled data is not readily available; creating a labelled LWIR dataset to train a convolutional neural network (CNN) based object classifier; and the viability of spatial context for long-range target classification when big-data solutions are not enough.

For an environment exhibiting frequent foreground clutter, such as a busy train station, we propose an algorithm that exploits foreground object presence to segment underlying scene structure that is seldom visible. If such a location is outdoors and surveyed by a co-located infrared (IR) and visible-band camera set-up, scene context and contextual knowledge transfer allow reasonable class predictions to be determined for thermal signatures within the scene.

Furthermore, a labelled LWIR image corpus is created to train an infrared object classifier using a CNN approach. The trained network demonstrates an effective classification accuracy of 95% over six object classes. However, this performance is not sustained for IR targets acquired at long range, where low signal quality causes classification accuracy to drop. This is addressed by mobilising spatial context to adjust network class scores, restoring robust classification capability.
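To make the first contribution concrete: the abstract does not specify the segmentation algorithm, but one minimal sketch of recovering rarely visible scene structure from clutter-heavy footage is a masked temporal median, assuming per-frame foreground masks (e.g. from a detector) are available. The function name and numpy-based approach below are illustrative, not the thesis's exact method.

```python
import numpy as np

def recover_background(frames, fg_masks):
    """Estimate static scene structure from a clutter-heavy sequence.

    frames   : list of H x W (or H x W x C) uint8 arrays.
    fg_masks : list of H x W boolean arrays, True where a foreground
               object (e.g. a pedestrian) occludes the scene.

    Pixels are sampled only when no foreground object covers them; the
    per-pixel median of those samples approximates background that is
    rarely fully visible in any single frame.
    """
    stack = np.stack(frames).astype(np.float32)          # T x H x W (x C)
    masks = np.stack(fg_masks)                           # T x H x W
    if stack.ndim == 4:                                  # broadcast over colour channels
        masks = masks[..., None]
    stack[np.broadcast_to(masks, stack.shape)] = np.nan  # hide occluded samples
    background = np.nanmedian(stack, axis=0)             # per-pixel clean estimate
    return np.nan_to_num(background).astype(np.uint8)
```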
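For the contextual knowledge transfer between co-registered visible and IR views, one plausible reading is that visible-band detections, where a trained classifier is available, lend their labels to spatially corresponding thermal signatures. The sketch below assumes a known homography between the two views and a nearest-centre matching rule; both are assumptions for illustration only.

```python
import numpy as np

def transfer_labels(ir_boxes, vis_boxes, vis_labels, H):
    """Assign class labels to IR detections without an IR classifier.

    Projects each IR bounding-box centre into the visible image via the
    3x3 homography H and adopts the label of the nearest visible-band
    detection. Names and the matching rule are hypothetical.
    """
    labels = []
    for (x0, y0, x1, y1) in ir_boxes:
        c = H @ np.array([(x0 + x1) / 2, (y0 + y1) / 2, 1.0])
        cx, cy = c[:2] / c[2]                            # projected centre
        centres = [((u0 + u1) / 2, (v0 + v1) / 2)
                   for u0, v0, u1, v1 in vis_boxes]
        d = [np.hypot(cx - u, cy - v) for u, v in centres]
        labels.append(vis_labels[int(np.argmin(d))])
    return labels
```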
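The abstract states only that a CNN is trained on the labelled LWIR corpus for six classes; the architecture is not given. A minimal PyTorch sketch of such a classifier for single-channel thermal chips might look as follows, with the layer sizes and the name ThermalNet being placeholders rather than the thesis's network.

```python
import torch
import torch.nn as nn

class ThermalNet(nn.Module):
    """Illustrative small CNN for single-channel LWIR target chips."""

    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),        # fixed 4x4 map regardless of chip size
        )
        self.classifier = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x):                   # x: N x 1 x H x W thermal chips
        f = self.features(x)
        return self.classifier(f.flatten(1))

logits = ThermalNet()(torch.randn(8, 1, 64, 64))   # -> 8 x 6 class scores
```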
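Finally, "mobilising spatial context to adjust network class scores" suggests fusing the CNN posterior with a location-conditioned class prior, so that a faint long-range target detected, say, on a road is nudged towards vehicle classes. How that prior is constructed is the thesis's contribution; the product-of-experts fusion below is one simple sketch, with context_prior taken as a given input.

```python
import numpy as np

def context_adjusted_scores(logits, context_prior):
    """Fuse raw CNN class scores with a spatial-context class prior.

    logits        : length-6 raw network scores for one detection.
    context_prior : length-6 prior over classes given where in the
                    scene the target sits (assumed precomputed).

    Softmax the logits, weight by the prior, renormalise: long-range
    targets with flat, low-quality network scores are then dominated
    by the scene context rather than by sensor noise.
    """
    p = np.exp(logits - logits.max())
    p /= p.sum()                          # network posterior
    fused = p * context_prior             # Bayesian-style reweighting
    return fused / fused.sum()

# e.g. a faint long-range blob whose location prior favours class 2:
scores = context_adjusted_scores(
    np.array([0.2, 0.1, 0.3, 0.15, 0.1, 0.2]),
    np.array([0.05, 0.05, 0.5, 0.3, 0.05, 0.05]))
```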