Multi-object tracking and person analysis from mobile robot platforms

Breuers, Stefan; Leibe, Bastian (Thesis advisor); Groß, Horst-Michael (Thesis advisor)

Düren : Shaker Verlag (2020)
Book, Dissertation / PhD Thesis

In: Selected topics in computer vision 4
Page(s)/Article no.: 1 online resource (viii, 130 pages): illustrations, diagrams

Dissertation, RWTH Aachen University, 2019


Multi-object tracking is a broad and very active field of research in computer vision. Finding the trajectories of multiple persons in a scene is a key component of video analysis, surveillance, autonomous driving, and mobile robotics. The latter application has led to several international research projects, e.g., on developing social service platforms, and this thesis builds on their results.

First, we study common approaches for image-based 2D multi-object tracking and analyze representative methods with regard to the errors they make. We propose a classifier that learns, from bounding-box context features, the situations in which false-positive tracks appear. Since the trackers differ in these characteristics, their outputs can be combined, and we show that this combination improves the overall result. This indicates not only that individual methods still have room for improvement, but also that multi-object trackers have different strengths, so all evaluation measures must always be considered together. When analyzing the results of such trackers, it is therefore important to keep the application scenario in mind.

Given the application that motivates this thesis, we then turn to mobile robot platforms and examine how well recent multi-object tracking approaches perform in these 3D real-world settings. To this end, we present a highly modular detection-tracking pipeline. We discuss important design choices, such as the data association method and the use of multi-modal detectors, and find that more complex association methods or additional input modalities, respectively, do not always lead to better tracking performance.

We then extend this pipeline with person analysis modules as an additional modular level. Since each tracked person has a unique trajectory, we can apply temporal filtering to the analysis output of that person. Using head and body pose estimation as an example, we show that this yields smoothed and improved estimates of these attributes.
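Per-track temporal filtering can be illustrated with a minimal sketch. It assumes that the analysis module returns a single head orientation angle (in radians) per tracked person, and it uses exponential smoothing on the unit circle as a stand-in filter; the abstract does not specify which filter the thesis uses. The names `AngleFilter`, `filter_tracks`, `alpha`, and `stride` are hypothetical. The `stride` parameter reflects one plausible reading of running the filters "with a certain stride": the expensive estimator is only called on every `stride`-th frame, and the frames in between reuse the track's filtered estimate.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AngleFilter:
    """Exponential smoothing of an orientation angle (radians).

    The angle is smoothed through its unit vector (cos, sin), so that
    measurements on both sides of the +/-pi wrap-around average correctly.
    """
    alpha: float = 0.3  # weight of each new measurement (hypothetical value)
    c: float = 0.0
    s: float = 0.0
    initialized: bool = False

    def update(self, angle: float) -> float:
        if not self.initialized:
            self.c, self.s = math.cos(angle), math.sin(angle)
            self.initialized = True
        else:
            self.c = (1 - self.alpha) * self.c + self.alpha * math.cos(angle)
            self.s = (1 - self.alpha) * self.s + self.alpha * math.sin(angle)
        return self.estimate()

    def estimate(self) -> float:
        return math.atan2(self.s, self.c)


def filter_tracks(
    frames: List[Dict[int, object]],
    estimate_pose: Callable[[object], float],
    stride: int = 1,
    alpha: float = 0.3,
) -> List[Dict[int, float]]:
    """frames[t] maps a track id to that person's image crop at frame t.

    The (expensive) estimator runs only on every `stride`-th frame, or when
    a track is seen for the first time; all other frames reuse the track's
    current filtered estimate.
    """
    filters: Dict[int, AngleFilter] = {}
    results = []
    for t, tracks in enumerate(frames):
        out = {}
        for track_id, crop in tracks.items():
            f = filters.setdefault(track_id, AngleFilter(alpha=alpha))
            if t % stride == 0 or not f.initialized:
                out[track_id] = f.update(estimate_pose(crop))
            else:
                out[track_id] = f.estimate()
        results.append(out)
    return results


# Toy usage: the "crop" is already the measured angle, and we count calls.
calls = []


def fake_estimator(crop):
    calls.append(crop)
    return crop


frames = [{1: 3.1}, {1: -3.1}, {1: -3.1}, {1: -3.1}]
res = filter_tracks(frames, fake_estimator, stride=2, alpha=0.5)
print(len(calls), [round(r[1], 3) for r in res])
```

Smoothing the unit vector rather than the raw angle matters here: naively averaging 3.1 and -3.1 rad gives 0, i.e., the opposite facing direction, whereas the vector mean stays close to ±pi. With `stride=2`, the estimator is called on only two of the four frames.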
Additionally, these filters can be run with a certain stride, which yields a large runtime speed-up when the analysis modules rely on expensive deep learning methods.

Finally, we explore a new multi-object tracking approach that builds on the success of deep learning. While existing methods often use deep appearance or motion models to support the data association step, we aim to sidestep the dependency on a detector entirely, and with it the need for data association. To do so, we embed a strong re-identification model, trained with a triplet loss, into an optimal Bayes filter, which forms the theoretical foundation of many tracking methods. By modeling track states as full probability maps, we can operate directly on the image input, taking a step towards an end-to-end image-to-track approach.
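The idea of a detection-free Bayes filter over a probability map can be sketched as follows. This is a toy illustration under explicit assumptions, not the thesis's implementation. The re-identification network is replaced by a hypothetical per-pixel embedding map `emb` of shape (H, W, D). The motion model is a Gaussian random walk applied as a convolution (prediction: p(x_t | z_1:t-1) = sum over x' of p(x_t | x') p(x' | z_1:t-1)). The measurement likelihood is assumed to be exp(-d^2 / temperature), where d^2 is the squared distance between a pixel's embedding and the track's reference embedding (update: p(x_t | z_1:t) proportional to p(z_t | x_t) p(x_t | z_1:t-1)). The `triplet_loss` helper only illustrates the training objective named above, here on plain Euclidean distances. All function names and parameters are hypothetical.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on Euclidean embedding distances: zero once the negative
    is at least `margin` farther from the anchor than the positive."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(d_ap - d_an + margin, 0.0)


def gaussian_kernel(size=7, sigma=1.5):
    """Normalized 2D Gaussian used as a random-walk motion model (odd size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    k = np.outer(g, g)
    return k / k.sum()


def predict(belief, kernel):
    """Prediction step: spread the previous posterior with the motion kernel."""
    h, w = belief.shape
    kh, kw = kernel.shape
    padded = np.pad(belief, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(belief)
    for i in range(kh):
        for j in range(kw):
            # The kernel is symmetric, so this correlation equals a convolution.
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out / out.sum()  # renormalize mass lost at the zero-padded border


def update(prior, embedding_map, reference, temperature=0.5):
    """Update step: posterior proportional to likelihood * prior, where the
    likelihood decays with the squared re-ID distance to the reference.
    Computed in log space to avoid numerical underflow."""
    d2 = np.sum((embedding_map - reference) ** 2, axis=-1)  # shape (H, W)
    log_post = np.log(prior + 1e-12) - d2 / temperature
    post = np.exp(log_post - log_post.max())
    return post / post.sum()


def argmax_2d(p):
    """Mode of a 2D probability map as a (row, col) tuple of ints."""
    return tuple(int(v) for v in np.unravel_index(np.argmax(p), p.shape))


# Toy example: 20x20 maps with hypothetical 8-D per-pixel embeddings.
rng = np.random.default_rng(0)
H, W, D = 20, 20, 8
reference = np.ones(D)                   # the track's appearance embedding
belief = np.full((H, W), 1.0 / (H * W))  # uninformed initial state
kernel = gaussian_kernel()

positions = [(12, 7), (13, 8)]           # true target position per frame
estimates = []
for y, x in positions:
    emb = rng.normal(0.0, 0.1, (H, W, D))  # background embeddings near 0
    emb[y, x] += reference                 # target pixel resembles reference
    belief = update(predict(belief, kernel), emb, reference)
    estimates.append(argmax_2d(belief))
print(estimates)
```

Here the mode of the map serves as the position estimate, but because the full map is kept, ambiguous situations (e.g., two similar-looking people) can remain multi-modal instead of being collapsed to a single box, and no detector or explicit data association step is involved.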