Friday, 4 December 2020, 3:00 p.m.

Alignment and Localization in Fine-Grained Image Recognition

  • Zoom:
  • Speaker: Harald Hanselmann, M.Sc.



Image recognition tasks can be grouped by the extent of their inter-class variation. General image recognition tasks typically sort images into a wide variety of broad categories and therefore display large inter-class variation. Fine-grained image classification tasks, by contrast, are defined by low inter-class variation. Examples include the classification of animal species or car models, and face recognition.

For fine-grained tasks, it is important to detect not only which features are in an image, but also where they are located and how they relate spatially. In this thesis we look at different methods to align and localize features and discriminative regions for fine-grained image classification. On the one hand, we look at computing dense pixel-wise alignments using 2D-Warping. In this context, we introduce methods for speeding up the computation of the dense alignments, since runtime is the main drawback of 2D-Warping-based approaches. Additionally, we introduce a new 2D-Warping algorithm that obtains better results in terms of optimization score and classification accuracy than previous 2D-Warping algorithms. On the other hand, we explore a new method to obtain the local features needed to compute the dense alignments. These features are learned from data using convolutional neural networks (CNNs).

Further, we introduce a warped region-of-interest pooling layer based on 2D-Warping that can be inserted into a CNN. We observe that, for good classification accuracy, modeling translation and scaling is most important. For this reason, we introduce a stand-alone localization module that handles variation in translation and scale, is very lightweight and efficient, and needs only class labels for training. We then add an embedding layer and global K-max pooling to obtain a complete and efficient system for fine-grained image classification. Finally, to simplify the training procedure and leverage the benefits of full end-to-end systems, we transform the localization module so that it can be integrated into the classification model and trained jointly. We evaluate our methods on popular and challenging tasks for fine-grained image classification and are able to report very competitive results.
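As an illustration of the pooling step mentioned above, the following is a minimal sketch of global K-max pooling under one common reading of the term: for each channel, the K largest activations over all spatial positions are averaged. The function name `global_k_max_pool`, the nested-list layout of the feature map, and the choice of averaging are assumptions made for this example; the abstract does not specify the formulation used in the thesis.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: [channel][row][column]


def global_k_max_pool(feature_map: FeatureMap, k: int) -> List[float]:
    """Average the k largest activations of each channel (hypothetical helper).

    With k == 1 this reduces to global max pooling; with k equal to the
    number of spatial positions it reduces to global average pooling.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    pooled = []
    for channel in feature_map:
        # Flatten the spatial grid of this channel into one list of values.
        activations = [value for row in channel for value in row]
        if k > len(activations):
            raise ValueError("k exceeds the number of spatial positions")
        # Keep the k strongest responses and average them.
        top_k = sorted(activations, reverse=True)[:k]
        pooled.append(sum(top_k) / k)
    return pooled


# Two channels on a 2x2 spatial grid.
example = [
    [[1.0, 5.0], [3.0, 2.0]],
    [[0.0, 0.0], [4.0, 8.0]],
]
print(global_k_max_pool(example, k=2))
```

Choosing K between the two extremes lets the pooled descriptor draw on several strongly responding locations per channel rather than a single peak or the whole image.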


Hosted by: the lecturers of the Department of Computer Science