Speaker: Dipl.-Inform. Tobias Weyand
With their rapid growth in recent years, Internet photo collections have become an invaluable repository of visual data. In particular, they provide detailed coverage of the world’s landmark buildings, monuments, sculptures, and paintings. This wealth of visual information can be used to construct landmark recognition engines that automatically tag a photo of a landmark with its name and location. Landmark recognition engines rely on clustering algorithms that can group several million images by the buildings or objects they depict. This grouping problem is very challenging: the massive number of Internet images requires efficient and highly parallel algorithms, and the appearance variability of buildings caused by viewpoint, weather and lighting changes requires robust image similarity measures. Most importantly, it is critical to define a clustering criterion that results in meaningful object clusters. The Iconoid Shift algorithm we present in this thesis uses a very intuitive definition: it represents each object by an iconic image, or iconoid, and defines its cluster as the set of all images that have a certain minimum overlap with the iconoid. The iconoid of an object is the image that has the most overlap with all other images of the object. We find iconoids by performing mode search using a novel distance measure based on image overlap that is more robust to viewpoint and lighting changes than traditional image distance measures. We propose efficient parallel algorithms for performing this mode search. In contrast to most previous algorithms, which produce a hard clustering, Iconoid Shift produces an overlapping clustering and thus elegantly handles images showing multiple nearby landmarks by assigning them to multiple clusters. The increasing density of Internet photo collections allows us to go a step further and even discover sub-structures of buildings such as doors, spires, or facade details.
To this end, we present the Hierarchical Iconoid Shift algorithm that, instead of a flat clustering, produces a hierarchy of clusters in which each cluster represents a building sub-structure. This algorithm is based on a novel hierarchical variant of Medoid Shift that tracks the evolution of modes through scale space by continuously increasing the size of its kernel window. But which objects can a landmark recognition engine built by automatically mining Internet photo collections recognize? And how can such a system be constructed so that it is efficient and achieves high recognition performance? To answer these questions, we perform a large-scale evaluation of the different components of a landmark recognition system, analyzing how different choices of components and parameters affect performance for different object categories such as buildings, paintings or sculptures.
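The mode search and overlapping clustering described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: it assumes a flat kernel window, a precomputed distance matrix where distance stands in for "1 minus image overlap", and hand-picked toy values. The function names (`build_distances`, `medoid_shift`, `overlapping_clusters`) and the bandwidth of 0.65 are hypothetical choices made for this example only.

```python
def build_distances(n, pairs):
    """Symmetric distance matrix; unlisted pairs get distance 1.0 (no overlap)."""
    dist = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for (i, j), d in pairs.items():
        dist[i][j] = dist[j][i] = d
    return dist


def medoid_shift(dist, start, bandwidth, max_iter=100):
    """Follow medoid-shift steps from `start` until a fixed point (a mode).

    Simplification: a flat kernel, i.e. every image within `bandwidth`
    of the current image counts equally.
    """
    current = start
    for _ in range(max_iter):
        # Images inside the kernel window around the current image.
        window = [k for k in range(len(dist)) if dist[current][k] <= bandwidth]
        # Next position: the window image with the smallest summed distance
        # to all window images (ties broken by lower index).
        best = min(window, key=lambda j: (sum(dist[j][k] for k in window), j))
        if best == current:
            break
        current = best
    return current


def overlapping_clusters(dist, bandwidth):
    """Map each mode to all images within `bandwidth` of it.

    An image may appear in several clusters, so the clustering is not hard.
    """
    n = len(dist)
    modes = sorted({medoid_shift(dist, s, bandwidth) for s in range(n)})
    return {m: [k for k in range(n) if dist[m][k] <= bandwidth] for m in modes}


# Toy data (assumed): images 0-2 show landmark A, images 4-6 show landmark B,
# and image 3 shows both landmarks.
PAIRS = {
    (0, 1): 0.3, (1, 2): 0.3, (0, 2): 0.5,   # landmark A
    (4, 5): 0.3, (5, 6): 0.3, (4, 6): 0.5,   # landmark B
    (0, 3): 0.6, (1, 3): 0.5, (2, 3): 0.6,   # image 3 overlaps A
    (3, 5): 0.6,                             # image 3 also overlaps B
}
DIST = build_distances(7, PAIRS)
CLUSTERS = overlapping_clusters(DIST, bandwidth=0.65)
print(CLUSTERS)
```

In this toy setup, images 1 and 5 end up as the two modes, and image 3 falls inside the window of both, so it is assigned to both clusters.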
Hosts: The Computer Science lecturers
Our guest, Melanie Götze, a systemic consultant and freelancer based in Düsseldorf, shares proven tips and clever tricks:
All doctoral candidates are cordially invited to this event!
Speaker: Mahaboob Ali Basha Shaik
By definition, words that are not present in a recognition vocabulary are called out-of-vocabulary (OOV) words. Recognition of unseen or new words is an important feature desired in any real-world large vocabulary continuous speech recognition (LVCSR) system. However, human languages are complex in nature due to a wide variety of morphological phenomena such as inflection, derivation and compounding. For instance, language models for morphologically rich languages like German, Polish, Slovene, etc., often suffer from high OOV rates, data sparsity and rather poor generalization to unseen sequences. In spite of the substantial amount of work that has been carried out to recognize unseen words in recent decades, many issues related to the open-vocabulary problem still exist, especially under large vocabulary conditions. This dissertation addresses some of the core issues and attempts to solve them by investigating and introducing different types of hybrid and hierarchical language models, supported by detailed experimental analysis. Careful selection of the sub-word unit is necessary in a hybrid language model, as it has a large impact on OOV rate, data sparsity and recognition issues. Different types of sub-word units, such as morphemes, syllables and graphones, are investigated on selected morphologically rich languages. The traditional hybrid approach uses only sub-words, which is not robust in terms of reducing word error rates on large vocabulary tasks. This work investigates different types of count-based hybrid language models. One method is to use an optimal number of full words and sub-words. Further extensions include the use of an optimal number of full words, sub-words and sub-word graphones based on word frequencies. The advantage of using two or three different types of units in a hybrid language model is that it helps improve recognition of OOVs, compensates for weaker contexts, and reduces data sparsity to some extent.
In addition, this work investigates maximum entropy and long short-term memory network hybrid language models. A maximum entropy approach is integrated into a class-based language modelling framework. Additionally, novel extensions are proposed in the hierarchical language modelling approach, where a full-word language model and a character-level language model are directly used during decoding in a hierarchical manner to recognize in-vocabulary and OOV words, respectively, for LVCSR tasks. Sequence normalization using a prefix tree approach is applied to hierarchical language models. Variants of the hierarchical approach are introduced by incorporating weighted and non-weighted character language models, multi-class character language models, and grapheme-to-phoneme models. These types of language model guarantee a zero OOV rate. Alternatively, a properly normalized combined interpolated language model is introduced that also uses a full-word language model and a character-level language model during decoding, exploiting within-word or across-word context at the character level for OOV recognition.
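To make the hierarchical idea concrete, here is a minimal sketch of a word-level model that hands OOV words to a character-level model. It is an illustration under strong assumptions, not the dissertation's models: both levels are add-one smoothed unigram models, the prefix-tree sequence normalization mentioned above is omitted, and the names `CharUnigramLM`, `HierarchicalLM`, `oov_rate`, `UNK` and `END` are hypothetical.

```python
from collections import Counter

UNK = "<unk>"   # hypothetical class that pools all OOV words
END = "</w>"    # hypothetical end-of-word symbol for the character model


def oov_rate(tokens, vocab):
    """Fraction of running tokens that are not in the vocabulary."""
    return sum(1 for t in tokens if t not in vocab) / len(tokens)


class CharUnigramLM:
    """Add-one smoothed character unigram model (assumes a fixed alphabet)."""

    def __init__(self, words, alphabet):
        self.num_symbols = len(set(alphabet)) + 1  # +1 for END
        self.counts = Counter()
        for word in words:
            self.counts.update(word)   # counts each character of the word
            self.counts[END] += 1
        self.total = sum(self.counts.values())

    def prob(self, word):
        p = 1.0
        for ch in list(word) + [END]:
            p *= (self.counts[ch] + 1) / (self.total + self.num_symbols)
        return p


class HierarchicalLM:
    """Word model for in-vocabulary words, character model for OOV words."""

    def __init__(self, train_tokens, vocab_size, alphabet):
        counts = Counter(train_tokens)
        self.vocab = {w for w, _ in counts.most_common(vocab_size)}
        # Training tokens outside the vocabulary are pooled into UNK.
        self.class_counts = Counter(
            w if w in self.vocab else UNK for w in train_tokens
        )
        self.total = len(train_tokens)
        self.char_lm = CharUnigramLM(list(counts), alphabet)

    def class_prob(self, cls):
        # Add-one smoothing over the vocabulary plus the UNK class.
        return (self.class_counts[cls] + 1) / (self.total + len(self.vocab) + 1)

    def word_prob(self, word):
        if word in self.vocab:
            return self.class_prob(word)
        # OOV word: UNK class mass times the character-level spelling probability.
        return self.class_prob(UNK) * self.char_lm.prob(word)


ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"
TRAIN = "das haus das auto das haus ein auto".split()
lm = HierarchicalLM(TRAIN, vocab_size=2, alphabet=ALPHABET)
print(lm.word_prob("das"), lm.word_prob("hausboot"))
print(oov_rate("das haus boot".split(), lm.vocab))
```

Because the character model assigns a non-zero probability to any spelling over the alphabet, every word receives a non-zero probability, which is the sense in which such a model has no OOV words. Without the normalization step the abstract mentions, the sketch also spends some character-model mass on spellings that are already in the vocabulary.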
Hosts: The Computer Science lecturers