Advanced Seminar in Computer Science (Informatik-Oberseminar)
Friday, 14 August 2020, 2:00 p.m.
Data-Driven Deep Modeling and Training for Automatic Speech Recognition
- Location: https://us02web.zoom.us/j/83272559800?pwd=Nk5yU1c3anRZeE9yYU5GMU0yaHQ3Zz09
- Speaker: Diplom-Informatiker Pavel Golik
Abstract
Many of today's state-of-the-art automatic speech recognition (ASR) systems are based on hybrid hidden Markov models (HMMs) that rely on neural networks to provide acoustic and language model probabilities. The training of the acoustic model is the main focus of this thesis.
In the first part of this thesis we will address the question of to what extent the extraction of acoustic features can be learned by the acoustic model. We will show that a neural network can not only learn to classify the HMM states from the raw time signal, but can also learn to perform the time-frequency decomposition in its input layer. Inspired by this finding, we will replace the fully connected input layer with a convolutional layer and demonstrate that such models show competitive performance on real data.
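To illustrate why an input layer is able to represent a time-frequency decomposition at all, the following minimal sketch builds linear units whose weights are fixed sinusoids, which makes them equivalent to a discrete Fourier transform over one window of raw samples. This is only an illustration under simplifying assumptions: the weights here are set by hand rather than learned, and the names `fourier_weights`, `linear_unit` and `band_energies` as well as the window size and test tone are hypothetical choices, not taken from the thesis.

```python
import math

def fourier_weights(window_size, bin_index):
    """Weight rows of two linear units tuned to one frequency bin."""
    cos_row = [math.cos(2 * math.pi * bin_index * n / window_size)
               for n in range(window_size)]
    sin_row = [math.sin(2 * math.pi * bin_index * n / window_size)
               for n in range(window_size)]
    return cos_row, sin_row

def linear_unit(weights, samples):
    """Output of a single linear unit without bias: a dot product."""
    return sum(w * x for w, x in zip(weights, samples))

def band_energies(samples):
    """Energy per frequency bin, computed by pairs of linear units."""
    window_size = len(samples)
    energies = []
    for bin_index in range(window_size // 2 + 1):
        cos_row, sin_row = fourier_weights(window_size, bin_index)
        real = linear_unit(cos_row, samples)
        imag = linear_unit(sin_row, samples)
        energies.append(real ** 2 + imag ** 2)
    return energies

# Assumed toy input: one window of a pure tone that falls exactly on bin 5.
window_size = 64
tone_bin = 5
signal = [math.sin(2 * math.pi * tone_bin * n / window_size)
          for n in range(window_size)]

energies = band_energies(signal)
strongest_bin = max(range(len(energies)), key=energies.__getitem__)
print("strongest bin:", strongest_bin)
```

Applying the same weight rows to every shifted window of the signal is what a one-dimensional convolutional layer does, which is the connection to replacing the fully connected input layer with a convolutional one.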
In the second part we will investigate the objective function that is optimized during supervised acoustic training. In principle, both cross entropy and squared error can be used in frame-wise training. We will compare the two objective functions and demonstrate that it is possible to train a hybrid acoustic model using the squared error criterion.
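The two frame-wise criteria mentioned above can be sketched for a single frame as follows. This is a minimal illustration using the standard textbook definitions with a softmax output and a one-hot target; the logits, the number of states, and the function names are hypothetical and not taken from the thesis.

```python
import math

def softmax(logits):
    """Turn raw network outputs into posterior probabilities."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(posteriors, target):
    """Negative log posterior of the target HMM state for one frame."""
    return -math.log(posteriors[target])

def squared_error(posteriors, target):
    """Squared distance between the posteriors and the one-hot target."""
    return sum((p - (1.0 if k == target else 0.0)) ** 2
               for k, p in enumerate(posteriors))

# Assumed toy frame: three HMM states, the aligned target state is index 1.
logits = [2.0, 0.5, -1.0]
target = 1

posteriors = softmax(logits)
ce = cross_entropy(posteriors, target)
se = squared_error(posteriors, target)
print("cross entropy:", ce)
print("squared error:", se)
```

Both criteria are zero for a perfect prediction, but cross entropy grows without bound as the target posterior approaches zero, whereas squared error against a one-hot target never exceeds 2.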
In the third part of this study we will investigate how i-vectors can be used for acoustic adaptation. We will show that i-vectors can help to obtain a consistent reduction of word error rate on multiple tasks and perform a careful analysis of different integration strategies.
In the fourth and final part of this thesis we will apply these and other methods to the task of speech recognition and keyword search on low-resource languages. The limited amount of available resources makes the acoustic training extremely challenging. We will present a series of experiments performed in the scope of the IARPA Babel project that make heavy use of multilingual bottleneck features.
The Computer Science lecturers invite you to attend.