Thursday, 8 October 2020, 14:00

Alignment Models for Recurrent Neural Networks

  • Zoom:
  • Speaker: Diplom-Informatiker Patrick Doetsch



Over the last decade, combining hidden Markov models (HMMs) with recurrent neural network (RNN) observation models has become a new standard for building automatic speech recognition (ASR) and handwriting recognition (HWR) systems. While earlier approaches with feed-forward neural networks require a fine-grained time-synchronous alignment between the input data and the output transcription, RNNs are capable of modeling the sequential nature of the speech signal or text line image directly. The aim of this thesis is to investigate how these sequential modeling properties affect the training of ASR and HWR observation models on large-scale corpora.

In the first part of the thesis we investigate the training procedure of several RNN topologies. We focus on variants of the long short-term memory (LSTM) architecture and measure their performance on different corpora. For this purpose we introduce a software package for large-scale RNN training, which was developed as part of this thesis. We discuss different methods to improve training performance and demonstrate their effectiveness on several large tasks.

In the second part of this thesis we study the effects of the temporal modeling capabilities of RNNs on the time-synchronous alignment approach, which has been used in combination with HMMs over the last decade. Our focus here is on variants of the connectionist temporal classification (CTC) HMM topology. Based on the insights gained from this study, we investigate label-synchronous alignment approaches for HWR and ASR. These alignment methods do not rely on time alignments, but generate the output transcription label by label while taking specific parts of the input signal into account. First, we describe an encoder-decoder system with an attention mechanism for HWR. We then combine this idea with the classical approach by deriving so-called inverted alignments, which allow us to formalize label-synchronous alignments in the context of HMMs. We evaluate our novel approach in different experimental settings and present results on a large ASR corpus.


Invitation from: the lecturers of Computer Science