Thursday, 24 November 2022, 10:00 a.m.

Neural Sequence-to-Sequence Modeling for Language and Speech Translation

      Meeting ID: 936 1698 3610
      Passcode: 382088



In recent years, various fields of human language technology have been advanced by the success of neural sequence-to-sequence modeling. The application of attention models to automatic speech recognition, text machine translation, and speech translation has become dominant and well established. Although the effectiveness of such models is well documented in the literature, not all aspects of attention-based sequence-to-sequence models have been explored. The main contribution of this thesis is therefore a redesign of attention models through novel alternative architectures. From a modeling perspective, this research goes beyond current sequence-to-sequence backbone models by directly combining input and output sequences in a two-dimensional structure in which an attention mechanism is no longer required. This model distinguishes itself from attention models, which treat inputs and outputs as one-dimensional sequences over time. Current state-of-the-art attention models also lack an explicit alignment, a core component of traditional systems. This simplification of a complex process makes it difficult to extract alignments between input and output positions. To make attention models explainable and their output more controllable, the next part of this study integrates the attention model into the hidden Markov model formulation by introducing alignments as a sequence of hidden variables. Finally, an exciting research direction is combining speech recognition with text machine translation for speech-to-text translation. Besides advancing a cascade of independently trained speech recognition and machine translation systems, this thesis sheds light on different end-to-end models that directly translate speech into target text, and shows that such end-to-end models can serve in practice as a viable alternative to cascaded speech translation.
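As a rough illustration of the hidden-variable formulation mentioned above, the output probability can be factorized HMM-style by marginalizing over a latent alignment sequence (a standard formulation; the notation here is generic and the thesis's exact model may differ):

```latex
% x_1^T: input sequence, a_1^I: output sequence,
% s_i: hidden alignment position for output step i
p\left(a_1^I \mid x_1^T\right)
  = \sum_{s_1^I} \prod_{i=1}^{I}
    p\left(a_i, s_i \,\middle|\, s_{i-1}, a_1^{i-1}, x_1^T\right)
```

Unlike soft attention, which averages over input positions implicitly, the sum over $s_1^I$ makes the alignment an explicit latent variable that can be inspected or constrained at decoding time.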


Invitation extended by the lecturers of the Department of Computer Science