Computer Science Graduate Seminar

Wednesday, March 15, 2023, 2:00pm

Neural Hidden Markov Model for Machine Translation

  • Weiyue Wang, M.Sc.
  • Location: Zoom

      Meeting ID: 937 4910 4715
      Password: 743829


Neural machine translation systems have recently shown promising performance. A key component of almost all modern neural machine translation systems is the attention mechanism, which helps an encoder-decoder model attend to specific source positions when producing a translation. However, recent studies have found that using the attention weights directly to align words yields poor alignment quality. This motivates us to introduce an explicit alignment model into the neural architecture in order to improve the alignment, and with it the translation quality of the overall system. To this end, we propose a novel neural hidden Markov model consisting of neural network-based lexicon and alignment models that are trained jointly with the forward-backward algorithm.

Various neural network architectures are used to model the lexicon and alignment probabilities. We start with feedforward neural networks and apply this first model to re-rank n-best lists generated by phrase-based systems, observing significant improvements. To build a monolithic neural hidden Markov model, we then apply the more powerful recurrent neural networks to the architecture and implement a standalone decoder. By replacing the attention mechanism with an alignment model, we achieve performance comparable to the baseline attention model while significantly improving the alignment quality. We also apply the state-of-the-art transformer architecture to the neural hidden Markov model; the experimental results show that the transformer-based hidden Markov model outperforms the standard self-attentive transformer model in terms of TER scores.

In addition to the work on the neural hidden Markov model, we propose two novel metrics for machine translation evaluation, called CharacTER and EED. Both are easy to use and show promising performance in the annual WMT metrics shared tasks.


Hosted by: the Computer Science faculty