Mittwoch, 22.07.2020, 11.00 Uhr

An Exploration of Alignment Concepts to Bridge the Gap between Phrase-based and Neural Machine Translation

  • Ort:
  • Referent: Diplom-Informatiker Jan Thorsten Peter



Machine translation, the task of automatically translating text from one natural language into another, has seen massive changes in recent years. After phrase-based systems represented state of the art for over a decade, made advancements in the structure of neural networks and computational power, it possible to build neural machine translation systems, which first improved and later outperformed phrase-based systems. These two approaches have their strength in different areas, the well known phrase-based systems allow fast translations on CPU that can be easily explained by examining the translation table. In contrast, neural machine translation produces more fluent translations and is more robust to small changes in the provided input. This thesis aims to improve both systems by combining their advantages.

The first part of this thesis focuses on investigating the integration of feed-forward neural models into phrase-based systems. Small changes in the input of a phrase-based system can turn an event that was seen in the training data into an unseen event. Neural network models are by design able to handle such cases due to the continuous space representation of the input, whereas, phrase-based systems are forced to fall back to shorter phrases. This means a loss of knowledge about the local context which results in a degradation of, the translation quality. We combine the flexibility provided by feed-forward neural networks with phrase-based systems while gaining a significant improvement over the phrase-based baseline systems. We use feed-forward networks since they are conceptually simple and computationally fast. Commonly, their structure only utilizes local source and target context. Due to this structure, they cannot capture long-distance dependencies. We improve the performance of feed-forward neural networks by efficiently incorporating long-distance dependencies into their structure by using a bag-of-words input.

The second part of the thesis focuses on the pure neural machine translation approach using the encoder-decoder model with an attention mechanism. This mechanism corresponds indirectly to a soft alignment. At each translation step, this model relies only on its previous internal state and the current decoder position to compute the attention weights. There is no direct feedback from the previously used attention. Inspired by hidden Markov models, where the prediction of the currently aligned position depends on the previously aligned position, we improve the attention model by adding direct feedback from previously used attention to improve the overall model performance. Additionally, we utilize word alignments for neural networks is by to guide the neural network during training. By incorporating the alignment as an additional cost function the network performs better as our experiments show. Even though the state-of-the-art neural models, do not require word alignments anymore, there are still applications that benefit from good alignments, including the visualization of parallel sentences, the creation of dictionaries, the automatic segmentation of long parallel sentences and the above-mentioned usage during neural network training. We present a way to apply neural models to create word alignments that improve over word alignments trained with IBM and hidden Markov models.

We evaluate these techniques on various large-scale translation tasks of public evaluation campaigns. Applying new methods with usually complex workflows to new translation tasks is a cumbersome and error-prone exercise. We present a workflow manager, which is developed as part of this thesis, to simplify this task and enable an easier knowledge transfer.


Es laden ein: die Dozentinnen und Dozenten der Informatik