Search and training with joint translation and reordering models for statistical machine translation
- Suche und Training mit kombinierten Übersetzungs- und Umordnungsmodellen für die statistische maschinelle Übersetzung
Guta, Vlad Andreas; Ney, Hermann (Thesis advisor); Fraser, Alexander M. (Thesis advisor)
Dissertation / PhD Thesis
Dissertation, RWTH Aachen University, 2020
Statistical machine translation describes the task of automatically translating a written text from one natural language into another. This is done by means of statistical models, which involves defining suitable models, using them to search for the most likely translation of a given text, and training their parameters on given bilingual sentence pairs. Phrase-based machine translation emerged two decades ago and became the state of the art in the following years. Nevertheless, the breakthrough of neural machine translation in 2014 triggered an abrupt shift towards neural models. A fundamental drawback of the traditional approach lies in the phrases themselves. They are extracted from word-aligned bilingual data via hand-crafted heuristics, and the phrase translation models are estimated from the extraction counts that these heuristics produce. Moreover, the translation models exclude any phrase-external information, which in turn limits the context used to generate a target word during search. To compensate for these restricted models, a variety of additional models and heuristics are used. However, potentially the largest downside is that the word alignments required for phrase extraction are trained with IBM and hidden Markov models. This results in a discrepancy between the models applied in training and those actually used in search. Although the neural approach clearly outperforms the phrasal one, it remains an open question whether the superior performance of neural machine translation stems from the complexity of neural models, which capture dependencies between whole source sentences and their translations, or from the coherent application of the same models in both training and decoding.
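To make "estimated using the extraction counts" concrete, the following Python sketch extracts phrase pairs that are consistent with a word alignment and then estimates phrase translation probabilities as relative frequencies, p(f̃ | ẽ) = N(f̃, ẽ) / Σ_f̃' N(f̃', ẽ). It is a simplified textbook variant (no extension over unaligned boundary words, no smoothing), not the extraction pipeline of the thesis. The function names `extract_phrases` and `relative_frequencies`, the `max_len` limit and the toy corpus are hypothetical.

```python
from collections import Counter


def extract_phrases(src, tgt, alignment, max_len=3):
    """Return all phrase pairs consistent with a word alignment.

    src, tgt: token lists; alignment: set of (i, j) links (i indexes src, j indexes tgt).
    A block is consistent if it contains at least one link and no link connects
    a word inside the block to a word outside it.
    Simplification (assumption): unaligned boundary words are never added.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject the block if a target word inside [j1, j2] links outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.append((tuple(src[i1:i2 + 1]), tuple(tgt[j1:j2 + 1])))
    return pairs


def relative_frequencies(corpus, max_len=3):
    """Estimate p(f~ | e~) = N(f~, e~) / sum_f' N(f', e~) from extraction counts."""
    joint = Counter()
    for src, tgt, alignment in corpus:
        joint.update(extract_phrases(src, tgt, alignment, max_len))
    marginal = Counter()
    for (_, e), n in joint.items():
        marginal[e] += n
    return {(f, e): n / marginal[e] for (f, e), n in joint.items()}


# Hypothetical toy corpus: (source tokens, target tokens, alignment links).
corpus = [
    ("das haus".split(), "the house".split(), {(0, 0), (1, 1)}),
    ("dem buch".split(), "the book".split(), {(0, 0), (1, 1)}),
]
probs = relative_frequencies(corpus)
print(probs[(("das",), ("the",))])  # 0.5: "the" was extracted with "das" and with "dem"
```

The example shows how the probabilities are determined entirely by what the heuristic extracts: "the" occurs in two extracted pairs, so each source phrase receives half of its probability mass, regardless of any context outside the phrase.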
We aim to answer this research question by developing a coherent modelling pipeline that improves on the phrasal approach by relying on fewer but stronger models, discarding dependencies on phrasal heuristics, and applying the same word-level models in training and search. First, we investigate two different types of word-based translation models: extended translation models and joint translation and reordering models. Both are enhanced with extended context information and estimate lexical and reordering probabilities. They are integrated directly into the phrase-based search and evaluated against state-of-the-art phrasal baselines to assess their benefit on top of phrasal models. In the second part, we develop a novel beam-search decoder that generates the translation word by word, thus discarding any dependencies on heuristic phrases, and that incorporates a joint translation and reordering model. It uses far fewer features than the phrasal systems, and its performance is analyzed in comparison with the above-mentioned phrasal baseline systems. The final goal is a sound and coherent end-to-end machine translation framework. For this purpose, we also apply in training the same models and search algorithm that are employed in word-based translation. Specifically, we develop an algorithm that optimizes word alignments and model parameters alternately, and it is applied iteratively with increasing model complexity.
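The alternating optimization of alignments and parameters can be illustrated, under strong simplifying assumptions, as Viterbi (hard-EM) training: estimate the model from the current alignments, re-align every sentence pair with the updated model, and repeat. The sketch below substitutes a single lexical model p(f | e) and a per-word argmax for the joint translation and reordering models and the beam search described above, and it keeps the model complexity fixed. The functions `estimate`, `realign` and `alternate`, the all-to-all initialization and the `floor` probability for unseen pairs are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def estimate(corpus, alignments):
    """Parameter step: relative frequencies p(f | e) from the current alignment links."""
    counts = defaultdict(Counter)
    for (src, tgt), links in zip(corpus, alignments):
        for i, j in links:
            counts[tgt[j]][src[i]] += 1
    model = {}
    for e, c in counts.items():
        total = sum(c.values())
        model[e] = {f: n / total for f, n in c.items()}
    return model


def realign(corpus, model, floor=1e-6):
    """Alignment step: link each source word to its most probable target word.

    Unseen (e, f) pairs get a small floor probability (an assumption);
    ties are broken in favour of the leftmost target position.
    """
    alignments = []
    for src, tgt in corpus:
        links = set()
        for i, f in enumerate(src):
            j = max(range(len(tgt)), key=lambda j: model.get(tgt[j], {}).get(f, floor))
            links.add((i, j))
        alignments.append(links)
    return alignments


def alternate(corpus, iterations=5):
    """Alternate between re-aligning and re-estimating (Viterbi / hard-EM training)."""
    # Initialization (assumption): link every source word to every target word.
    alignments = [{(i, j) for i in range(len(s)) for j in range(len(t))} for s, t in corpus]
    model = estimate(corpus, alignments)
    for _ in range(iterations):
        alignments = realign(corpus, model)
        model = estimate(corpus, alignments)
    return model, alignments


# Hypothetical toy corpus of (source tokens, target tokens).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
model, alignments = alternate(corpus)
print(alignments)  # diagonal links for every sentence pair
```

On this toy corpus, the first re-alignment leaves "buch" in the third sentence tied between "a" and "book" and links it to "a"; the next re-estimation makes p(buch | book) = 1.0, and from then on all alignments stay diagonal. In a pipeline with increasing model complexity, the converged alignments of one stage would initialize the next, richer model.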