Computer Science Graduate Seminar: Maximum Entropy Models for Sequences: Scaling up from Tagging to Translation
Wednesday, May 17, 2017, 2:00pm
Location: Computer Science Center, Room 5052, Ahornstr. 55
Speaker: Dipl.-Phys. Patrick Lehnen
Maximum entropy approaches to sequence tagging, and conditional random fields in particular, have shown high potential in a variety of tasks. The effectiveness of these approaches is verified in this thesis using semantic tagging within natural language understanding as an example. For this task, decent feature engineering and tuning of the regularization parameter are sufficient to make conditional random fields superior to a broad set of competing approaches, including support vector machines, phrase-based translation, maximum entropy Markov models, dynamic Bayesian networks, and probabilistic finite state transducers. Applying conditional random fields to other tasks in many cases calls for extensions to the original notation. For multi-level semantic tagging in natural language understanding, constrained search is needed; for grapheme-to-phoneme conversion, support for a hidden segmentation and huge feature sets is required; and for statistical machine translation, solutions have to be found for the large input and output vocabularies, even larger feature sets, and the hidden alignments. This thesis presents solutions to all of these requirements. The conditional random fields are modeled with finite state transducers to support constraints on the search space. They are extended with hidden segmentation, elastic net regularization, sparse forward-backward, pruning in training, and intermediate classes in the output layer. Finally, all extensions are combined to support statistical machine translation with conditional random fields. The best implementation for statistical machine translation is then based on a refined maximum expected BLEU objective, which uses a similar feature notation and the same RPROP parameter estimation but exploits the phrase-based or hierarchical baseline more efficiently with the help of n-best lists.
The computer science lecturers invite everyone interested to attend.