Tuesday, 29 September 2020, 11:00
Towards Large Vocabulary Continuous Sign Language Recognition: From Artificial to Real-Life Tasks
- Zoom: https://us02web.zoom.us/j/86007461999?pwd=VTgvOUdqcW0yeVQvdU5pYmlUNHROdz09
- Speaker: Dipl.-Ing. Oscar Koller
This thesis deals with large vocabulary continuous sign language recognition.
Historically, research on sign language recognition has been dispersed, and researchers often captured their own small-scale data sets independently for experimentation. Most available data sets do not cover the complexity that sign languages encompass. Moreover, most previous work tackles only isolated single signs rather than continuous sign language. Previous work has also been limited to very small vocabularies, and none has targeted real-life sign language. The data sets employed typically comprised artificial, staged sign language footage that was planned and recorded with the aim of enabling automatic recognition. The kinds of signs encountered, the structure of sentences, the signing speed, the choice of expression, and the dialects were usually controlled and determined beforehand.
This work aims to move sign language recognition to more realistic scenarios. For this purpose, we created the first real-life large vocabulary continuous sign language corpora, which are based on recordings of the broadcast channel featuring natural sign language of professional interpreters. This kind of data provides unprecedented complexity for recognition. A statistical sign language recognition system based on Gaussian mixture and hidden Markov models (HMMs) with hand-crafted features is created and evaluated on this challenging task. We then leverage advances in deep learning and propose modern hybrid convolutional neural network (CNN) and long short-term memory (LSTM) HMMs, which are shown to halve the recognition error. Finally, we develop a weakly supervised learning scheme based on hybrid multi-stream CNN-LSTM-HMMs that allows the accurate discovery of sign subunits, such as articulated handshapes and mouth patterns, in sign language footage.
Invited by: the computer science lecturers