End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition

Katerina Papadimitriou, Gerasimos Potamianos
Interspeech 2019
Although fingerspelling is an often overlooked component of sign languages, it has great practical value in the communication of important context words that lack dedicated signs. In this paper we consider the problem of fingerspelling recognition in videos, introducing an end-to-end lexicon-free model that consists of a deep auto-encoder image feature learner followed by an attention-based encoder-decoder for prediction. The feature extractor is a vanilla auto-encoder variant, employing a [...]atic activation function. The learned features are subsequently fed into the attention-based encoder-decoder. The latter deviates from traditional recurrent neural network architectures: it is a fully convolutional attention-based encoder-decoder equipped with a multi-step attention mechanism that relies on a quadratic alignment function and gated linear units over the convolution output. The introduced model is evaluated on the TTIC/UChicago fingerspelling video dataset, where it outperforms previous approaches in letter accuracy under all three experimental paradigms: signer-dependent, signer-adapted, and signer-independent.
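To make the phrase "gated linear units over the convolution output" concrete, the following is a minimal sketch of a generic 1-D convolution followed by a gated linear unit, written in plain Python. It is not the authors' implementation: the function names (`glu`, `conv1d_glu`), the kernel layout, and the absence of padding and bias are all assumptions made for illustration, and the attention mechanism and quadratic alignment function are not covered.

```python
import math


def sigmoid(x):
    """Logistic function used as the gate."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(vector):
    """Gated linear unit: the first half of the channels carries values,
    the second half carries gates; output is value * sigmoid(gate)."""
    if len(vector) % 2 != 0:
        raise ValueError("GLU input must have an even number of channels")
    half = len(vector) // 2
    values, gates = vector[:half], vector[half:]
    return [v * sigmoid(g) for v, g in zip(values, gates)]


def conv1d_glu(sequence, kernel):
    """Convolve over time, then apply a GLU to every output frame.

    sequence: list of T frames, each a list of d_in floats.
    kernel:   list of k taps; each tap is a d_in x (2 * d_out) weight
              matrix (hypothetical layout chosen for this sketch).
    Returns T - k + 1 frames of d_out floats (no padding, no bias).
    """
    k = len(kernel)
    out_channels = len(kernel[0][0])  # equals 2 * d_out
    outputs = []
    for t in range(len(sequence) - k + 1):
        frame = [0.0] * out_channels
        for tap in range(k):
            for i, x in enumerate(sequence[t + tap]):
                for j in range(out_channels):
                    frame[j] += x * kernel[tap][i][j]
        # The convolution produces 2 * d_out channels; the GLU halves them.
        outputs.append(glu(frame))
    return outputs
```

In this sketch the convolution emits twice as many channels as the layer ultimately outputs, so that half of them can act as learned gates on the other half. A call such as `conv1d_glu([[1.0], [2.0], [3.0]], [[[1.0, 0.0]], [[1.0, 0.0]]])` sums adjacent frames and scales each sum by `sigmoid(0.0)`.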
doi:10.21437/interspeech.2019-2422 dblp:conf/interspeech/PapadimitriouP19 fatcat:nenzyye6vbe65ga7xl4mnrslyi