SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021 [article]

Swathikiran Sudhakaran and Adrian Bulat and Juan-Manuel Perez-Rua and Alex Falcon and Sergio Escalera and Oswald Lanz and Brais Martinez and Georgios Tzimiropoulos
2021 arXiv pre-print
This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021. To participate in the challenge, we deployed two spatio-temporal feature extraction and aggregation models that we developed recently: GSF and XViT. GSF is an efficient spatio-temporal feature-extraction module that can be plugged into 2D CNNs for video action recognition. XViT is a convolution-free video feature extractor based on the transformer architecture. We design an ensemble of GSF and XViT model families with different backbones and pretraining to generate the prediction scores. Our submission, visible on the public leaderboard, achieved a top-1 action recognition accuracy of 44.82%, using only RGB.
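The abstract says the prediction scores come from an ensemble of models but does not state how the scores are fused. The following is a minimal sketch of one common choice, a weighted average of per-model softmax scores, and should not be read as the authors' actual procedure. The function names (`softmax`, `ensemble_scores`, `top1`), the optional `weights` argument, and the toy logits are all hypothetical and introduced only for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_scores(per_model_logits, weights=None):
    """Fuse per-model class logits by a weighted average of softmax scores.

    ASSUMPTION: late fusion by averaging; the report's abstract does not
    specify the fusion rule used in the actual submission.
    """
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0] * n_models  # uniform weighting by default
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    weight_sum = sum(weights)
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / weight_sum
        for c in range(n_classes)
    ]


def top1(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy logits over three classes from two hypothetical models
# (stand-ins for, say, one GSF model and one XViT model).
model_a_logits = [2.0, 1.0, 0.0]
model_b_logits = [0.0, 3.0, 0.0]

fused = ensemble_scores([model_a_logits, model_b_logits])
prediction = top1(fused)
```

Averaging probabilities rather than raw logits keeps models with differently scaled outputs on a comparable footing, which matters when the ensemble mixes architectures as different as a 2D CNN with GSF and a transformer.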
arXiv:2110.02902v1 fatcat:ikbzj6ic7zgwbkjrbpeexwvn2i