A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Tell me what you see: A zero-shot action recognition method based on natural language descriptions
[article] · 2021 · arXiv pre-print
Recently, several approaches have explored the detection and classification of objects in videos to perform Zero-Shot Action Recognition, with remarkable results. In these methods, class-object relationships are used to associate visual patterns with semantic side information, because these relationships also tend to appear in texts; word-vector methods therefore reflect them in their latent representations. Inspired by these methods and by video captioning's ability to describe events […]
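The core idea in the abstract — that word embeddings encode class-object relationships usable for zero-shot recognition — can be sketched with a toy example. Here the embedding vectors are hypothetical stand-ins (in practice they would come from a pretrained model such as word2vec or GloVe), and the scoring rule (mean cosine similarity between detected-object vectors and an action-class vector) is a generic illustration, not the paper's actual method.

```python
import math

# Hypothetical 3-d "embeddings" for illustration only; real systems
# would use pretrained word vectors (e.g. word2vec, GloVe).
EMBEDDINGS = {
    # labels of objects detected in a video
    "ball":   [0.90, 0.10, 0.00],
    "racket": [0.80, 0.20, 0.10],
    # candidate (unseen) action classes
    "playing tennis": [0.85, 0.15, 0.05],
    "cooking":        [0.05, 0.90, 0.40],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def score_action(action, detected_objects):
    """Mean similarity between an action-class vector and detected objects."""
    av = EMBEDDINGS[action]
    sims = [cosine(av, EMBEDDINGS[obj]) for obj in detected_objects]
    return sum(sims) / len(sims)

detected = ["ball", "racket"]
scores = {a: score_action(a, detected) for a in ["playing tennis", "cooking"]}
best = max(scores, key=scores.get)  # action whose embedding best matches the objects
```

Because "ball" and "racket" lie close to "playing tennis" in the toy embedding space, that unseen class receives the highest score — the semantic side information does the recognition, with no visual training examples for the action itself.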
arXiv:2112.09976v1
fatcat:5bvci2dyjnbyjbvo7qagnuzbpy