Action2Vec: A Crossmodal Embedding Approach to Action Learning [article]

Meera Hahn, Andrew Silva, James M. Rehg
2019 arXiv pre-print
Our approach uses a hierarchical recurrent network to capture the temporal structure of video features. ... We describe a novel cross-modal embedding space for actions, named Action2Vec, which combines linguistic cues from class labels with spatio-temporal features derived from video clips. ... In order to get a single embedding for an action class, we average the Action2Vec vectors for all videos of that action class. ...
arXiv:1901.00484v1 fatcat:umxyer4iurgjte2wkvl5egglw4
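
The class-level averaging mentioned in the snippet can be illustrated with a minimal sketch. This is not the authors' code: the function name `class_embeddings`, the two-dimensional toy vectors, and the labels are all hypothetical, and the per-video vectors are assumed to have been computed already by some embedding model.

```python
from collections import defaultdict


def class_embeddings(video_vectors, labels):
    """Average per-video vectors into one vector per action class.

    video_vectors: list of equal-length lists of floats (one per video).
    labels: list of class names, aligned with video_vectors.
    Both inputs are hypothetical stand-ins for real per-video embeddings.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(video_vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        # Accumulate the element-wise sum for this class.
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    # Divide each accumulated sum by the number of videos in the class.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Toy data (made up for illustration): two "run" videos, one "jump" video.
vectors = [[1.0, 2.0], [3.0, 4.0], [0.0, 10.0]]
labels = ["run", "run", "jump"]
embeddings = class_embeddings(vectors, labels)
```

Here `embeddings["run"]` is the element-wise mean of the two "run" vectors, and a class with a single video keeps that video's vector unchanged.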