Polar Relative Positional Encoding for Video-Language Segmentation
2020
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
In this paper, we tackle a challenging task named video-language segmentation. Given a video and a sentence in natural language, the goal is to segment the object or actor described by the sentence in video frames. To accurately denote a target object, the given sentence usually refers to multiple attributes, such as nearby objects and their spatial relations. We propose a novel Polar Relative Positional Encoding (PRPE) mechanism that represents spatial relations in a
doi:10.24963/ijcai.2020/132
dblp:conf/ijcai/NingXW020
fatcat:mktrb7kgbzcqbgmywwgtrm23my