A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Two-stage Visual Cues Enhancement Network for Referring Image Segmentation
[article]
2021
arXiv
pre-print
Referring Image Segmentation (RIS) aims to segment the target object in an image referred to by a given natural language expression. The diverse and flexible expressions, together with the complex visual content of the images, place higher demands on RIS models to investigate fine-grained matching behaviors between words in expressions and objects presented in images. However, such matching behaviors are hard to learn and capture when the visual cues of referents (i.e. referred
arXiv:2110.04435v1
fatcat:zt23iztwbjbdhlxaxsnvfmuyei