Real-Time Referring Expression Comprehension by Single-Stage Grounding Network
[article]
2018
arXiv
pre-print
To further improve localization accuracy, a guided attention mechanism is proposed that forces the grounder to focus on the central region of the referent. ...
Moreover, by exploiting and predicting visual attribute information, the grounder can further distinguish the referent objects within an image and thereby improve the model performance. ...
the guided attention and attribute prediction modules deactivated. ...
arXiv:1812.03426v1
fatcat:oaytd2u4pffdnnjsdqh3dvszvi
MAttNet: Modular Attention Network for Referring Expression Comprehension
[article]
2018
arXiv
pre-print
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. ...
Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided. ...
To the best of our knowledge, we present the first modular network for the general referring expression comprehension task. ...
arXiv:1801.08186v3
fatcat:h5n3k7aosbcfpntjonqikxxio4
MAttNet: Modular Attention Network for Referring Expression Comprehension
2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. ...
Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided. ...
To the best of our knowledge, we present the first modular network for the general referring expression comprehension task. ...
doi:10.1109/cvpr.2018.00142
dblp:conf/cvpr/Yu0SYLBB18
fatcat:t553k6fpi5bxrdzvqcojs6jet4
Referring Expression Comprehension: A Survey of Methods and Datasets
[article]
2020
arXiv
pre-print
Referring expression comprehension (REC) aims to localize a target object in an image described by a referring expression phrased in natural language. ...
Finally, we discuss promising future directions for the field, in particular compositional referring expression comprehension, which requires a longer reasoning chain to address. ...
[32] propose a Cross-Modal Attention-guided Erasing (CM-Att-Erase) strategy for training referring expression comprehension models. CM-Att-Erase adopts the MAttNet as its backbone model. ...
arXiv:2007.09554v2
fatcat:32wmggwnezggnermyh5iw3uq2y
Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing
2020
Frontiers in Neurorobotics
The referring expression comprehension network excavates the visual semantics via a visual semantic-aware network, and exploits the rich linguistic contexts in expressions by a language attention network ...
Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. ...
This work was partly funded by the German Research Foundation (DFG) and National Science Foundation (NSFC) in project Crossmodal Learning under contract Sonderforschungsbereich Transregio 169, and the ...
doi:10.3389/fnbot.2020.00043
pmid:32670046
pmcid:PMC7331387
fatcat:yx3n4s46lncvrmy5hykbvor74e
Referring Image Segmentation via Cross-Modal Progressive Comprehension
[article]
2020
arXiv
pre-print
In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task. ...
features from the two modalities for accurately identifying the referred entity. ...
in generating more discriminative feature representations for referring segmentation. ...
arXiv:2010.00514v1
fatcat:ywbhpiepbrfb3knsnlrv5dhd4m
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
[article]
2019
arXiv
pre-print
there could be multiple comprehensive textual-visual correspondences between images and referring expressions. ...
To tackle this issue, we design a novel cross-modal attention-guided erasing approach, where we discard the most dominant information from either textual or visual domains to generate difficult training ...
Related Work. Referring expression grounding. Referring expression grounding, also known as referring expression comprehension, is often formulated as an object retrieval task [11, 26]. ...
arXiv:1903.00839v2
fatcat:c6xlz2ly6vgqzdrn6xk5tz2hpe
Referring Image Segmentation via Cross-Modal Progressive Comprehension
2020
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task. ...
features from the two modalities for accurately identifying the referred entity. ...
in generating more discriminative feature representations for referring segmentation. ...
doi:10.1109/cvpr42600.2020.01050
dblp:conf/cvpr/HuangHLLWHLL20
fatcat:mrkku6qibzfmbkxvntujf4dwf4
Vision-Language Transformer and Query Generation for Referring Segmentation
[article]
2021
arXiv
pre-print
Furthermore, we propose a Query Generation Module, which produces multiple sets of queries with different attention weights that represent the diversified comprehensions of the language expression from ...
We introduce transformer and multi-head attention to build a network with an encoder-decoder attention mechanism architecture that "queries" the given image with the language expression. ...
Introduction. Referring segmentation aims to generate a segmentation mask for the target object referred to by a given query expression in natural language [10, 16, 15, 3]. ...
arXiv:2108.05565v1
fatcat:6al5czkzzjbc7ld2xzzayypwre
Referring Expression Generation and Comprehension via Attributes
2017
2017 IEEE International Conference on Computer Vision (ICCV)
In this paper, we explore the role of attributes by incorporating them into both referring expression generation and comprehension. ...
A referring expression is a kind of language expression that is used for referring to particular objects. ...
Acknowledgement. The authors are grateful to Licheng Yu for helpful discussions. This work is supported by 1) the NSF CAREER Grant #1149783, gifts from Adobe and NVIDIA. ...
doi:10.1109/iccv.2017.520
dblp:conf/iccv/Liu0017
fatcat:bpxu4pw74nb7fce7pfk2zk5ayq
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
[article]
2020
arXiv
pre-print
Conventional referring expression comprehension (REF) assumes that people query something from an image by describing its visual appearance and spatial location, but in practice, we often ask for an object ...
We also present an expression conditioned image and fact attention (ECIFA) network that extracts information from correlated image regions and commonsense knowledge facts. ...
SLR [39] is a speaker-listener model that jointly learns referring expression comprehension and generation. A reinforce module is introduced to guide sampling of more discriminative expressions. ...
arXiv:2006.01629v2
fatcat:bpkhwp2qs5bw7gxyuuho4pwwli
Referring Expression Object Segmentation with Caption-Aware Consistency
[article]
2019
arXiv
pre-print
that enforces the generated sentence to be similar to the given referring expression. ...
Referring expressions are natural language descriptions that identify a particular object within a scene and are widely used in our daily conversations. ...
This work was supported in part by Ministry of Science and Technology (MOST) under grants 107-2628-E-001-005-MY3 and 108-2634-F-007-009. ...
arXiv:1910.04748v1
fatcat:wqscmarouven7hqymg5w2qscqu
Dynamic Graph Attention for Referring Expression Comprehension
[article]
2019
arXiv
pre-print
In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step ...
Referring expression comprehension aims to locate the object instance described by a natural language referring expression in an image. ...
The overall architecture of the Dynamic Graph Attention Network (DGA) for referring expression comprehension. ...
arXiv:1909.08164v1
fatcat:gjkhjbeua5amplzvmyb63odrkq
Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
[article]
2019
arXiv
pre-print
In referring expressions, people usually describe a target entity in terms of its relationship with other contextual entities as well as visual attributes. ...
Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to a linguistic query, where the mapping between the image region (proposal) and the ...
REG [4, 9, 17, 27, 28, 43, 47, 48] is also known as referring expression comprehension or phrase localization, which is the inverse task of referring expression generation. ...
arXiv:1909.02860v1
fatcat:3362igoi3nayhfaew7iegkmtei
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
2020
Interspeech 2020
attention to automatically learn contributions for different SAN layers. ...
to quantize and label and 2) the current seq2seq framework extracts prosodic information solely from a text encoder, which is easily collapsed to an averaged expression for expressive contents. ...
semantic information through a deep encoder is effective for generating expressive speech. ...
doi:10.21437/interspeech.2020-2423
dblp:conf/interspeech/YangYWWX20
fatcat:h23spz56rvcubd4r5kyitppihu