Real-Time Referring Expression Comprehension by Single-Stage Grounding Network [article]

Xinpeng Chen, Lin Ma, Jingyuan Chen, Zequn Jie, Wei Liu, Jiebo Luo
2018 arXiv   pre-print
For further improving the localization accuracy, a guided attention mechanism is proposed to enforce the grounder to focus on the central region of the referent.  ...  Moreover, by exploiting and predicting visual attribute information, the grounder can further distinguish the referent objects within an image and thereby improve the model performance.  ...  the guided attention and attribute prediction modules deactivated.  ... 
arXiv:1812.03426v1 fatcat:oaytd2u4pffdnnjsdqh3dvszvi

MAttNet: Modular Attention Network for Referring Expression Comprehension [article]

Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, Tamara L. Berg
2018 arXiv   pre-print
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.  ...  Experiments show that MAttNet outperforms previous state-of-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.  ...  To the best of our knowledge, we present the first modular network for the general referring expression comprehension task.  ... 
arXiv:1801.08186v3 fatcat:h5n3k7aosbcfpntjonqikxxio4

MAttNet: Modular Attention Network for Referring Expression Comprehension

Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, Tamara L. Berg
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.  ...  Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.  ...  To the best of our knowledge, we present the first modular network for the general referring expression comprehension task.  ... 
doi:10.1109/cvpr.2018.00142 dblp:conf/cvpr/Yu0SYLBB18 fatcat:t553k6fpi5bxrdzvqcojs6jet4

Referring Expression Comprehension: A Survey of Methods and Datasets [article]

Yanyuan Qiao, Chaorui Deng, Qi Wu
2020 arXiv   pre-print
Referring expression comprehension (REC) aims to localize a target object in an image described by a referring expression phrased in natural language.  ...  Finally, we discuss promising future directions for the field, in particular the compositional referring expression comprehension that requires longer reasoning chain to address.  ...  [32] propose a Cross-Modal Attention-guided Erasing (CM-Att-Erase) strategy for training referring expression comprehension models. CM-Att-Erase adopts the MAttNet as its backbone model.  ... 
arXiv:2007.09554v2 fatcat:32wmggwnezggnermyh5iw3uq2y

Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing

Jinpeng Mi, Jianzhi Lyu, Song Tang, Qingdu Li, Jianwei Zhang
2020 Frontiers in Neurorobotics  
The referring expression comprehension network excavates the visual semantics via a visual semantic-aware network, and exploits the rich linguistic contexts in expressions by a language attention network  ...  Specifically, we first propose a referring expression comprehension network to ground natural referring expressions.  ...  This work was partly funded by the German Research Foundation (DFG) and National Science Foundation (NSFC) in project Crossmodal Learning under contract Sonderforschungsbereich Transregio 169, and the  ... 
doi:10.3389/fnbot.2020.00043 pmid:32670046 pmcid:PMC7331387 fatcat:yx3n4s46lncvrmy5hykbvor74e

Referring Image Segmentation via Cross-Modal Progressive Comprehension [article]

Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li
2020 arXiv   pre-print
In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task.  ...  features from the two modalities for accurately identifying the referred entity.  ...  in generating more discriminative feature representations for referring segmentation.  ... 
arXiv:2010.00514v1 fatcat:ywbhpiepbrfb3knsnlrv5dhd4m

Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing [article]

Xihui Liu, Zihao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li
2019 arXiv   pre-print
there could be multiple comprehensive textual-visual correspondences between images and referring expressions.  ...  To tackle this issue, we design a novel cross-modal attention-guided erasing approach, where we discard the most dominant information from either textual or visual domains to generate difficult training  ...  Related Work Referring expression grounding. Referring expression grounding, also known as referring expression comprehension, is often formulated as an object retrieval task [11, 26] .  ... 
arXiv:1903.00839v2 fatcat:c6xlz2ly6vgqzdrn6xk5tz2hpe

Referring Image Segmentation via Cross-Modal Progressive Comprehension

Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task.  ...  features from the two modalities for accurately identifying the referred entity.  ...  in generating more discriminative feature representations for referring segmentation.  ... 
doi:10.1109/cvpr42600.2020.01050 dblp:conf/cvpr/HuangHLLWHLL20 fatcat:mrkku6qibzfmbkxvntujf4dwf4

Vision-Language Transformer and Query Generation for Referring Segmentation [article]

Henghui Ding, Chang Liu, Suchen Wang, Xudong Jiang
2021 arXiv   pre-print
Furthermore, we propose a Query Generation Module, which produces multiple sets of queries with different attention weights that represent the diversified comprehensions of the language expression from  ...  We introduce transformer and multi-head attention to build a network with an encoder-decoder attention mechanism architecture that "queries" the given image with the language expression.  ...  Introduction Referring segmentation targets to generate segmentation mask for the target object referred by a given query expression in natural language [10, 16, 15, 3] .  ... 
arXiv:2108.05565v1 fatcat:6al5czkzzjbc7ld2xzzayypwre

Referring Expression Generation and Comprehension via Attributes

Jingyu Liu, Liang Wang, Ming-Hsuan Yang
2017 2017 IEEE International Conference on Computer Vision (ICCV)  
In this paper, we explore the role of attributes by incorporating them into both referring expression generation and comprehension.  ...  Referring expression is a kind of language expression that used for referring to particular objects.  ...  Acknowledgement The authors are grateful to Licheng Yu for helpful discussions. This work is supported by 1) the NSF CAREER Grant #1149783, gifts from Adobe and NVIDIA.  ... 
doi:10.1109/iccv.2017.520 dblp:conf/iccv/Liu0017 fatcat:bpxu4pw74nb7fce7pfk2zk5ayq

Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge [article]

Peng Wang, Dongyang Liu, Hui Li, Qi Wu
2020 arXiv   pre-print
Conventional referring expression comprehension (REF) assumes people to query something from an image by describing its visual appearance and spatial location, but in practice, we often ask for an object  ...  We also present an expression conditioned image and fact attention (ECIFA) network that extract information from correlated image regions and commonsense knowledge facts.  ...  SLR [39] is a speaker-listener model that jointly learns for referring expression comprehension and generation. A reinforce module is introduced to guide sampling of more discriminate expressions.  ... 
arXiv:2006.01629v2 fatcat:bpkhwp2qs5bw7gxyuuho4pwwli

Referring Expression Object Segmentation with Caption-Aware Consistency [article]

Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang
2019 arXiv   pre-print
that enforces the generated sentence to be similar to the given referring expression.  ...  Referring expressions are natural language descriptions that identify a particular object within a scene and are widely used in our daily conversations.  ...  This work was supported in part by Ministry of Science and Technology (MOST) under grants 107-2628-E-001-005-MY3 and 108-2634-F-007-009.  ... 
arXiv:1910.04748v1 fatcat:wqscmarouven7hqymg5w2qscqu

Dynamic Graph Attention for Referring Expression Comprehension [article]

Sibei Yang, Guanbin Li, Yizhou Yu
2019 arXiv   pre-print
In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step  ...  Referring expression comprehension aims to locate the object instance described by a natural language referring expression in an image.  ...  The overall architecture of the Dynamic Graph Attention Network (DGA) for referring expression comprehension.  ... 
arXiv:1909.08164v1 fatcat:gjkhjbeua5amplzvmyb63odrkq

Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding [article]

Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Li Su, Qingming Huang
2019 arXiv   pre-print
In referring expressions, people usually describe a target entity in terms of its relationship with other contextual entities as well as visual attributes.  ...  Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to linguistic query, where the mapping between the image region (proposal) and the  ...  REG [4, 9, 17, 27, 28, 43, 47, 48] is also known as referring expression comprehension or phrase localization, which is the inverse task of referring expression generation.  ... 
arXiv:1909.02860v1 fatcat:3362igoi3nayhfaew7iegkmtei

Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis

Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie
2020 Interspeech 2020  
attention to automatically learn contributions for different SAN layers.  ...  to quantize and label and 2) the current seq2seq framework extracts prosodic information solely from a text encoder, which is easily collapsed to an averaged expression for expressive contents.  ...  semantic information through a deep encoder is effective for generating expressive speech.  ... 
doi:10.21437/interspeech.2020-2423 dblp:conf/interspeech/YangYWWX20 fatcat:h23spz56rvcubd4r5kyitppihu