VQD: Visual Query Detection in Natural Scenes [article]

Manoj Acharya, Karan Jariwala, Christopher Kanan
2019 arXiv pre-print
We propose Visual Query Detection (VQD), a new visual grounding task. In VQD, a system is guided by natural language to localize a variable number of objects in an image. VQD is related to visual referring expression recognition, where the task is to localize only one object. We describe the first dataset for VQD and we propose baseline algorithms that demonstrate the difficulty of the task compared to referring expression recognition.
arXiv:1904.02794v2 fatcat:zmmf7ludcbfurb65b6hyno426e