
CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification [article]

Marcos V. Conde, Kerem Turgutlu
2022 arXiv   pre-print
Model's zero-shot capability allows predicting accurate natural language description for a given image, without directly optimizing for the task.  ...  To the best of our knowledge, we are one of the first methods to use CLIP (Contrastive Language-Image Pre-Training) to train a neural network on a variety of artwork images and text descriptions pairs.  ...  on 400 million images, open-sourced by OpenAI. 2. our CLIP Art contrastive pretraining using artwork images and their natural language descriptions, 3.  ... 
arXiv:2204.14244v1 fatcat:met3eonu4jhtzj7lzvlo7j6bmq

Person Tube Retrieval via Language Description

Hehe Fan, Yi Yang
This paper focuses on the problem of person tube (a sequence of bounding boxes which encloses a person in a video) retrieval using a natural language query.  ...  Experimental results on person tube retrieval via language description and other two related tasks demonstrate the efficacy of MSSP.  ...  The task of person search with natural language description (Li et al. 2017b) then replaces image queries with natural language descriptions for re-ID.  ... 
doi:10.1609/aaai.v34i07.6704 fatcat:4k6brjtprvgwxjbrupnpe25l64

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language [article]

Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang
2020 arXiv   pre-print
Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions.  ...  It then aligns these visual features with the textual attributes parsed from the sentences by using a novel contrastive learning loss.  ...  Recently, researchers alter their attention to re-id by textual descriptions: identifying the target person by using free-form natural languages [25, 24, 3].  ... 
arXiv:2005.07327v2 fatcat:6eww5ur4uzbvvhrmgny5jknusu

Fashion Meets Computer Vision and NLP at e-Commerce Search

Susana Zoghbi, Geert Heyman, Juan Carlos Gomez, Marie-Francine Moens
2016 International Journal of Computer and Electrical Engineering  
Particularly, we investigate two tasks: 1) given a query image, we retrieve textual descriptions that correspond to the visual attributes in the query; and 2) given a textual query that may express an  ...  The images contain fashion garments that display a great variety of visual attributes, such as different shapes, colors and textures in natural language.  ...  in natural language terms.  ... 
doi:10.17706/ijcee.2016.8.1.31-43 fatcat:ogh727phwreixkjbmueypfd2ym

Cross-Modal Fashion Search [chapter]

Susana Zoghbi, Geert Heyman, Juan Carlos Gomez, Marie-Francine Moens
2016 Lecture Notes in Computer Science  
Particularly, we demonstrate two tasks: 1) given a query image (without any accompanying text), we retrieve textual descriptions that correspond to the visual attributes in the visual query; and 2) given  ...  The first task is especially useful to manage image collections by online stores who might want to automatically organize and mine predominantly visual items according to their attributes without human  ...  Likewise, image queries without any textual annotations, retrieve words that describe the image. These are challenging tasks for both computer vision and natural language processing.  ... 
doi:10.1007/978-3-319-27674-8_35 fatcat:n2ahsipdqnee3mr2e3772o5cm4

OMG: Observe Multiple Granularities for Natural Language-Based Vehicle Retrieval [article]

Yunhao Du, Binyu Zhang, Xiangning Ruan, Fei Su, Zhicheng Zhao, Hong Chen
2022 arXiv   pre-print
Retrieving tracked-vehicles by natural language descriptions plays a critical role in smart city construction.  ...  To tackle this issue, we propose a novel framework for the natural language-based vehicle retrieval task, OMG, which Observes Multiple Granularities with respect to visual representation, textual representation  ...  Acknowledgements This work is supported by Chinese National Natural Science Foundation under Grants (62076033, U1931202) and MoE-CMCC "Artificial Intelligence" Project No. MCM20190701.  ... 
arXiv:2204.08209v2 fatcat:dymrkdwxwzew5fbgo3x4gyw6vm

TIED: A Cycle Consistent Encoder-Decoder Model for Text-to-Image Retrieval

Clint Sebastian, Raffaele Imbriaco, Panagiotis Meletis, Gijs Dubbelman, Egor Bondarev, Peter H.N. de With
2021 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
Retrieving specific vehicle tracks by Natural Language (NL)-based descriptions is a convenient way to monitor vehicle movement patterns and traffic-related events.  ...  In this work, we propose TIED, a text-to-image encoder-decoder model for the simultaneous extraction of visual and textual information for vehicle track retrieval.  ...  Descriptions for these tracks in Natural Language (NL) is an appealing alternative method to enable the retrieval system to directly interact with human-given descriptions [33, 2].  ... 
doi:10.1109/cvprw53098.2021.00467 fatcat:ooikmrmmxrbxhiy4lshfv3o3gi

Person Retrieval in Surveillance Using Textual Query: A Review [article]

Hiren Galiyawala, Mehul S Raval
2021 arXiv   pre-print
Recent advancement of research in biometrics, computer vision, and natural language processing has discovered opportunities for person retrieval from surveillance videos using textual query.  ...  The comprehensive coverage of person retrieval from handcrafted features based methods to end-to-end approaches based on natural language description.  ...  The authors acknowledge the support of NVIDIA Corporation for a donation of the Quadro K5200 GPU used for this research.  ... 
arXiv:2105.02414v1 fatcat:ieipv7c255dztmznj4hwy6s7ta

Using heterogeneous annotation and visual information for the benchmarking of image retrieval systems

Henning Müller, Paul Clough, William Hersh, Thomas Deselaers, Thomas M. Lehmann, Bruno Janvier, Antoine Geissbuhler, Simone Santini, Raimondo Schettini, Theo Gevers
2006 Internet Imaging VII  
Many image retrieval systems, and the evaluation methodologies of these systems, make use of either visual or textual information only.  ...  Only a few combine textual and visual features for retrieval and evaluation. If text is used, it often relies upon having a standardised and complete annotation schema for the entire collection.  ...  These are annotated by their authors with freely chosen keywords in a naturally multilingual manner: most authors use keywords in their native language; some combine more than one language.  ... 
doi:10.1117/12.660259 fatcat:6yrpnkatabe5hfi54kmnwajofy

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (Extended Abstract)

Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
Automatic image description generation is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.  ...  Finally we explore future directions in the area of automatic image description.  ...  This requires the joint use of both Computer Vision (CV) and Natural Language Processing (NLP) techniques.  ... 
doi:10.24963/ijcai.2017/704 dblp:conf/ijcai/BernardiCEEEIKM17 fatcat:6xtgxxye4fg7tnbqbriljuerce

Retrieving Images with Generated Textual Descriptions

Genc Hoxha, Farid Melgani, Begum Demir
2019 IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium  
This paper presents a novel remote sensing (RS) image retrieval system that is defined based on generation and exploitation of textual descriptions that model the content of RS images.  ...  to generate the descriptions of their content, respectively.  ...  Image textual description generation The task of image textual description generation is to generate natural language description of the content of an image.  ... 
doi:10.1109/igarss.2019.8899321 dblp:conf/igarss/HoxhaMD19 fatcat:f2ts27zmhjdnnclavbmab3gvha

An Efficient Multimodal Language Processor for Parallel Input Strings in Multimodal Input Fusion

Yong Sun, Yu Shi, Fang Chen, Vera Chung
2007 International Conference on Semantic Computing (ICSC 2007)  
We discuss how to apply natural language processing to transform natural language descriptions and queries into an ontological representation that allows users to formulate formal semantics in an intuitive  ...  In this paper we present an approach that combines multimedia reasoning and natural language processing for the semantic integration of automatic and manual image annotations based on domain ontologies  ...  of the language used for textual descriptions.  ... 
doi:10.1109/icsc.2007.61 dblp:conf/semco/SunSCC07 fatcat:3tofx2vj3fbapklqkvzbjmcuby

The Use of Ontology in Retrieval: A Study on Textual, Multilingual and Multimedia Retrieval

Muhammad Nabeel Asim, Muhammad Wasim, Muhammad Usman Ghani Khan, Nasir Mahmood, Waqar Mahmood
2019 IEEE Access  
Furthermore, we compare and categorize the most recent approaches used in the above-mentioned information retrieval methods along with their major drawbacks and advantages.  ...  This paper reviews modern ontology-based information retrieval methods for textual, multimedia, and cross-lingual data types.  ...  Then they retrieved relevant audio clip based on the description provided by the users in their textual query [148].  ... 
doi:10.1109/access.2019.2897849 fatcat:ei2zxyxdjndbvgzzue2indwqy4

Iconographic Image Captioning for Artworks [article]

Eva Cetinic
2021 arXiv   pre-print
Motivated by the state-of-the-art results achieved in generating captions for natural images, a transformer-based vision-language pre-trained model is fine-tuned using the artwork image dataset.  ...  Image captioning implies automatically generating textual descriptions of images based only on the visual input.  ...  This task implies recognizing objects and their relationship in an image and generating syntactically and semantically correct textual descriptions.  ... 
arXiv:2102.03942v1 fatcat:yyzt5qe7tbbnnpsfeot3czg37y

Study on the Influence of Vocabularies used for Image Indexing in a Multilingual Retrieval Environment

Elaine Ménard
2007 Knowledge organization  
Study on the Influence of Vocabularies used for Image Indexing in a Multilingual Retrieval Environment. Knowledge Organization, 34(2), 91-100. 23 references.  ...  Her main research interests are in multilingual information retrieval, image indexing and metadata. Menard, Elaine.  ...  The author would also like to acknowledge the hard work and dedication of the retrieval system programmer, Ms. Geneviève Bastien, and the members of the indexing team, Mr. Clément Arsenault, Ms.  ... 
doi:10.5771/0943-7444-2007-2-91 fatcat:akiiejfnfrfm5ga4rvqe3sx6ie