CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification
[article]
2022
arXiv
pre-print
The model's zero-shot capability allows it to predict accurate natural language descriptions for a given image without directly optimizing for the task. ...
To the best of our knowledge, we are one of the first methods to use CLIP (Contrastive Language-Image Pre-Training) to train a neural network on a variety of artwork image and text description pairs. ...
on 400 million images, open-sourced by OpenAI; 2. our CLIP-Art contrastive pre-training using artwork images and their natural language descriptions; 3. ...
arXiv:2204.14244v1
fatcat:met3eonu4jhtzj7lzvlo7j6bmq
Person Tube Retrieval via Language Description
2020
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
This paper focuses on the problem of person tube (a sequence of bounding boxes which encloses a person in a video) retrieval using a natural language query. ...
Experimental results on person tube retrieval via language description and two other related tasks demonstrate the efficacy of MSSP. ...
The task of person search with natural language description (Li et al. 2017b ) then replaces image queries with natural language descriptions for re-ID. ...
doi:10.1609/aaai.v34i07.6704
fatcat:4k6brjtprvgwxjbrupnpe25l64
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
[article]
2020
arXiv
pre-print
Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions. ...
It then aligns these visual features with the textual attributes parsed from the sentences by using a novel contrastive learning loss. ...
Recently, researchers have turned their attention to re-ID by textual descriptions: identifying the target person using free-form natural language [25, 24, 3]. ...
arXiv:2005.07327v2
fatcat:6eww5ur4uzbvvhrmgny5jknusu
Fashion Meets Computer Vision and NLP at e-Commerce Search
2016
International Journal of Computer and Electrical Engineering
Particularly, we investigate two tasks: 1) given a query image, we retrieve textual descriptions that correspond to the visual attributes in the query; and 2) given a textual query that may express an ...
The images contain fashion garments that display a great variety of visual attributes, such as different shapes, colors and textures in natural language. ...
in natural language terms. ...
doi:10.17706/ijcee.2016.8.1.31-43
fatcat:ogh727phwreixkjbmueypfd2ym
Cross-Modal Fashion Search
[chapter]
2016
Lecture Notes in Computer Science
Particularly, we demonstrate two tasks: 1) given a query image (without any accompanying text), we retrieve textual descriptions that correspond to the visual attributes in the visual query; and 2) given ...
The first task is especially useful to manage image collections by online stores who might want to automatically organize and mine predominantly visual items according to their attributes without human ...
Likewise, image queries without any textual annotations retrieve words that describe the image. These are challenging tasks for both computer vision and natural language processing. ...
doi:10.1007/978-3-319-27674-8_35
fatcat:n2ahsipdqnee3mr2e3772o5cm4
OMG: Observe Multiple Granularities for Natural Language-Based Vehicle Retrieval
[article]
2022
arXiv
pre-print
Retrieving tracked-vehicles by natural language descriptions plays a critical role in smart city construction. ...
To tackle this issue, we propose a novel framework for the natural language-based vehicle retrieval task, OMG, which Observes Multiple Granularities with respect to visual representation, textual representation ...
Acknowledgements: This work is supported by the Chinese National Natural Science Foundation under Grants (62076033, U1931202) and the MoE-CMCC "Artificial Intelligence" Project No. MCM20190701. ...
arXiv:2204.08209v2
fatcat:dymrkdwxwzew5fbgo3x4gyw6vm
TIED: A Cycle Consistent Encoder-Decoder Model for Text-to-Image Retrieval
2021
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Retrieving specific vehicle tracks by Natural Language (NL)-based descriptions is a convenient way to monitor vehicle movement patterns and traffic-related events. ...
In this work, we propose TIED, a text-to-image encoder-decoder model for the simultaneous extraction of visual and textual information for vehicle track retrieval. ...
Describing these tracks in Natural Language (NL) is an appealing alternative that enables the retrieval system to interact directly with human-given descriptions [33, 2]. ...
doi:10.1109/cvprw53098.2021.00467
fatcat:ooikmrmmxrbxhiy4lshfv3o3gi
Person Retrieval in Surveillance Using Textual Query: A Review
[article]
2021
arXiv
pre-print
Recent advances in biometrics, computer vision, and natural language processing have opened opportunities for person retrieval from surveillance videos using textual queries. ...
The review comprehensively covers person retrieval, from handcrafted-feature-based methods to end-to-end approaches based on natural language descriptions. ...
The authors acknowledge the support of NVIDIA Corporation for a donation of the Quadro K5200 GPU used for this research. ...
arXiv:2105.02414v1
fatcat:ieipv7c255dztmznj4hwy6s7ta
Using heterogeneous annotation and visual information for the benchmarking of image retrieval systems
2006
Internet Imaging VII
Many image retrieval systems, and the evaluation methodologies of these systems, make use of either visual or textual information only. ...
Only a few combine textual and visual features for retrieval and evaluation. If text is used, it often relies upon having a standardised and complete annotation schema for the entire collection. ...
These are annotated by their authors with freely chosen keywords in a naturally multilingual manner: most authors use keywords in their native language; some combine more than one language. ...
doi:10.1117/12.660259
fatcat:6yrpnkatabe5hfi54kmnwajofy
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (Extended Abstract)
2017
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Automatic image description generation is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. ...
Finally we explore future directions in the area of automatic image description. ...
This requires the joint use of both Computer Vision (CV) and Natural Language Processing (NLP) techniques. ...
doi:10.24963/ijcai.2017/704
dblp:conf/ijcai/BernardiCEEEIKM17
fatcat:6xtgxxye4fg7tnbqbriljuerce
Retrieving Images with Generated Textual Descriptions
2019
IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium
This paper presents a novel remote sensing (RS) image retrieval system that is defined based on generation and exploitation of textual descriptions that model the content of RS images. ...
to generate the descriptions of their content, respectively. ...
Image textual description generation: The task of image textual description generation is to generate a natural language description of the content of an image. ...
doi:10.1109/igarss.2019.8899321
dblp:conf/igarss/HoxhaMD19
fatcat:f2ts27zmhjdnnclavbmab3gvha
An Efficient Multimodal Language Processor for Parallel Input Strings in Multimodal Input Fusion
2007
International Conference on Semantic Computing (ICSC 2007)
We discuss how to apply natural language processing to transform natural language descriptions and queries into an ontological representation that allows users to formulate formal semantics in an intuitive ...
In this paper we present an approach that combines multimedia reasoning and natural language processing for the semantic integration of automatic and manual image annotations based on domain ontologies ...
of the language used for textual descriptions. ...
doi:10.1109/icsc.2007.61
dblp:conf/semco/SunSCC07
fatcat:3tofx2vj3fbapklqkvzbjmcuby
The Use of Ontology in Retrieval: A Study on Textual, Multilingual and Multimedia Retrieval
2019
IEEE Access
Furthermore, we compare and categorize the most recent approaches used in the above-mentioned information retrieval methods along with their major drawbacks and advantages. ...
This paper reviews modern ontology-based information retrieval methods for textual, multimedia, and cross-lingual data types. ...
Then they retrieved the relevant audio clips based on the descriptions provided by users in their textual queries [148]. ...
doi:10.1109/access.2019.2897849
fatcat:ei2zxyxdjndbvgzzue2indwqy4
Iconographic Image Captioning for Artworks
[article]
2021
arXiv
pre-print
Motivated by the state-of-the-art results achieved in generating captions for natural images, a transformer-based vision-language pre-trained model is fine-tuned using the artwork image dataset. ...
Image captioning implies automatically generating textual descriptions of images based only on the visual input. ...
This task involves recognizing objects and their relationships in an image and generating syntactically and semantically correct textual descriptions. ...
arXiv:2102.03942v1
fatcat:yyzt5qe7tbbnnpsfeot3czg37y
Study on the Influence of Vocabularies used for Image Indexing in a Multilingual Retrieval Environment
2007
Knowledge organization
Study on the Influence of Vocabularies used for Image Indexing in a Multilingual Retrieval Environment. Knowledge Organization, 34(2), 91-100. 23 references. ...
Her main research interests are in multilingual information retrieval, image indexing and metadata. Menard, Elaine. ...
The author would also like to acknowledge the hard work and dedication of the retrieval system programmer, Ms. Geneviève Bastien, and the members of the indexing team, Mr. Clément Arsenault, Ms. ...
doi:10.5771/0943-7444-2007-2-91
fatcat:akiiejfnfrfm5ga4rvqe3sx6ie