
Distilled Collections from Textual Image Queries

Hadar Averbuch-Elor, Yunhai Wang, Yiming Qian, Minglun Gong, Johannes Kopf, Hao Zhang, Daniel Cohen-Or
2015 Computer Graphics Forum (Print)  
We present a distillation algorithm which operates on a large, unstructured, and noisy collection of internet images returned from an online object query.  ...  In essence, instead of distilling the collection of images, we distill a collection of loosely cutout foreground "shapes", which may or may not contain the queried object.  ...  The objective of our work is to generate a distilled image collection from raw internet search results based on a textual object query.  ... 
doi:10.1111/cgf.12547 fatcat:mpypdujyzrhahos7oy5prisgum

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval [article]

Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara
2022 arXiv   pre-print
Then, it learns a shared embedding space - where an efficient kNN search can be performed - by distilling the relevance scores obtained from the fine-grained alignments.  ...  Nonetheless, it has a direct downstream application: cross-modal retrieval, which consists in finding images related to a given query text or vice versa.  ...  Heritage (AI4CH)" project, co-funded by the Italian Ministry of Foreign Affairs and International Cooperation, and by the PRIN project "CREATIVE: CRossmodal understanding and gEnerATIon of Visual and tExtual  ... 
arXiv:2207.14757v1 fatcat:g2uyfwtpifbozi65k3fiauue3m
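The shared-embedding idea in the ALADIN abstract can be sketched in a few lines: once images and texts live in one vector space, cross-modal retrieval reduces to a cosine-similarity kNN search. The sketch below is a generic illustration, not the paper's model; the function name and toy embeddings are made up:

```python
import numpy as np

def knn_search(query_vec, index_vecs, k=3):
    """Return indices of the k nearest index vectors by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    X = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    sims = X @ q                      # cosine similarity against every item
    return np.argsort(-sims)[:k]      # highest-similarity indices first

# toy shared space: one "text" query against four "image" embeddings
images = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.05])
print(knn_search(query, images, k=2))  # → [0 1]
```

In practice the index vectors would be precomputed model embeddings, so each query costs only one matrix-vector product plus a partial sort.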

Evaluating topic representations for exploring document collections

Nikolaos Aletras, Timothy Baldwin, Jey Han Lau, Mark Stevenson
2015 Journal of the Association for Information Science and Technology  
Results show that textual labels are easier for users to interpret than are term lists and image labels.  ...  Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, presenting topics in one of the following modalities: (a) lists of terms, (b) textual phrase  ...  Acknowledgments We are grateful for support received from the Engineering and Physical Sciences Research Council via the Network on Vision and Language, the Australian Research Council, and the Defence  ... 
doi:10.1002/asi.23574 fatcat:qgm4gq5g7fbipggeodwljfunwi

Knowledge Resource Development for Identifying Matching Image Descriptions

Alicia Sagae, Scott E. Fahlman
2013 Proceedings of the International Conference on Knowledge Engineering and Ontology Development  
This paper describes the incremental, task-driven development of an ontology that provides features to a system that retrieves images based on their textual descriptions.  ...  knowledge resources contribute to the performance of many current systems for textual inference tasks (QA, textual entailment, summarization, retrieval, and others).  ...  As a result, query topics and indexed images can be represented by textual features (text-based image retrieval) or by features derived from computer-vision analysis of the image (content-based image retrieval  ... 
doi:10.5220/0004550601000108 dblp:conf/ic3k/SagaeF13 fatcat:22cuezse4fdkvfisn2ztfuvc4m

Image Classification using Tag and Segmentation based Retrieval

Shrikant Badghaiya, Atul Bharve
2014 International Journal of Computer Applications  
In today's scenario, social media sites are widely used and high-resolution images are shared at scale, making tagging an important approach for the retrieval of images in various applications.  ...  In this paper, various tag-based image retrieval techniques are discussed and analyzed so that a new methodology can be proposed in the future.  ...  The textual query is then matched with annotations for image retrieval in the image retrieval system.  ... 
doi:10.5120/18151-9413 fatcat:hpofvu7snneu3fnjex7qr4doy4

Cross-media Event Extraction and Recommendation

Di Lu, Clare Voss, Fangbo Tao, Xiang Ren, Rachel Guan, Rostyslav Korolov, Tongtao Zhang, Dongang Wang, Hongzhi Li, Taylor Cassidy, Heng Ji, Shih-fu Chang (+5 others)
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations  
The volume of multimedia data (e.g., texts, images, videos) posted on the Web during events of general interest is overwhelming and difficult to distill when seeking information relevant to a particular concern.  ...  We have developed a comprehensive system that searches, identifies, organizes and summarizes complex events from multiple data modalities.  ...  We first extract textual event information, as well as visual concepts, events and patterns, from the raw multimedia documents to construct event cubes.  ... 
doi:10.18653/v1/n16-3015 dblp:conf/naacl/LuVTRGKZWLCJCHW16 fatcat:kxehxhclqzacpa6rtxijgqgsqy

Learning Deep Features For MSR-bing Information Retrieval Challenge

Qiang Song, Sixie Yu, Cong Leng, JiaXiang Wu, Qinghao Hu, Jian Cheng
2015 Proceedings of the 23rd ACM international conference on Multimedia - MM '15  
Our CNN model is pre-trained on a collection of clean datasets and fine-tuned on the Bing datasets.  ...  In this paper, we propose a CNN-based feature representation for visual recognition using only image-level information.  ...  The dataset of the MSR-Bing Grand Challenge contains 11.7 million queries and 1 million images, which were collected from the user click logs of Bing image search in the EN-US market [3].  ... 
doi:10.1145/2733373.2809928 dblp:conf/mm/SongYLWHC15 fatcat:xiqww2ztrfherjplvyirykzyzq

CLIP2TV: Align, Match and Distill for Video-Text Retrieval [article]

Zijian Gao, Jingyu Liu, Weiqi Sun, Sheng Chen, Dedan Chang, Lili Zhao
2022 arXiv   pre-print
With the success of both visual and textual representation learning, transformer-based encoders and fusion methods have also been adopted in the field of video-text retrieval.  ...  With the help of similarity distillation from the joint soft-label on the vtm module, CLIP2TV+SD achieves a clear improvement when querying with a detailed text.  ...  DiDeMo [1] includes 10,611 videos collected from Flickr, each at most 30 seconds long.  ... 
arXiv:2111.05610v2 fatcat:mweypvpbw5d6lbhd2y47o72zcy

Exploring and Distilling Cross-Modal Information for Image Captioning [article]

Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun
2020 arXiv   pre-print
Based on the Transformer, to perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach that explores and distills the source information in vision and language.  ...  We also show the top-3 most attended image regions in the vision region group from global visual distilling and the top-3 most attended textual attributes in global attribute distilling.  ... 
arXiv:2002.12585v2 fatcat:vupouqlxhzeuvfqxzqk5n5u6di

CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification [article]

Marcos V. Conde, Kerem Turgutlu
2022 arXiv   pre-print
CLIP is able to learn directly from free-form art descriptions or, if available, curated fine-grained labels.  ...  To the best of our knowledge, ours is one of the first methods to use CLIP (Contrastive Language-Image Pre-Training) to train a neural network on a variety of artwork image and text description pairs.  ...  The iMet Collection Dataset [26] from The Metropolitan Museum of Art in New York (The Met) presents the largest fine-grained artwork collection.  ... 
arXiv:2204.14244v1 fatcat:met3eonu4jhtzj7lzvlo7j6bmq
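As a rough illustration of the contrastive pre-training objective that CLIP-style methods rely on, here is a NumPy sketch of the symmetric InfoNCE loss over a batch of matched image/text embedding pairs. This is a generic reconstruction of the standard objective, not code from the paper, and the batch values are hypothetical:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss: the i-th image and i-th text
    form the positive pair; all other batch items serve as negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # pairwise cosine similarities
    labels = np.arange(len(img))              # positives sit on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # average the image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2

# perfectly aligned toy pairs give a near-zero loss
emb = np.eye(4)
print(clip_contrastive_loss(emb, emb))
```

Minimizing this loss pulls matched image/text pairs together and pushes mismatched pairs apart, which is what lets the shared space be queried with free-form descriptions afterwards.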

External Query Reformulation for Text-Based Image Retrieval [chapter]

Jinming Min, Gareth J. F. Jones
2011 Lecture Notes in Computer Science  
These definition documents are used as indicators to re-weight the feedback documents from an initial search run on a Wikipedia abstract collection using the Jaccard coefficient.  ...  A standard method used to address this problem is pseudo relevance feedback (PRF) which updates user queries by adding feedback terms selected automatically from top ranked documents in a prior retrieval  ...  using text queries to search for images based on textual annotations of the images.  ... 
doi:10.1007/978-3-642-24583-1_24 fatcat:gw2nw3vtybg7jb2c6rmwkkqf24
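The scheme this abstract describes (pseudo relevance feedback where feedback documents are re-weighted by their Jaccard overlap with external definition documents) can be sketched roughly as follows. The function names and toy documents are illustrative, not taken from the paper:

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard coefficient between two term collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def prf_expand(query, ranked_docs, definition_doc, top_docs=2, top_terms=3):
    """Expand a query with terms from the top-ranked feedback documents,
    weighting each document by its Jaccard overlap with an external
    definition document (a rough sketch of the scheme described above)."""
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        weight = jaccard(doc, definition_doc)
        for term in set(doc):
            if term not in query:
                counts[term] += weight   # term score scaled by doc weight
    return query + [t for t, _ in counts.most_common(top_terms)]

# toy example: steering the ambiguous query "jaguar" toward the animal sense
docs = [["jaguar", "cat", "habitat"], ["jaguar", "car", "engine"]]
definition = ["jaguar", "cat", "animal", "habitat"]
print(prf_expand(["jaguar"], docs, definition))
```

The re-weighting step is what distinguishes this from plain PRF: terms from feedback documents that disagree with the definition document contribute less to the expanded query.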

Towards a digital library of popular music

David Bainbridge, Craig G. Nevill-Manning, Ian H. Witten, Lloyd A. Smith, Rodger J. McNab
1999 Proceedings of the fourth ACM conference on Digital libraries - DL '99  
We work with different representations of music: facsimile images of scores, the internal representation of a music editing program, page images typeset by a music editor, MIDI files, audio files representing sung user input, and textual metadata such as title, composer and arranger, and lyrics.  ...  For example, Figure 5 shows the standard query page for the collection formed from a single MIDI site (approximately 1,200 tunes).  ... 
doi:10.1145/313238.313295 dblp:conf/dl/BainbridgeNWSM99 fatcat:rpbv2fo75feg3j23rhp45q47ie

Exploring and Distilling Cross-Modal Information for Image Captioning

Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
To perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach that explores and distills the  ...  Recently, attention-based encoder-decoder models have been used extensively in image captioning, yet current methods still struggle to achieve deep image understanding.  ...  We also show the top-3 most attended image regions in the vision region group from global visual distilling and the top-3 most attended textual attributes in global attribute distilling.  ... 
doi:10.24963/ijcai.2019/708 dblp:conf/ijcai/LiuRLL019 fatcat:kkuznjnlbjapta4d3wissgluzu

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language [article]

Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang
2020 arXiv   pre-print
Person search by natural language aims at retrieving a specific person from a large-scale image pool that matches the given textual description.  ...  It then aligns these visual features with the textual attributes parsed from the sentences by using a novel contrastive learning loss.  ...  Essentially, we are distilling the attribute information from a well-trained human parsing network to the lightweight segmentation layer through joint training.  ... 
arXiv:2005.07327v2 fatcat:6eww5ur4uzbvvhrmgny5jknusu

The Evolution of a Healthcare Software Framework: Reuse, Evaluation and Lessons Learned

Alessandra Macedo, José Augusto Baranauskas, Renato Bulcão-Neto
2018 Proceedings of the 2018 Federated Conference on Computer Science and Information Systems  
The main contributions of this paper include lessons learned distilled from (i) the reuse and evolution of the HSSF components in the development of three new health surveillance applications, and (ii)  ...  Using concepts and technologies from Information Retrieval, Machine Learning, and the Semantic Web, we present a novel software framework called HSSF (Health Surveillance Software Framework), which aims to  ...  The Textual Processing module processes textual information from a set of clinical records and collected scientific papers, which are all stored in the Storage Layer.  ... 
doi:10.15439/2018f173 dblp:conf/fedcsis/MacedoBN18 fatcat:usk4by2dtfgcnhbt4u3vjlwomi
Showing results 1 — 15 out of 3,331 results