3,492 Hits in 4.6 sec

Cross-media Event Extraction and Recommendation

Di Lu, Clare Voss, Fangbo Tao, Xiang Ren, Rachel Guan, Rostyslav Korolov, Tongtao Zhang, Dongang Wang, Hongzhi Li, Taylor Cassidy, Heng Ji, Shih-fu Chang (+5 others)
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations  
We have developed a comprehensive system that searches, identifies, organizes and summarizes complex events from multiple data modalities.  ...  It also recommends events related to the user's ongoing search based on previously selected attribute values and dimensions of events being viewed.  ...  Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.  ... 
doi:10.18653/v1/n16-3015 dblp:conf/naacl/LuVTRGKZWLCJCHW16 fatcat:kxehxhclqzacpa6rtxijgqgsqy

On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval

Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Nikhil Rasiwasia, Gert R. G. Lanckriet, Roger Levy, Nuno Vasconcelos
2014 IEEE Transactions on Pattern Analysis and Machine Intelligence  
All approaches are shown to be successful for text retrieval in response to image queries and vice versa.  ...  This problem addresses the design of retrieval systems that support queries across content modalities, for example, using an image to search for texts.  ...  For example, in [29] a query text and in [30] a query image are used to retrieve similar text documents and images, based on low-level text (e.g., words) and image (e.g., DCTs) representations, respectively  ... 
doi:10.1109/tpami.2013.142 pmid:24457508 fatcat:nnzkvhf4l5f4rb2kxgqt5banfe
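
The shared-subspace idea summarized in this entry can be sketched in a few lines. The snippet below (Python with NumPy) is a minimal illustration, not the authors' implementation: the projection matrices are random stand-ins for ones learned by, e.g., canonical correlation analysis, and all names and dimensions are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    W_img = rng.normal(size=(4096, 128))  # hypothetical learned image projection
    W_txt = rng.normal(size=(300, 128))   # hypothetical learned text projection

    def to_shared(x, W):
        # Map a raw feature vector into the shared subspace and L2-normalize.
        z = x @ W
        return z / np.linalg.norm(z)

    def retrieve_texts(query_image_feat, text_feats, k=5):
        # Rank text documents for an image query by cosine similarity.
        q = to_shared(query_image_feat, W_img)
        T = np.stack([to_shared(t, W_txt) for t in text_feats])
        return np.argsort(-(T @ q))[:k]

    image_query = rng.normal(size=4096)                 # toy image feature
    texts = [rng.normal(size=300) for _ in range(100)]  # toy text features
    print(retrieve_texts(image_query, texts))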

Cross-Modal Information Retrieval – A Case Study on Chinese Wikipedia [chapter]

Yonghui Cong, Zengchang Qin, Jing Yu, Tao Wan
2012 Lecture Notes in Computer Science  
Probability models have recently been used in cross-modal multimedia information retrieval by building conjunctive models bridging the text and image components.  ...  We investigate the problems of retrieving texts (ranked by semantic closeness) given an image query, and vice versa.  ...  This work is partially funded by the NCET Program of MOE, the SRF for ROCS, the Fundamental Research Funds for the Central Universities and the Graduate Innovative Practice Fund of BUAA.  ... 
doi:10.1007/978-3-642-35527-1_2 fatcat:4mjzgdhvfrax7nmb4vawi6ppva

Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval [article]

Xun Yang, Jianfeng Dong, Yixin Cao, Xun Wang, Meng Wang, Tat-Seng Chua
2020 arXiv   pre-print
To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method by jointly learning the linguistic structure of queries and the temporal representation of videos  ...  Traditional methods mainly favor the concept-based paradigm for retrieval with simple queries, which are usually ineffective for complex queries that carry far more complex semantics.  ...  Figure 2: An illustration of our tree-augmented cross-modal encoding method for complex-query video retrieval.  ... 
arXiv:2007.02503v1 fatcat:eptt6v2lirbgxet6bqm7wjjpzu

Deep Learning Techniques for Future Intelligent Cross-Media Retrieval [article]

Sadaqat ur Rehman, Muhammad Waqas, Shanshan Tu, Anis Koubaa, Obaid ur Rehman, Jawad Ahmad, Muhammad Hanif, Zhu Han
2020 arXiv   pre-print
Then, we present some well-known cross-media datasets used for retrieval, considering the importance of these datasets in the context of deep learning based cross-media retrieval approaches.  ...  In this paper, we provide a novel taxonomy according to the challenges faced by multi-modal deep learning approaches in solving cross-media retrieval, namely: representation, alignment, and translation  ...  Multimodal alignment is significant for cross-media retrieval, as it allows us to retrieve content of a different modality based on an input query (e.g., image retrieval with text as the query,  ... 
arXiv:2008.01191v1 fatcat:t63bg55w2vdqjcprzaaidrmprq

Dual Encoding for Zero-Example Video Retrieval [article]

Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang
2019 arXiv   pre-print
Given videos as sequences of frames and queries as sequences of words, an effective sequence-to-sequence cross-modal matching is required.  ...  The majority of existing methods are concept based, extracting relevant concepts from queries and videos and accordingly establishing associations between the two modalities.  ...  As for query representation, the authors design relatively complex linguistic rules to extract relevant concepts from a given query. Ueki et al.  ... 
arXiv:1809.06181v3 fatcat:tkjlbrflojhazdeq2wcihhdony
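
The dual-encoding pattern this abstract describes can be sketched as two branches mapped into one common space. The code below is an illustrative approximation only: real models use recurrent or convolutional encoders where mean pooling stands in here, and the projection matrices and dimensions are hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)
    W_video = rng.normal(size=(512, 256))  # hypothetical learned projections
    W_query = rng.normal(size=(300, 256))

    def encode_sequence(seq, W):
        # Pool a (length x dim) sequence to one vector, project, normalize.
        z = seq.mean(axis=0) @ W
        return z / np.linalg.norm(z)

    video = rng.normal(size=(120, 512))  # 120 frames of 512-d CNN features
    query = rng.normal(size=(7, 300))    # 7 words of 300-d embeddings
    score = encode_sequence(video, W_video) @ encode_sequence(query, W_query)
    print(f"video-query match score: {score:.3f}")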

Deep Multimodal Learning for Affective Analysis and Retrieval

Lei Pang, Shiai Zhu, Chong-Wah Ngo
2015 IEEE transactions on multimedia  
emotion classification and cross-modal retrieval.  ...  More importantly, the joint representation enables emotion-oriented cross-modal retrieval, for example, retrieval of videos using the text query "crazy cat".  ...  For the visual modality, different from the results in Table III, SentiBank and E-MDBM-V  ...  Table IV: Mean average precision@20 of text-based, video-based, and multimodal queries for retrieving emotional  ... 
doi:10.1109/tmm.2015.2482228 fatcat:7tozmatnhvbj7hjjohkofngecq

Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering [article]

Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral
2021 arXiv   pre-print
We introduce various ways to retrieve knowledge using text and images and two reader styles: classification and extraction. Both the retriever and reader are trained with weak supervision.  ...  One dataset commonly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval.  ...  Acknowledgements The authors acknowledge support from the NSF grant 1816039, DARPA grant W911NF2020006, DARPA grant FA875019C0003, and ONR award N00014-20-1-2332; and thank the reviewers for their feedback  ... 
arXiv:2109.04014v1 fatcat:rnm2ghrosbd4xkctt4jnozfndu
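
As a rough illustration of the retriever half of such a visual-retriever-reader pipeline, the sketch below ranks knowledge passages by term overlap with the question plus hypothetical detected visual labels; the bag-of-words scoring and all example data are assumptions, not the paper's method.

    from collections import Counter

    def bow_score(query_tokens, passage_tokens):
        # Simple term-overlap score between query and passage.
        q, p = Counter(query_tokens), Counter(passage_tokens)
        return sum(min(q[w], p[w]) for w in q)

    question = "what breed is this dog".split()
    visual_labels = ["dog", "grass", "frisbee"]  # hypothetical detector output
    passages = [
        "the border collie is a herding dog breed".split(),
        "a frisbee is a gliding toy".split(),
    ]
    query = question + visual_labels
    best = max(range(len(passages)), key=lambda i: bow_score(query, passages[i]))
    print("top passage:", " ".join(passages[best]))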

A Survey on Content-based Image Retrieval

Mohamed Maher
2017 International Journal of Advanced Computer Science and Applications  
In this article, a survey on state-of-the-art content-based image retrieval, including empirical and theoretical work, is presented.  ...  These databases can be counter-productive if they are not coupled with efficient Content-Based Image Retrieval (CBIR) tools.  ...  ACKNOWLEDGMENT This work was supported by the Research Centre of the College of Computer and Information Sciences, King Saud University. The author is grateful for this support.  ... 
doi:10.14569/ijacsa.2017.080521 fatcat:kzfskamd25coxcj3537z6z3ty4

A support vector approach for cross-modal search of images and texts

Yashaswi Verma, C.V. Jawahar
2017 Computer Vision and Image Understanding  
In this paper, we study two complementary cross-modal prediction tasks: (i) predicting text(s) given a query image ("Im2Text"), and (ii) predicting image(s) given a piece of text ("Text2Im").  ...  We propose a novel Structural SVM based unified framework for these two tasks, and show how it can be efficiently trained and tested.  ...  This implies that the normalized correlation based loss function models the cross-modal patterns better than the other two loss functions.  ... 
doi:10.1016/j.cviu.2016.10.001 fatcat:4762cgs7cbflxh72kke72cagyi
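
The snippet mentions a normalized-correlation-based loss. Below is a minimal sketch of that quantity as commonly defined (the cosine of mean-centered vectors, turned into a loss); the paper's exact formulation inside its Structural SVM framework may differ.

    import numpy as np

    def normalized_correlation(x, y):
        # Cosine similarity of mean-centered vectors.
        xc, yc = x - x.mean(), y - y.mean()
        return (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc) + 1e-8)

    def corr_loss(image_feat, text_feat):
        # Higher correlation between matched pairs -> lower loss.
        return 1.0 - normalized_correlation(image_feat, text_feat)

    rng = np.random.default_rng(2)
    print(corr_loss(rng.normal(size=256), rng.normal(size=256)))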

Intermediate Annotationless Dynamical Object-Index-Based Query in Large Image Archives with Holographic Representation

Javed I. Khan
1996 Journal of Visual Communication and Image Representation  
This paper presents a new parallel and distributed associative network based technique for content-based image retrieval (CBIR) with dynamic indices.  ...  The paper presents the mechanism, architecture and performance of an image archival and retrieval system realized with this new network.  ...  and content-based retrieval in image archives [6, 10].  ... 
doi:10.1006/jvci.1996.0033 fatcat:c6u52bcfcrdtflct43inog2kqq

Visual Goal-Step Inference using wikiHow [article]

Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar, Chris Callison-Burch
2021 arXiv   pre-print
We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal.  ...  With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models.  ...  We thank Chenyu Liu for annotations. We also thank Simmi Mourya, Keren Fuentes, Carl Vondrick, Zsolt Kira, Mohit Bansal, Lara Martin, and anonymous reviewers for their valuable feedback.  ... 
arXiv:2104.05845v2 fatcat:nsli5d55zza3hjsyeih2j3aili
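
The VGSI protocol described here reduces to scoring four candidate step images against a textual goal and taking the argmax. The sketch below assumes placeholder embed_text/embed_image encoders standing in for a real multimodal model; both functions are hypothetical.

    import numpy as np

    rng = np.random.default_rng(3)

    def embed_text(goal):
        # Placeholder for a real text encoder (hypothetical).
        return rng.normal(size=256)

    def embed_image(image_id):
        # Placeholder for a real image encoder (hypothetical).
        return rng.normal(size=256)

    def choose_step(goal, candidate_images):
        # Score each candidate image against the goal; return the argmax index.
        g = embed_text(goal)
        scores = [embed_image(c) @ g for c in candidate_images]
        return int(np.argmax(scores))

    print(choose_step("make a paper airplane", ["img_a", "img_b", "img_c", "img_d"]))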

VL-BERT: Pre-training of Generic Visual-Linguistic Representations [article]

Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
2020 arXiv   pre-print
We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short).  ...  It is designed to fit most visual-linguistic downstream tasks.  ...  ACKNOWLEDGMENTS The work is partially supported by the National Natural Science Foundation of China under grant No. U19B2044 and No. 61836011.  ... 
arXiv:1908.08530v4 fatcat:venc4egmz5hhbe4oeyt5f2wgku

Overview of the ImageCLEF 2006 Photographic Retrieval and Object Annotation Tasks [chapter]

Paul Clough, Michael Grubinger, Thomas Deselaers, Allan Hanbury, Henning Müller
2007 Lecture Notes in Computer Science  
Topics have been categorised and analysed with respect to attributes including an estimation of their "visualness" and linguistic complexity.  ...  These tasks provide both the resources and the framework necessary to perform comparative laboratory-style evaluation of visual information systems for image retrieval and automatic image annotation.  ...  Special thanks to viventura, the IAPR and LTUtech for providing their image databases for this year's tasks, and to Tobias Weyand for creating the web interface for submissions.  ... 
doi:10.1007/978-3-540-74999-8_71 fatcat:nwavr7byzbbflp3b7wvxd4lem4

From Visual Attributes to Adjectives through Decompositional Distributional Semantics

Angeliki Lazaridou, Georgiana Dinu, Adam Liska, Marco Baroni
2015 Transactions of the Association for Computational Linguistics  
We can thus achieve better attribute (and object) label retrieval by treating images as "visual phrases", and decomposing their linguistic representation into an attribute-denoting adjective and an object-denoting  ...  By building on the recent "zero-shot learning" approach, and paying attention to the linguistic nature of attributes as noun modifiers, and specifically adjectives, we show that it is possible to tag images  ...  Acknowledgments We thank the TACL reviewers for their feedback. We were supported by ERC 2011 Starting Independent Research Grant n. 283554 (COMPOSES).  ... 
doi:10.1162/tacl_a_00132 fatcat:cig3svaf75f57jjgo2r4bxfeum
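
The decompositional idea in this entry can be sketched as a nearest-neighbor search over composed adjective-noun vectors. In the illustration below the tiny vocabulary, the additive composition, and the random embeddings are all assumptions for demonstration.

    import numpy as np

    rng = np.random.default_rng(4)
    adjectives = {a: rng.normal(size=64) for a in ["red", "furry", "wooden"]}
    nouns = {n: rng.normal(size=64) for n in ["car", "cat", "table"]}

    def tag_image(phrase_vec):
        # Return the adjective-noun pair whose composed vector is most
        # similar (cosine) to the image's "visual phrase" vector.
        best, best_sim = None, -np.inf
        for a, av in adjectives.items():
            for n, nv in nouns.items():
                comp = av + nv  # additive composition (an assumption)
                sim = comp @ phrase_vec / (np.linalg.norm(comp) * np.linalg.norm(phrase_vec))
                if sim > best_sim:
                    best, best_sim = (a, n), sim
        return best

    image_vec = adjectives["furry"] + nouns["cat"] + 0.1 * rng.normal(size=64)
    print(tag_image(image_vec))  # expected: ('furry', 'cat')
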
Showing results 1 — 15 out of 3,492 results