13,257 Hits in 4.4 sec

New Strategies for Image Annotation: Overview of the Photo Annotation Task at ImageCLEF 2010

Stefanie Nowak, Mark J. Huiskes
2010 Conference and Labs of the Evaluation Forum  
Summarizing the results, the task could be solved with a MAP of 0.455 in the multi-modal configuration, with a MAP of 0.407 in the visual-only configuration and with a MAP of 0.234 in the textual configuration  ...  For the evaluation per example, 0.66 F-ex and 0.66 OS-FCS could be achieved for the multi-modal configuration, 0.68 F-ex and 0.65 OS-FCS for the visual configuration and 0.26 F-ex and 0.37 OS-FCS for the  ...  These Flickr user tags are made available for the textual and multi-modal approaches. For most of the photos the EXIF data is included and may be used.  ... 
dblp:conf/clef/NowakH10 fatcat:dct7qmqk55bk3fqwlova3mz2tq

Multi-feature canonical correlation analysis for face photo-sketch image retrieval

Dihong Gong, Zhifeng Li, Jianzhuang Liu, Yu Qiao
2013 Proceedings of the 21st ACM international conference on Multimedia - MM '13  
In this framework, we first represent each photo or sketch using a patch-based local feature representation scheme, in which histograms of oriented gradients (HOG) and multi-scale local binary pattern  ...  The major difficulty in automatic face photo-sketch image retrieval lies in the fact that there exists great discrepancy between the different image modalities (photo and sketch).  ...  and multi-scale local binary pattern (MLBP) [11] , into our framework.  ... 
doi:10.1145/2502081.2502162 dblp:conf/mm/GongLLQ13 fatcat:565c2ujxxffkjl5or3ourmdyk4

MMED: A Multi-domain and Multi-modality Event Dataset [article]

Zhenguo Yang, Zehang Lin, Min Cheng, Qing Li, Wenyin Liu
2019 arXiv   pre-print
In this work, we construct and release a multi-domain and multi-modality event dataset (MMED), containing 25,165 textual news articles collected from hundreds of news media sites (e.g., Yahoo News, Google  ...  The dataset is collected to explore the problem of organizing heterogeneous data contributed by professionals and amateurs in different data domains, and the problem of transferring event knowledge obtained  ...  CONCLUSION In this paper, we have released an event dataset from social media and news media platforms, denoted as MMED.  ... 
arXiv:1904.02354v2 fatcat:mxwxuzgt5nbqflvfawsbowchfy

A Survey of Data Representation for Multi-Modality Event Detection and Evolution

Kejing Xiao, Zhaopeng Qian, Biao Qin
2022 Applied Sciences  
Next, we discuss the techniques of data representation for event detection, including textual, visual, and multi-modality content. Finally, we review event evolution under multi-modality data.  ...  With the development of multimedia platforms, event detection has gradually developed from traditional single modality detection to multi-modality detection and is receiving increasing attention.  ...  Both the text and image metadata of photo collection are used.  ... 
doi:10.3390/app12042204 fatcat:5gpezz6yhjejlmdzr5fhpgka6m

Guest Editorial: Learning Multimedia for Real World Applications

Bing-Kun Bao, Congyan Lang, Tao Mei, Alberto del Bimbo
2016 Multimedia tools and applications  
The paper entitled "Multi-modal and Multi-scale Photo Collection Summarization" (10.1007/s11042-015-2658-6) proposes a multi-modal and multi-scale photo collection summarization method by leveraging multi-modal  ...  Two papers are related to social media, one is for sentiment analysis of social network multimedia, and the other is photo collection summarization.  ... 
doi:10.1007/s11042-016-3286-5 fatcat:c3tuauih2rbidmbcheavitl5ka

The CLEF 2011 Photo Annotation and Concept-based Retrieval Tasks

Stefanie Nowak, Karolin Nagel, Judith Liebetrau
2011 Conference and Labs of the Evaluation Forum  
Both tasks differentiate among approaches that consider solely visual information, approaches that rely only on textual information in form of image metadata and user tags, and multi-modal approaches that  ...  The ImageCLEF 2011 Photo Annotation and Concept-based Retrieval Tasks pose the challenge of an automated annotation of Flickr images with 99 visual concepts and the retrieval of images based on query topics  ...  Then, Section 4 describes the test collection and the relevance assessment process. Section 5 summarizes the approaches of the participants.  ... 
dblp:conf/clef/NowakNL11 fatcat:kfjwv7jqwvbahcftxijsfsmgi4

Deep Learning for Free-Hand Sketch: A Survey [article]

Peng Xu, Timothy M. Hospedales, Qiyue Yin, Yi-Zhe Song, Tao Xiang, Liang Wang
2022 arXiv   pre-print
, e.g., natural photos.  ...  The main contents of this survey include: (i) A discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities  ...  of uni-modal and multi-modal sketch analysis tasks.  ... 
arXiv:2001.02600v3 fatcat:lek5sivzsrat3i52lqh2eifnia

Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks [article]

He Zhang, Benjamin S. Riggan, Shuowen Hu, Nathaniel J. Short, Vishal M. Patel
2018 arXiv   pre-print
The proposed network consists of a generator sub-network, constructed using an encoder-decoder network based on dense residual blocks, and a multi-scale discriminator sub-network.  ...  and used as a multi-channel input to synthesize the visible image given the corresponding polarimetric signatures.  ...  To summarize, this paper makes the following contributions. 1. A novel face synthesis framework based on GAN is proposed which consists of a multi-stream generator and multi-scale discriminator. 2.  ... 
arXiv:1812.05155v1 fatcat:yqhzbppyvnedviov7mtmgwjuma

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm [article]

Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan
2022 arXiv   pre-print
Instead of using the single image-text contrastive supervision, we fully exploit data potential through the use of (1) self-supervision within each modality; (2) multi-view supervision across modalities  ...  Moreover, Scaling up the model and computing also works well in our framework.Our code, dataset and models are released at: https://github.com/Sense-GVT/DeCLIP  ...  The contributions are summarized as follows: • To the best of our knowledge, this is the first work to study self-supervision and cross-modal multi-view supervision in the million-scale image-text pre-training  ... 
arXiv:2110.05208v2 fatcat:tnbg3ibfdngephxhssdl6sqyj4

A multi-criteria context-sensitive approach for social image collection summarization

Zahra Riahi Samani, Mohsen Ebrahimi Moghaddam
2018 Sadhana (Bangalore)  
In this paper, we propose a multi-criteria context-sensitive approach for social image collection summarization.  ...  Recent increase in the number of digital photos in the content sharing and social networking websites has created an endless demand for techniques to analyze, navigate, and summarize these images.  ...  They are classified to visual and multi-modal summarization systems.  ... 
doi:10.1007/s12046-018-0908-9 fatcat:lgn7g6hokfed3f7zhmb6moqoxy

Review of Face Presentation Attack Detection Competitions [article]

Zitong Yu, Jukka Komulainen, Xiaobai Li, Guoying Zhao
2021 arXiv   pre-print
The first two challenges aimed to evaluate the effectiveness of face PAD in multi-modal setup introducing near-infrared (NIR) and depth modalities in addition to colour camera data, while the latest three  ...  The state of the art in unimodal and multi-modal face anti-spoofing has been assessed in eight international competitions organized in conjunction with major biometrics and computer vision conferences  ...  Acknowledgements This work was supported by the Academy of Finland for Academy Professor project EmotionAI, ICT 2023 project and Infotech Oulu.  ... 
arXiv:2112.11290v1 fatcat:xx3d64b3cre7zkgy3e2vvzmmsm

Online Multi-Modal Distance Metric Learning with Application to Image Retrieval

Pengcheng Wu, Steven C. H. Hoi, Peilin Zhao, Chunyan Miao, Zhi-Yong Liu
2016 IEEE Transactions on Knowledge and Data Engineering  
Such single-modal DML methods suffer from some critical limitations: (i) some type of features may significantly dominate the others in the DML task due to diverse feature representations; and (ii) learning  ...  To address these limitations, in this paper, we investigate a novel scheme of online multi-modal distance metric learning (OMDML), which explores a unified two-level online learning scheme: (i) it learns  ...  Finally, Algorithm 1 summarizes the details of the proposed Online Multi-modal Distance Metric Learning (OMDML) algorithm. Remark on Space and Time complexity.  ... 
doi:10.1109/tkde.2015.2477296 fatcat:zaiwhrrrfzcwfhfqetrro37uhu

Multi-sensor integration for unmanned terrain modeling

Sreenivas R. Sukumar, Sijie Yu, David L. Page, Andreas F. Koschan, Mongi A. Abidi, Grant R. Gerhart, Charles M. Shoemaker, Douglas W. Gage
2006 Unmanned Systems Technology VIII  
To that end, we propose a mobile, modular imaging system that incorporates multi-modal sensors for mapping unstructured arbitrary terrain.  ...  We document design issues concerning each of these sensors and present a simple temporal alignment method to integrate multi-sensor data into textured 3D models.  ...  We show an area of interest that we have scanned using our data collection system (Figure 5b ) along with the GPS path on a satellite map (Original image from www.maps.google.com) and the multi-modal  ... 
doi:10.1117/12.666249 fatcat:3h6rgmxxenc45ircotkrlhtrjy

MemexQA: Visual Memex Question Answering [article]

Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann
2017 arXiv   pre-print
the collection.  ...  This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in  ...  We believe that this combination will enable us to scale to real-world applications and ever-growing user media collections.  ... 
arXiv:1708.01336v1 fatcat:jxu4ylw7wnfgharkx32sumdrse

The CASIA NIR-VIS 2.0 Face Database

Stan Z. Li, Dong Yi, Zhen Lei, Shengcai Liao
2013 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops  
To address these issues we collected the NIR-VIS 2.0 database. It contains 725 subjects, imaged by VIS and NIR cameras in four recording sessions.  ...  Because the 3D modality in the HFB database was less used in the literature, we don't consider it in the current version.  ...  Technology Support Program Project #2013BAK02B01, EU FP7 Project #257289 (TABULA RASA), and AuthenMetric R&D Funds.  ... 
doi:10.1109/cvprw.2013.59 dblp:conf/cvpr/LiYLL13 fatcat:vcmjxn5ebrd2rj2qgrqeshyvyi
Showing results 1 — 15 out of 13,257 results