
The Multimodal Neighborhood Signature for Modeling Object Color Appearance and Applications in Object Recognition and Image Retrieval

J. Matas, D. Koubaroulis, J. Kittler
2002 Computer Vision and Image Understanding  
We propose a general-purpose color-based object model called the Multimodal Neighborhood Signature (MNS) with applications in object recognition and image retrieval.  ...  Results show good and fast performance under severe scale, viewpoint, occlusion, and background change using a single image for object modeling.  ...  In our approach, local color structure is represented by illumination invariant features computed from image neighborhoods with a multimodal color density function.  ... 
doi:10.1006/cviu.2002.0965 fatcat:4hsrqut6gfeszef5gskb4sy2ga

WxBS: Wide Baseline Stereo Generalizations [article]

Dmytro Mishkin and Jiri Matas and Michal Perdoch and Karel Lenc
2015 arXiv   pre-print
We show that simple adaptive thresholding improves Hessian-Affine, DoG, MSER (and possibly other) detectors and allows them to be used on infrared and low-contrast images.  ...  We have presented a new problem -- the wide multiple baseline stereo (WxBS) -- which considers matching of images that simultaneously differ in more than one image acquisition factor, such as viewpoint,  ...  It can be seen as a large-scale image retrieval system which interactively searches for images based on sketches given by a user.  ... 
arXiv:1504.06603v2 fatcat:tanqozegq5gsxci72ysodbfmdm

Self-supervised Audiovisual Representation Learning for Remote Sensing Data [article]

Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu
2021 arXiv   pre-print
Using this dataset, we then pre-train ResNet models to map samples from both modalities into a common embedding space, which encourages the models to understand key properties of a scene that influence  ...  Many current deep learning approaches make extensive use of backbone networks pre-trained on large datasets like ImageNet, which are then fine-tuned to perform a certain task.  ...  For the training images, we first cropped the central half of the image to ensure that the augmented scenes do not deviate too far from the true location.  ... 
arXiv:2108.00688v1 fatcat:bhvcwavkibhxfmezayic5yryfe

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [article]

Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar, Soujanya Poria, Roger Zimmermann, Amir Zadeh
2020 arXiv   pre-print
In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities.  ...  We look at its applications in their task formulations and how to solve various problems related to semantic perception and content generation.  ...  Using video data allows extracting three modalities from a single source of data [170, 184], but at the same time poses the additional challenge of coherently extracting and segregating different modalities  ... 
arXiv:2010.09522v2 fatcat:l4npstkoqndhzn6hznr7eeys4u

Active Domain-Invariant Self-Localization Using Ego-Centric and World-Centric Maps [article]

Kanya Kurauchi, Kanji Tanaka, Ryogo Yamamoto, Mitsuki Yoshida
2022 arXiv   pre-print
A standard VPR subsystem based on a convolutional neural network (CNN) is assumed to be available, and we propose transferring its domain-invariant state-recognition ability to train the domain-invariant  ...  The ILC is available within the middle layers of the CNN model as a high-level description of the visual content (e.g., a saliency image) with respect to the ego-centric view.  ...  Approach Our goal is to extend a typical passive single-view VPR task to active multiview VPR.  ... 
arXiv:2204.10497v2 fatcat:dzray4mkh5b65gbm6w3ekza2pi

Video Content Modeling [chapter]

Tong Zhang, C.-C. Jay Kuo
2001 Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing  
Starting with a review of techniques to model raw video data, we study approaches used to describe physical objects, and conclude with a review of high-level semantic modeling of data with a focus on  ...  multimodal analysis.  ...  Here, instead of associating a single time segment with a description, a set of time segments is associated with a description, an approach that allows handling, with a single object, all occurrences of an  ... 
doi:10.1007/978-1-4757-3339-6_2 fatcat:2cyk63bmcrgftd2hffsqsqhvai

Background Subtraction for Automated Multisensor Surveillance: A Comprehensive Review

Marco Cristani, Michela Farenzena, Domenico Bloisi, Vittorio Murino
2010 EURASIP Journal on Advances in Signal Processing  
Background subtraction is a widely used operation in video surveillance, aimed at separating the expected scene (the background) from the unexpected entities (the foreground).  ...  All the reviewed methods are organized in a novel taxonomy that encapsulates all the brand-new approaches in a seamless way.  ...  a plane passing overhead.  ... 
doi:10.1155/2010/343057 fatcat:yyz6764u3nf3jcoik3bmgeltlm
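
The snippet above defines background subtraction as separating the expected scene from unexpected foreground entities. As an illustration only, a minimal running-average baseline of our own (not one of the specific multisensor methods the survey reviews) can be sketched as:

```python
import numpy as np

def background_subtract(frames, alpha=0.05, threshold=30):
    """Label pixels as foreground by comparing each frame to a
    running-average background model (illustrative baseline only)."""
    background = frames[0].astype(np.float64)
    masks = []
    for frame in frames[1:]:
        frame = frame.astype(np.float64)
        # Foreground: pixels that deviate strongly from the background model.
        mask = np.abs(frame - background) > threshold
        # Update the model only where the scene appears static.
        background = np.where(mask, background,
                              (1 - alpha) * background + alpha * frame)
        masks.append(mask)
    return masks
```

Real surveillance systems replace this per-pixel threshold with richer models (e.g., mixtures of Gaussians), which is precisely the design space the survey taxonomizes.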

Dynamic Data Driven Applications System Concept for Information Fusion

Erik Blasch, Guna Seetharaman, Kitt Reinhardt
2013 Procedia Computer Science  
Acknowledgements The authors appreciate the discussions and motivation of advancements from DDDAS towards information fusion solutions from Dr. Frederica Darema.  ...  The simplest manifest is made of two time-separated images of a dynamic scene, or two stereo images of a static scene observed from distinct relative-directions.  ...  For mathematical, simulation, and software applications, there is a focus on multiscale multimodal approaches.  ... 
doi:10.1016/j.procs.2013.05.369 fatcat:nvim3wydxjdxxdhlrkofhtr2la

Recent Advance in Content-based Image Retrieval: A Literature Survey [article]

Wengang Zhou, Houqiang Li, Qi Tian
2017 arXiv   pre-print
Such a problem is challenging due to the intention gap and the semantic gap problems. Numerous techniques have been developed for content-based image retrieval in the last decade.  ...  Content-based image retrieval (CBIR), which makes use of the representation of visual content to identify relevant images, has attracted sustained attention over the last two decades.  ...  to address the transformation invariance to scale and rotation at the price of high memory overhead to maintain the Hough histograms.  ... 
arXiv:1706.06064v2 fatcat:m52xwsw5pzfzdbxo5o6dye2gde

Multimodal Deep Learning for Robust RGB-D Object Recognition [article]

Andreas Eitel, Jost Tobias Springenberg, Luciano Spinello, Martin Riedmiller, Wolfram Burgard
2015 arXiv   pre-print
The second is a data augmentation scheme for robust learning with depth images, corrupting them with realistic noise patterns.  ...  Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications.  ...  Then, we apply a jet colormap to the given image, transforming the input from a single-channel to a three-channel image (colorizing the depth).  ... 
arXiv:1507.06821v2 fatcat:b7ncjq3sn5edzfwwwan6syigby
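
The snippet mentions applying a jet colormap to turn a single-channel depth image into a three-channel input for the network. A minimal sketch of that colorization step, using our own piecewise-linear jet approximation rather than the authors' exact implementation:

```python
import numpy as np

def colorize_depth(depth):
    """Map a single-channel depth image to three channels with a
    jet-style colormap (piecewise-linear approximation, our own sketch)."""
    d = depth.astype(np.float64)
    # Normalize depth to [0, 1]; guard against a constant image.
    span = d.max() - d.min()
    t = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    # Piecewise-linear jet approximation: blue -> cyan -> yellow -> red.
    r = np.clip(1.5 - np.abs(4 * t - 3), 0, 1)
    g = np.clip(1.5 - np.abs(4 * t - 2), 0, 1)
    b = np.clip(1.5 - np.abs(4 * t - 1), 0, 1)
    return np.stack([r, g, b], axis=-1)
```

The point of the transform is that a depth map rendered this way can be fed to an RGB-pretrained network without changing its three-channel input layer.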

Multimodal deep learning for robust RGB-D object recognition

Andreas Eitel, Jost Tobias Springenberg, Luciano Spinello, Martin Riedmiller, Wolfram Burgard
2015 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)  
The second is a data augmentation scheme for robust learning with depth images, corrupting them with realistic noise patterns.  ...  Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications.  ...  Then, we apply a jet colormap to the given image, transforming the input from a single-channel to a three-channel image (colorizing the depth).  ... 
doi:10.1109/iros.2015.7353446 dblp:conf/iros/EitelSSRB15 fatcat:3qndfvy3nbavlextffsifrdu7a

Understanding Transit Scenes: A Survey on Human Behavior-Recognition Algorithms

J. Candamo, M. Shreve, D.B. Goldgof, D.B. Sapper, R. Kasturi
2010 IEEE transactions on intelligent transportation systems (Print)  
The main goal of this survey is to provide researchers in the field with a summary of progress achieved to date and to help identify areas where further research is needed.  ...  Visual surveillance is an active research topic in image processing.  ...  ACKNOWLEDGMENT The authors would like to thank D. Kelsey (Hart) and S. Godavarthy and W. Cheng (University of South Florida) for their involvement and support in the completion of this paper.  ... 
doi:10.1109/tits.2009.2030963 fatcat:tiajxro6sbc23p2rdaam2pwyna

Automatic Association of Chats and Video Tracks for Activity Learning and Recognition in Aerial Video Surveillance

Riad Hammoud, Cem Sahin, Erik Blasch, Bradley Rhodes, Tao Wang
2014 Sensors  
VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets, affording an improvement over tracking from video data alone and leading to 84% detection with modest misdetection  ...  of interest (TOIs) by movement type and geolocation; and (3) a user interface to support streaming multi-intelligence data processing.  ...  Acknowledgments This work was supported under contract number FA8750-13-C-0099 from the Air Force Research laboratory.  ... 
doi:10.3390/s141019843 pmid:25340453 pmcid:PMC4239870 fatcat:ony3ylej4nhzxbnap2zide3kwi

Attention in Multimodal Neural Networks for Person Re-identification

Aske R. Lejbolle, Benjamin Krogh, Kamal Nasrollahi, Thomas B. Moeslund
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal one.  ...  Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint  ...  Figure 1: Examples of images captured from (a) an overhead viewpoint [7] and (b) a horizontal viewpoint [6]. Figure 2: Overview of the Multimodal ATtention network (MAT).  ... 
doi:10.1109/cvprw.2018.00055 dblp:conf/cvpr/LejbolleKNM18 fatcat:lkxrlxkyojfe5pjurypzcj7b5e

Learning geodesic-aware local features from RGB-D images

Guilherme Potje, Renato Martins, Felipe Cadar, Erickson R. Nascimento
2022 Computer Vision and Image Understanding  
In this paper, we take one step further by proposing a new approach to compute descriptors from RGB-D images (where RGB refers to the pixel color brightness and D stands for depth information) that are  ...  as well as in object retrieval and non-rigid surface tracking experiments, with comparable processing times.  ...  Acknowledgments The authors would like to thank CAPES (#88881.120236/2016-01), CNPq, and FAPEMIG for funding different parts of this work. R.  ... 
doi:10.1016/j.cviu.2022.103409 fatcat:mvlfe2p45zfgfkgqrdlrdnhkxa
Showing results 1 — 15 out of 501 results