
A unified framework for local visual descriptors evaluation

Olivier Kihl, David Picard, Philippe-Henri Gosselin
2015 Pattern Recognition  
Local descriptors are the ground layer of feature-based recognition systems for still images and video. We propose a new framework to explain local descriptors.  ...  With this framework, we are able to explain most of the popular descriptors in the literature such as HOG, HOF, SURF.  ...  Conclusion In this paper, we introduced a new framework to describe local visual descriptors.  ... 
doi:10.1016/j.patcog.2014.11.013 fatcat:kabztllz4jefdp7i4sjuxeywja

Interactive Image Search for Clothing Recommendation

Zhengzhong Zhou, Yifei Xu, Jingjin Zhou, Liqing Zhang
2016 Proceedings of the 2016 ACM on Multimedia Conference - MM '16  
We propose the Hybrid Topic (HT) model, a probabilistic network integrating the multi-channel descriptors into a unified framework, to learn the intricate semantic representation of the descriptors above  ...  We employ the color, texture, shape and attributes as additional descriptors to further refine the requirements.  ...  For the second level abstraction, we present a probabilistic network to combine the multi-channel descriptors into a unified framework, i.e. Hybrid Topic (HT) model.  ... 
doi:10.1145/2964284.2973834 dblp:conf/mm/ZhouXZZ16 fatcat:htepi62vnncgvj3o2ujitfjk3u

Learning Instance Representation Banks for Aerial Scene Classification [article]

Jingjun Yi, Beichen Zhou
2022 arXiv   pre-print
This unified framework is not trivial as all the local semantic descriptors can be aligned to the same scene scheme, enhancing the scene representation capability.  ...  In this paper, we solve this problem by designing a novel representation set named instance representation bank (IRB), which unifies multiple local descriptors under the multiple instance learning (MIL  ...  Local semantic descriptor based representation visualization for aerial scenes based on attention modules [6, 5, 9], local max selection (LSM) [10, 2], context-aware class peak response (CACPR) [2] and  ... 
arXiv:2205.13744v1 fatcat:p7yrhoiwp5cinhwcd7vibuow5m

Optimum Pipeline for Visual Terrain Classification Using Improved Bag of Visual Words and Fusion Methods

Hang Wu, Baozhen Liu, Weihua Su, Zihao Chen, Wenchang Zhang, Xudong Ren, Jinggong Sun
2017 Journal of Sensors  
The bag of visual words (BOVW) framework has emerged as a promising approach and effective paradigm for visual terrain classification.  ...  We provide a comprehensive study of all steps in the BOVW framework and different fusion methods for visual terrain classification.  ...  Xiaojiang Peng for valuable discussion. This work is supported by the Science and Technology Pillar Program, Tianjin, China, under Project 16YFZCSF00590.  ... 
doi:10.1155/2017/8513949 fatcat:az2ekljrn5f5bfxiuyxpkhrm44

Action Recognition Using Super Sparse Coding Vector with Spatio-temporal Awareness [chapter]

Xiaodong Yang, YingLi Tian
2014 Lecture Notes in Computer Science  
This paper presents a novel framework for human action recognition based on sparse coding.  ...  In order to incorporate the spatio-temporal information, we propose a novel approach of super location vector (SLV) to model the space-time locations of local interest points in a much more compact way  ...  Conclusion In this paper, we have presented a novel framework for action recognition.  ... 
doi:10.1007/978-3-319-10605-2_47 fatcat:icq7cm2q3ndornrgejg76k5dnu

A hybrid graph-based and non-linear late fusion approach for multimedia retrieval

Ilias Gialampoukidis, Anastasia Moumtzidou, Dimitris Liparas, Stefanos Vrochidis, Ioannis Kompatsiaris
2016 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)  
In contrast, we present a strategy for fusing textual and visual modalities, through the combination of a non-linear fusion model and a graph-based late fusion approach.  ...  An interesting challenge within the aforementioned task is the efficient combination of different modalities in a multimedia object and especially the fusion between textual and visual information.  ...  Feature Extraction The features, which are employed in the evaluation of the proposed hybrid multimedia retrieval framework, are listed as follows: Visual descriptors: The scale-invariant local descriptors  ... 
doi:10.1109/cbmi.2016.7500252 dblp:conf/cbmi/GialampoukidisM16 fatcat:gggu5spoufduxbycflomf5b7vu

Simultaneous Object Recognition and Localization in Image Collections

Shao-Chuan Wang, Yu-Chiang Frank Wang
2010 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance  
This paper presents a weakly supervised method to simultaneously address object localization and recognition problems.  ...  The selected visual words are used to construct visual attention maps, which provide descriptive information for each object category.  ...  In this paper, we propose a unified approach for simultaneous object categorization and localization using a weakly supervised framework.  ... 
doi:10.1109/avss.2010.47 dblp:conf/avss/WangW10 fatcat:7tllmbyojncsfiss4ux4gezhta

A unified visual graph-based approach to navigation for wheeled mobile robots

Jan Hartmann, Jan Helge Klussendorff, Erik Maehle
2013 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems  
In this paper, we will therefore introduce visual solutions to SLAM, localization, and path planning in a unified graph-based framework with the main target of wheeled robots in industrial applications  ...  Yet, while there have been significant efforts in the field of visual simultaneous localization and mapping (VSLAM), a complete navigation package that could rival popular laser-based solutions is not  ...  Locality Sensitive Hashing limits the search space for matching binary descriptors by comparing only those descriptors that have the same hash value for a given hash function.  ... 
doi:10.1109/iros.2013.6696610 dblp:conf/iros/HartmannKM13 fatcat:ofimxy56gfcldb4aklwdh4z3oi

Reuse your features: unifying retrieval and feature-metric alignment [article]

Javier Morlana, J.M.M. Montiel
2022 arXiv   pre-print
We propose a compact pipeline to unify all the steps of Visual Localization: image retrieval, candidate re-ranking and initial pose estimation, and camera pose refinement.  ...  DRAN is the first single network able to produce the features for the three steps of visual localization.  ...  Evaluation We selected Aachen Day-Night for evaluation, as it depicts the old inner city of Aachen, testing visual localization under day-night condition.  ... 
arXiv:2204.06292v1 fatcat:ux7w27zmivd7jfxzzdxhog4fg4

A Unified Framework for Retrieving Diverse Social Images

Maia Zaharieva, Patrick Schwab
2014 MediaEval Benchmarking Initiative for Multimedia Evaluation  
In this paper we explore the performance of a generic, unified framework for the retrieval of relevant and diverse images from social photo collections.  ...  The approach allows for the easy evaluation of different visual and textual image descriptions, clustering algorithms, and similarity metrics.  ...  While most of the presented approaches employ a combination of a re-ranking (for relevance improvement) and a clustering (for ensuring diversification) method, we build a unified framework that allows  ... 
dblp:conf/mediaeval/ZaharievaS14 fatcat:odzsgg7msvgidinqbn62afpjdu

Fast and Scalable Image Retrieval Using Predictive Clustering Trees [chapter]

Ivica Dimitrovski, Dragi Kocev, Suzana Loskovska, Sašo Džeroski
2013 Lecture Notes in Computer Science  
Each image is then represented by a histogram of the distribution of its local descriptors throughout the vocabulary.  ...  We evaluate the proposed method on a benchmark database of a million images and compare it to other state-of-the-art methods.  ...  First, we randomly select a subset of the local (SIFT) descriptors from all of the images. Next, the selected local descriptors constitute the training set used to construct a PCT.  ... 
doi:10.1007/978-3-642-40897-7_3 fatcat:q6seu2otjbbtbn7d74fmvl45wu

Cross-modal visuo-tactile object recognition using robotic active exploration

Pietro Falco, Shuang Lu, Andrea Cirillo, Ciro Natale, Salvatore Pirozzi, Dongheui Lee
2017 2017 IEEE International Conference on Robotics and Automation (ICRA)  
The proposed cross-modal framework is constituted by three main elements. The first is a unified representation of visual and tactile data, which is suitable for cross-modal perception.  ...  The second is a set of features able to encode the chosen representation for classification applications. The third is a supervised learning algorithm, which takes advantage of the chosen descriptor.  ...  Classification results In order to show the performance of the framework, we evaluate, in terms of accuracy, the proposed combination of (1) unified representation, (2) unified descriptor, and (3) suitable  ... 
doi:10.1109/icra.2017.7989619 dblp:conf/icra/FalcoLCNPL17 fatcat:yvednjlbtbhjdnragb4xkvbf4u

Beyond Bag-of-Words: Fast video classification with Fisher Kernel Vector of Locally Aggregated Descriptors

Ionut Mironica, Ionut Duta, Bogdan Ionescu, Nicu Sebe
2015 2015 IEEE International Conference on Multimedia and Expo (ICME)  
In this paper we introduce a new video description framework that replaces traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD).  ...  We show that our framework is highly general and is not dependent on a particular type of descriptor. It achieves state-of-the-art results in several classification scenarios.  ...  In this context, we propose a new way of representing the descriptor information that exploits the advantages of both, FK and VLAD representations, in a unified framework.  ... 
doi:10.1109/icme.2015.7177489 dblp:conf/icmcs/MironicaDIS15 fatcat:nx4gwzbizram5nbe5k663ggzym

View and Style-Independent Action Manifolds for Human Activity Recognition [chapter]

Michał Lewandowski, Dimitrios Makris, Jean-Christophe Nebel
2010 Lecture Notes in Computer Science  
The proposed framework is evaluated on a real and challenging dataset (IXMAS), which is composed of a variety of actions seen from arbitrary viewpoints.  ...  We introduce a novel approach to automatically learn intuitive and compact descriptors of human body motions for activity recognition.  ...  The authors would like to thank Lena Gorelick from University of Western Ontario and Richard Souvenir from University of North Carolina at Charlotte for sharing their codes.  ... 
doi:10.1007/978-3-642-15567-3_40 fatcat:xrhzykgjdbhblgj7vfbyqmjvtu

ContextDesc: Local Descriptor Augmentation with Cross-Modality Context [article]

Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan
2019 arXiv   pre-print
Specifically, we propose a unified learning framework that leverages and aggregates the cross-modality contextual information, including (i) visual context from high-level image representation, and (ii  ...  In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors.  ...  histogram cessed and aggregated in a unified framework.  ... 
arXiv:1904.04084v1 fatcat:gd2pe42w75bbjazhw33z4pwiwe