
Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods [article]

Alexander Freytag, Johannes Rühle, Paul Bodesheim, Erik Rodner, and Joachim Denzler
2014 arXiv   pre-print
Vector-quantized local features frequently used in bag-of-visual-words approaches are the backbone of popular visual recognition systems due to both their simplicity and their performance.  ...  The question remains how much visual information is "lost in quantization" when mapping visual features to code words?  ...  Unbagging bag-of-visual words: visualizing quantization effects. Our technique is simple and in line with current trends for image reconstruction from local features [9, 16, 8].  ... 
arXiv:1408.4692v1 fatcat:kujhplne6zhodgtf4q7pg7naxu

Clustering of multiple-event online sound collections with the codebook approach

Lluis Surós, Xavier Favory
2019 Zenodo  
These "acoustic words" can be understood and processed analogously to natural language words, thus giving access to the use of varied Natural Language Processing techniques, such as Bag-of-Words, TF-IDF  ...  These multiple-event audio clips might be misrepresented by statistical aggregation methods such as computing the mean over the features; in this regard, techniques that retain the elemental blocks of a  ...  Image from https://gurus.pyimagesearch.com/the-bag-of-visual-words-model. Bag-of-acoustic-words: Following the Bag-of-Feature model, vector quantization of the acoustic space can be performed, and a codebook  ... 
doi:10.5281/zenodo.3475480 fatcat:zbpmafthb5ef7i2r43pehpc4lm

Exploring features in a Bayesian framework for material recognition

Ce Liu, Lavanya Sharan, Edward H. Adelson, Ruth Rosenholtz
2010 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  
Unlike other visual recognition tasks in computer vision, it is difficult to find good, reliable features that can tell material categories apart.  ...  We are interested in identifying the material category, e.g. glass, metal, fabric, plastic or wood, from a single image of a surface.  ...  After quantizing these features into dictionaries, we convert an image into a bag of words and use latent Dirichlet allocation (LDA) [3] to model the distribution of the words.  ... 
doi:10.1109/cvpr.2010.5540207 dblp:conf/cvpr/LiuSAR10 fatcat:dmhi55krjzh6ddt3zzo3x4bfnq

Recognising complex activities with histograms of relative tracklets

Sebastian Stein, Stephen J. McKenna
2017 Computer Vision and Image Understanding  
One approach to the recognition of complex human activities is to use feature descriptors that encode visual interactions by describing properties of local visual features with respect to trajectories  ...  Our comparative evaluation of features from accelerometers and video highlighted a performance gap between visual and accelerometer-based motion features and showed a substantial performance gain when  ...  Acknowledgements: The authors would like to thank Jianguo Zhang and Ruixuan Wang for valuable feedback on drafts of this paper. This research was funded by RCUK grants EP/G066019/1 and EP/K037293/1.  ... 
doi:10.1016/j.cviu.2016.08.012 fatcat:bd43ecgf2jfxhlalsixz74seum

Mobile Visual Location Recognition

Georg Schroth, Robert Huitl, David Chen, Mohammad Abu-Alqumsan, Anas Al-Nuaimi, Eckehard Steinbach
2011 IEEE Signal Processing Magazine  
As the spatial layout of features within query and database image is ignored in the matching process, this approach is called Bag-of-Visual-Words or Bag-of-Features (BoF).  ...  Contents excerpt: Retrieval based Location Recognition; 2.2.1 Feature Extraction; 2.2.2 Bag-of-Features based Image Retrieval; 2.2.3 Visual Word Quantization  ... 
doi:10.1109/msp.2011.940882 fatcat:7vzjij4qvjb5hcc2znqp7ginuy

Image search—from thousands to billions in 20 years

Lei Zhang, Yong Rui
2013 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  
Starting with a retrospective review of three stages of image search in the history, the article highlights major breakthroughs around the year 2000 in image search features, indexing methods, and commercial  ...  Based on the review, the concluding section discusses open research challenges and suggests future research directions in effective visual representation, image knowledge base construction, implicit user  ...  ACKNOWLEDGMENTS The authors gratefully acknowledge Wei-Ying Ma for his visionary long-term support and encouragement, and Xin-Jing Wang, Changhu Wang, Xirong Li, and Zhiwei Li for their years of collaboration  ... 
doi:10.1145/2490823 fatcat:cor23f3c7nb7fimy4ixp32bdk4

Visually Grounded Models of Spoken Language: A Survey of Datasets, Architectures and Evaluation Techniques

Grzegorz Chrupała
2022 The Journal of Artificial Intelligence Research  
This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years.  ...  The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas.  ...  via vector quantization in Harwath, Hsu, and Glass (2020).  ... 
doi:10.1613/jair.1.12967 fatcat:zib2mr5wkjdmteyrgac6gxekli

Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques [article]

Grzegorz Chrupała
2021 arXiv   pre-print
This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years.  ...  The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas.  ...  In the following section we will see the effects of these developments on the research into visually grounded models of spoken language.  ... 
arXiv:2104.13225v3 fatcat:edodewkhljbqtpcrm2knd2zw7i

Visual Object Recognition

Kristen Grauman, Bastian Leibe
2011 Synthesis Lectures on Artificial Intelligence and Machine Learning  
; visual vocabularies and bags-of-words; methods to verify geometric consistency according to parameterized geometric transformations; dealing with outliers in correspondences, RANSAC and the Generalized  ...  The target audience consists of researchers or students working in AI, robotics, or vision who would like to understand what methods and representations are available for these problems.  ...  Bastian Leibe's work on the project was supported in part by the UMIC cluster of excellence (DFG EXC 89).  ... 
doi:10.2200/s00332ed1v01y201103aim011 fatcat:fhz7aokkfjav7fuauuorfstq4y

Sound representation and classification benchmark for domestic robots

Janvier Maxime, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
2014 2014 IEEE International Conference on Robotics and Automation (ICRA)  
We address the problem of sound representation and classification and present results of a comparative study in the context of a domestic robotic scenario.  ...  A dataset of sounds was recorded in realistic conditions (background noise, presence of several sound sources, reverberations, etc.) using the humanoid robot NAO.  ...  of the mean and standard deviation can also be used. • The bag-of-words (BoW) approach.  ... 
doi:10.1109/icra.2014.6907786 dblp:conf/icra/JanvierAGH14 fatcat:4kvhhwh65relnnnxiu57vyr2y4

Sound Representation and Classification Benchmark for Domestic Robots [article]

Maxime Janvier, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
2014 arXiv   pre-print
We address the problem of sound representation and classification and present results of a comparative study in the context of a domestic robotic scenario.  ...  A dataset of sounds was recorded in realistic conditions (background noise, presence of several sound sources, reverberations, etc.) using the humanoid robot NAO.  ...  of the mean and standard deviation can also be used. • The bag-of-words (BoW) approach.  ... 
arXiv:1402.3689v1 fatcat:ljzuajgbpncblev4eoqqijc25a

Instance search retrospective with focus on TRECVID

George Awad, Wessel Kraaij, Paul Over, Shin'ichi Satoh
2017 International Journal of Multimedia Information Retrieval  
The Instance Search (INS) benchmark worked with a variety of large collections of data including Sound & Vision, Flickr, BBC (British Broadcasting Corporation) Rushes for the first 3 pilot years and with  ...  The main contributions of the paper include i) an examination of the evolving design of the evaluation framework and its components (system tasks, data, measures); ii) an analysis of the influence of topic  ...  As described, the representation based on bag of visual words with very fine quantization is known to be effective for instance search.  ... 
doi:10.1007/s13735-017-0121-3 pmid:28758054 pmcid:PMC5531298 fatcat:3khp2cscmbhohipfx246gspqlq

A Comprehensive Survey on Computational Aesthetic Evaluation of Visual Art Images: Metrics and Challenges

Jiajing Zhang, Yongwei Miao, Jinhui Yu
2021 IEEE Access  
Quantizing local descriptors. As is often the case with bags-of-features for generic object recognition and image retrieval, we also quantize local descriptors by using visual words in codebooks.  ...  bag-of-visual-words classification framework, which extracted LAB-based color visual words and SIFT-based texture visual words [40], the influence of color combination on emotion [72], visual and  ...  For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited.  ... 
doi:10.1109/access.2021.3083075 fatcat:zukn4uhlinejjdubdezsdghh5i

3D HMM-based Facial Expression Recognition using Histogram of Oriented Optical Flow

Sheng H Kung, Mohamed A. Zohdy, Djamel Bouchaffra
2015 Transactions on Machine Learning and Artificial Intelligence  
Histogram of Optical Flow is used as the descriptor for extracting and describing the key features, while training and testing are performed on 3D Hidden Markov Models.  ...  This research is focused on HCI in the recognition of human facial expression and emotion analysis.  ...  ACKNOWLEDGEMENT We thank Jasser Jasser of ECE Department of Oakland University for assistance with development and running of the experiment and sharing of his pearls of wisdom with us during the course  ... 
doi:10.14738/tmlai.36.1661 fatcat:2xe2jcwa2ffy7buu5ywh6jvq6i

Visually Fingerprinting Humans without Face Recognition

He Wang, Xuan Bao, Romit Roy Choudhury, Srihari Nelakuditi
2015 Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services - MobiSys '15  
One application of visual fingerprints relates to augmented reality, in which an individual looks at other people through her camera-enabled glass (e.g., Google Glass) and views information about them.  ...  If Alice recognizes Bob through motion fingerprints, she can extract Bob's clothing features and update a database inside the InSight server.  ...  If successful, such a fingerprint could be effectively used towards human recognition or content announcement in the visual vicinity, and more broadly towards enabling human-centric augmented reality.  ... 
doi:10.1145/2742647.2742671 dblp:conf/mobisys/WangBCN15 fatcat:7kr2xjtvbfgcnk4zkw7c7k67vy