
Learning to discover and localize visual objects with open vocabulary [article]

Keren Ye, Mingda Zhang, Wei Li, Danfeng Qin, Adriana Kovashka, Jesse Berent
2018 arXiv   pre-print
Thus, we can detect objects beyond a fixed object-category vocabulary if those objects are frequent and distinctive enough.  ...  In this work we learn association maps between images and captions.  ...  Introduction Learning to localize and classify visual objects is a fundamental problem in computer vision.  ... 
arXiv:1811.10080v1 fatcat:3mba2dirgjezxmpajtkwplpjvy

Open-Vocabulary Object Detection Using Captions [article]

Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang
2021 arXiv   pre-print
Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised  ...  In this paper, we put forth a novel formulation of the object detection problem, namely open-vocabulary object detection, which is more general, more practical, and more effective than weakly supervised  ...  [35] and Ye et al. [44] aim to discover an open set of object classes from image-caption corpora, and learn detectors for each discovered class.  ... 
arXiv:2011.10678v2 fatcat:ven4oegqnrdilb4reguistgxnm

Learning Everything about Anything: Webly-Supervised Visual Concept Learning

Santosh K. Divvala, Ali Farhadi, Carlos Guestrin
2014 2014 IEEE Conference on Computer Vision and Pattern Recognition  
Our approach leverages vast resources of online books to discover the vocabulary of variance, and intertwines the data collection and modeling steps to alleviate the need for explicit human supervision  ...  To date, our system has models available for over 50,000 variations within 150 concepts, and has annotated more than 10 million images with bounding boxes.  ...  To model the visual variance, we propose to intertwine the vocabulary discovery and the model learning steps.  ... 
doi:10.1109/cvpr.2014.412 dblp:conf/cvpr/DivvalaFG14 fatcat:py5e4bza6ndyfnkcxrd5357cfu

Localized Vision-Language Matching for Open-vocabulary Object Detection [article]

Maria A. Bravo, Sudhanshu Mittal, Thomas Brox
2022 arXiv   pre-print
In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes.  ...  It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes  ...  They refer to this problem as Open-vocabulary Object Detection. There are two major challenges to this problem: First, image-caption pairs themselves are too weak to learn localized object-regions.  ... 
arXiv:2205.06160v2 fatcat:apmon75v6jf5nff3o24d5ivimi

Web Multimedia Object Classification Using Cross-Domain Correlation Knowledge

Wenting Lu, Jingxuan Li, Tao Li, Weidong Guo, Honggang Zhang, Jun Guo
2013 IEEE Transactions on Multimedia  
To mine more meaningful correlation knowledge, instead of using the commonly used visual words in the traditional bag-of-visual-words (BoW) model, we discover higher-level visual components (words and phrases)  ...  Here, the knowledge is extracted from unlabeled objects through unsupervised learning and applied to perform supervised classification tasks.  ...  In this paper, we group local feature keypoints with the unsupervised agglomerative hierarchical clustering algorithm [27] to generate the vocabulary tree, in which each visual word acts as a node.  ... 
doi:10.1109/tmm.2013.2280895 fatcat:ppgs7kapebby3b5vsdw7zevdou
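The agglomerative vocabulary construction this entry describes can be sketched in a few lines: start from singleton clusters of local feature keypoints and repeatedly merge the closest pair until the desired vocabulary size remains. The code below is a toy single-linkage version on 1-D features, not the authors' implementation, which builds a full vocabulary tree over high-dimensional descriptors.

```python
def agglomerative(points, k):
    """Bottom-up clustering: start with singleton clusters and repeatedly
    merge the two closest clusters (single linkage) until k remain."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # merge j into i
    return [sorted(c) for c in clusters]

# Toy 1-D "keypoint features": two obvious groups.
clusters = agglomerative([0.0, 0.1, 5.0, 5.2, 0.2], 2)
print(clusters)  # → [[0.0, 0.1, 0.2], [5.0, 5.2]]
```

In a vocabulary tree, the merge order additionally records the hierarchy, so each internal node (not just each leaf) can serve as a visual word at a coarser level.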

Automatic Attribute Discovery and Characterization from Noisy Web Data [chapter]

Tamara L. Berg, Alexander C. Berg, Jonathan Shih
2010 Lecture Notes in Computer Science  
techniques for identifying attribute vocabularies and for learning to recognize attributes without hand-labeled training data.  ...  It is common to use domain-specific terminology - attributes - to describe the visual appearance of objects.  ... 
doi:10.1007/978-3-642-15549-9_48 fatcat:6k34im5mlfa7lb5p7usfght4dy

Scene Classification Via pLSA [chapter]

Anna Bosch, Andrew Zisserman, Xavier Muñoz
2006 Lecture Notes in Computer Science  
Given a set of images of scenes containing multiple object categories (e.g. grass, roads, buildings), our objective is to discover these objects in each image in an unsupervised manner, and to use this  ...  We investigate the classification performance under changes in the visual vocabulary and number of latent topics learnt, and develop a novel vocabulary using colour SIFT descriptors.  ...  Acknowledgements Thanks to A.Torralba, J.Vogel and F.F.Li for providing their datasets and to Josef Sivic for discussions.  ... 
doi:10.1007/11744085_40 fatcat:iyk2uone3netfdq3h3eyatgtny
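The pLSA decomposition behind this entry models each image's visual-word histogram as a mixture of latent topics, p(w|d) = sum_z p(w|z) p(z|d), fit by EM. Below is a minimal sketch on a toy count matrix; the variable names and the tiny data are illustrative, not from the paper.

```python
import random

def plsa(counts, n_topics, iters=50, seed=0):
    """Tiny pLSA fit via EM. counts[d][w] is how often visual word w
    occurs in image d. Returns p(z|d) and p(w|z) tables."""
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])
    norm = lambda v: [x / sum(v) for x in v]
    # Random normalized initialization of both factors.
    p_z_d = [norm([rng.random() for _ in range(n_topics)]) for _ in range(D)]
    p_w_z = [norm([rng.random() for _ in range(W)]) for _ in range(n_topics)]
    for _ in range(iters):
        new_z_d = [[0.0] * n_topics for _ in range(D)]
        new_w_z = [[0.0] * W for _ in range(n_topics)]
        for d in range(D):
            for w in range(W):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior responsibilities p(z|d,w).
                post = norm([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # M-step accumulation: expected counts per topic.
                for z in range(n_topics):
                    new_z_d[d][z] += counts[d][w] * post[z]
                    new_w_z[z][w] += counts[d][w] * post[z]
        p_z_d = [norm(r) for r in new_z_d]
        p_w_z = [norm(r) for r in new_w_z]
    return p_z_d, p_w_z

# Toy corpus: two images with disjoint visual-word usage.
p_z_d, p_w_z = plsa([[4, 3, 0, 0], [0, 0, 3, 4]], n_topics=2)
```

Classification then uses the per-image topic mixture p(z|d) as a compact feature vector in place of the raw word histogram.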

Interactively building a discriminative vocabulary of nameable attributes

Devi Parikh, Kristen Grauman
2011 CVPR 2011  
To ensure a compact vocabulary and efficient use of annotators' effort, we 1) show how to actively augment the vocabulary such that new attributes resolve inter-class confusions, and 2) propose a novel  ...  Human-nameable visual attributes offer many advantages when used as mid-level features for object recognition, but existing techniques to gather relevant attributes can be inefficient (costing substantial  ...  Acknowledgements: This research is supported in part by the Luce Foundation and NSF IIS-1065390.  ... 
doi:10.1109/cvpr.2011.5995451 dblp:conf/cvpr/ParikhG11 fatcat:nyxcokgayrbz7fhdihjqidhmtq

Visual pattern discovery in image and video data: a brief survey

Hongxing Wang, Gangqiang Zhao, Junsong Yuan
2013 Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery  
In image and video data, a visual pattern is a re-occurring composition of visual primitives. Such visual patterns capture the essence of image and video data and convey rich information.  ...  At the end we identify the open issues for future research.  ...  Zhu et al. 73 use saliency-guided multiple class learning to discover object patterns and perform object categorization.  ... 
doi:10.1002/widm.1110 fatcat:skjnmv5njfdtxc3erl4r2txqri

Building an Enhanced Vocabulary of the Robot Environment with a Ceiling Pointing Camera

Alejandro Rituerto, Henrik Andreasson, Ana Murillo, Achim Lilienthal, José Guerrero
2016 Sensors  
We show different robotic tasks that could benefit from our visual vocabulary approach, such as place recognition or object discovery.  ...  This pipeline incorporates (1) tracking information into the vocabulary construction process and (2) geometric cues into the appearance descriptors.  ...  Traditional visual vocabulary learning and weighting is performed independently, and the authors of [30] present Joint-ViVo, a method where words and their weights are learned jointly.  ... 
doi:10.3390/s16040493 pmid:27070607 pmcid:PMC4851007 fatcat:svbsehelw5fdthws5uwgdedxse

Scene Recognition by Combining Local and Global Image Descriptors [article]

Jobin Wilson, Muhammad Arif
2017 arXiv   pre-print
We utilize DAISY features associated with key points within images as our local feature descriptor and histogram of oriented gradients (HOG) corresponding to an entire image as a global descriptor.  ...  We make use of a bag-of-visual-words encoding and apply the Mini-Batch K-Means algorithm to reduce the complexity of our feature encoding scheme.  ...  Once the execution pipeline described in Section II was built, the optimal size of visual vocabulary (K) was empirically discovered, by running the pipeline 3 times for each K and averaging  ... 
arXiv:1702.06850v1 fatcat:ysjwpee5ezbe3hxfx3hr2zfcem
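The bag-of-visual-words encoding this entry uses reduces to two steps once a vocabulary exists: assign each local descriptor to its nearest cluster center (visual word), then histogram the assignments. A minimal sketch follows, with toy 2-D descriptors and a hand-picked K=2 vocabulary standing in for DAISY features and MiniBatchKMeans-learned centers.

```python
from collections import Counter

def nearest_center(desc, centers):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(range(len(centers)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, centers[k])))

def bow_histogram(descriptors, centers):
    """L1-normalized bag-of-visual-words histogram for one image."""
    counts = Counter(nearest_center(d, centers) for d in descriptors)
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in range(len(centers))]

# Toy 2-D "descriptors" clustered around two centers.
centers = [(0.0, 0.0), (10.0, 10.0)]
descs = [(0.1, 0.2), (9.8, 10.1), (0.3, -0.1), (10.2, 9.9)]
hist = bow_histogram(descs, centers)
print(hist)  # → [0.5, 0.5]
```

The fixed-length histogram can then be concatenated with a global descriptor such as HOG and fed to any standard classifier.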

Expanded bag of words representation for object classification

Tinglin Liu, Jing Liu, Qinshan Liu, Hanqing Lu
2009 2009 16th IEEE International Conference on Image Processing (ICIP)  
In this paper, we first design a simple method to discover this dependency by computing the spatial correlation between visual words in overlapped local patches.  ...  Currently, the bag-of-visual-words (BOW) representation has received wide application in object categorization.  ...  high-dimensional space of image local features, such as SIFT [1], to form a finite set of clusters, which are described as a vocabulary of 'visual words'.  ... 
doi:10.1109/icip.2009.5413588 dblp:conf/icip/LiuLLL09 fatcat:m665zzg4lzhkzfhf3nhuydegxq
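The spatial correlation this entry computes can be sketched as a co-occurrence matrix: for every local patch, count which pairs of visual words appear together, accumulating over all (possibly overlapping) patches. The patch data and vocabulary size below are toy values for illustration, not the paper's setup.

```python
from itertools import combinations

def cooccurrence(patch_words, vocab_size):
    """Symmetric co-occurrence counts of visual-word pairs that fall in the
    same local patch; overlapping patches each contribute once per pair."""
    M = [[0] * vocab_size for _ in range(vocab_size)]
    for words in patch_words:
        # Deduplicate within a patch so a repeated word isn't over-counted.
        for i, j in combinations(sorted(set(words)), 2):
            M[i][j] += 1
            M[j][i] += 1
    return M

# Toy example: three overlapping patches over a 3-word vocabulary.
patches = [[0, 1], [1, 2, 1], [0, 1]]
M = cooccurrence(patches, 3)
print(M[0][1])  # words 0 and 1 co-occur in two patches
```

Strongly co-occurring word pairs can then be appended to the plain BOW histogram, which is the "expanded" representation the title refers to.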

Salient region detection and segmentation for general object recognition and image understanding

TieJun Huang, YongHong Tian, Jia Li, HaoNan Yu
2011 Science China Information Sciences  
By exploiting multi-task learning methods to model visual saliency simultaneously with the bottom-up and top-down factors, the lowest layer can effectively detect salient objects in an image.  ...  The key idea of our model is to discover recurring visual objects by selective attention modeling and pairwise local invariant features matching on a large image set in an unsupervised manner.  ...  The key idea of GORIUM is to discover recurring visual objects in multiple images by selective attention modeling and pairwise local invariant features matching, and then construct a visual dictionary  ... 
doi:10.1007/s11432-011-4487-1 fatcat:nmbyczekrjdwbpnjzix2v4p4ti

Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning [article]

Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang
2022 arXiv   pre-print
datasets and hindering the model from learning general representations for open-vocabulary point-cloud detection.  ...  Moreover, it is extremely laborious and expensive to collect and fully annotate a point-cloud detection dataset with numerous classes of objects, leading to the limited classes of existing point-cloud  ...  By discovering the strong generalizability of localization in 3D object detection, we are also the first to open up another path by resorting to ImageNet1K to help open-vocabulary 3D detection. 2) We propose  ... 
arXiv:2207.01987v1 fatcat:xkhnb7e3anag3bsmgx2peo3qu4

Discovering Attribute Shades of Meaning with the Crowd

Adriana Kovashka, Kristen Grauman
2015 International Journal of Computer Vision  
They are widely applicable to tasks that involve describing visual content, such as zero-shot category learning and organization of photo collections.  ...  To learn semantic attributes, existing methods typically train one discriminative model for each word in a vocabulary of nameable properties.  ...  Acknowledgements We thank the anonymous reviewers for their helpful feedback and suggestions. This research is supported in part by ONR ATL N00014-11-1-0105.  ... 
doi:10.1007/s11263-014-0798-1 fatcat:ett2ctsqifa2flwqa4bwpy5nwu