
Decentralized Unsupervised Learning of Visual Representations [article]

Yawen Wu, Zhepeng Wang, Dewen Zeng, Meng Li, Yiyu Shi, Jingtong Hu
2022 arXiv   pre-print
Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled image data.  ...  is learned for better data representations.  ...  We propose a framework with two approaches to learning visual representations from unlabeled data on distributed clients.  ... 
arXiv:2111.10763v2 fatcat:tqayp2fl55aylgvfmatgl5tvyi
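
The snippet does not show the training objective itself; below is a minimal sketch of the NT-Xent (InfoNCE-style) loss that SimCLR-style contrastive learning typically optimizes. The decentralized aggregation across clients that the paper proposes is not represented here, and all names are illustrative.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy), the standard
    SimCLR-style contrastive loss. z1, z2: (N, D) embeddings of two
    augmented views of the same N images."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive index per row
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy usage: a batch of 8 images, two views each, 32-dim embeddings.
rng = np.random.default_rng(0)
print(nt_xent_loss(rng.normal(size=(8, 32)), rng.normal(size=(8, 32))))
```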

Self-Supervised Consistent Quantization for Fully Unsupervised Image Retrieval [article]

Guile Wu, Chao Zhang, Stephan Liwicki
2022 arXiv   pre-print
To minimize human supervision, recent advances propose deep fully unsupervised image retrieval, aiming to train a deep model from scratch to jointly optimize visual features and quantization codes.  ...  Unsupervised image retrieval aims to learn an efficient retrieval system without expensive data annotations, but most existing methods rely heavily on handcrafted feature descriptors or pre-trained feature  ...  learn visual features and codes.  ... 
arXiv:2206.09806v1 fatcat:ha4hj3oxcvfvrbbhbdsbawjhhu
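
The record above mentions jointly optimized visual features and quantization codes. As a hedged illustration, here is a generic hard vector-quantization step (nearest-codeword assignment); the paper's actual consistent-quantization objective is not shown in the snippet, so this is not their method.

```python
import numpy as np

def quantize(features, codebook):
    """Hard vector quantization: map each feature to its nearest codeword.
    features: (N, D) visual features; codebook: (K, D) codewords.
    Returns integer codes (N,) and the reconstructed features."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    codes = dists.argmin(axis=1)
    return codes, codebook[codes]

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 16))
codebook = rng.normal(size=(8, 16))
codes, recon = quantize(feats, codebook)
print(codes[:10], float(((feats - recon) ** 2).mean()))  # codes + quantization error
```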

Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction [article]

Yingzi Fan, Longfei Han, Yue Zhang, Lechao Cheng, Chen Xia, Di Hu
2022 arXiv   pre-print
Then, those auditory features are fused into the visual features through a cross-modal self-attention module.  ...  The other domain discrimination branch is devised to reduce the domain discrepancy of visual features and audio-visual correlations implied by the fused audio-visual features.  ...  The other domain discriminator is adopted to learn fused audio-visual features having a uniform distribution across source and target domains.  ... 
arXiv:2208.05220v1 fatcat:fzvtgu3apjdj3edfjgzje6tury
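
Domain-adversarial branches of this kind are conventionally trained with a gradient reversal layer (Ganin and Lempitsky); whether this paper uses exactly that mechanism is not stated in the snippet. A minimal PyTorch sketch of the layer:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity in the forward pass; multiplies the
    gradient by -lambda in the backward pass, so the shared feature extractor
    learns to confuse the domain discriminator."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Toy usage: features pass through unchanged; their gradient is flipped.
feat = torch.randn(4, 128, requires_grad=True)
grad_reverse(feat, lam=0.5).sum().backward()
print(feat.grad[0, :3])  # each entry is -0.5 (the reversed gradient of sum)
```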

Unsupervised Video Summarization via Multi-source Features [article]

Hussain Kanafani, Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth
2021 arXiv   pre-print
Therefore, we propose the incorporation of multiple feature sources with chunk and stride fusion to provide more information about the visual content.  ...  The advantage of unsupervised approaches is that they do not require human annotations to learn the summarization capability and generalize to a wider range of domains.  ...  fused features.  ... 
arXiv:2105.12532v1 fatcat:xvwhtzewqzerppiwfmgyiczpua
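
The snippet names "chunk and stride fusion" without defining it. One plausible reading, sketched below with hypothetical helpers, is that per-frame features from multiple sources are merged either as contiguous blocks or by dimension-wise interleaving; this is an assumption, not the paper's stated procedure.

```python
import numpy as np

def chunk_fusion(sources, out_dim):
    """Hypothetical chunk fusion: each source contributes one contiguous
    block of the fused vector (sources truncated to an equal share)."""
    share = out_dim // len(sources)
    return np.concatenate([s[:, :share] for s in sources], axis=1)

def stride_fusion(sources, out_dim):
    """Hypothetical stride fusion: interleave sources dimension by dimension."""
    share = out_dim // len(sources)
    fused = np.empty((sources[0].shape[0], share * len(sources)))
    for i, s in enumerate(sources):
        fused[:, i::len(sources)] = s[:, :share]
    return fused

# Two per-frame feature sources for a 10-frame video, fused to 256 dims.
rng = np.random.default_rng(2)
a, b = rng.normal(size=(10, 512)), rng.normal(size=(10, 512))
print(chunk_fusion([a, b], 256).shape, stride_fusion([a, b], 256).shape)
```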

See the Sound, Hear the Pixels

Janani Ramaswamy, Sukhendu Das
2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
A novel Audio Visual Triplet Gram Matrix Loss (AVTGML) has been proposed as a loss function to learn the localization in an unsupervised way.  ...  For every event occurring in the real world, most often a sound is associated with the corresponding visual scene.  ...  In the case of unsupervised learning, the feature representations from SWABs are used in the proposed Audio Visual Triplet Gram Matrix Loss (AVTGML) function to get the attention maps.  ... 
doi:10.1109/wacv45572.2020.9093616 dblp:conf/wacv/RamaswamyD20 fatcat:glo5yaf52ff5diqun2rrwtusp4
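
The AVTGML definition is not given in the snippet. The sketch below combines the two standard ingredients its name suggests, a Gram matrix of feature maps and a triplet margin over Gram-matrix distances, as an assumption rather than the paper's exact loss.

```python
import numpy as np

def gram(f):
    """Gram matrix of a feature map f with shape (C, H*W):
    channel-wise correlations, a second-order feature statistic."""
    return f @ f.T / f.shape[1]

def triplet_gram_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin over Frobenius distances between Gram matrices:
    pull anchor statistics toward the positive, push from the negative."""
    d_pos = np.linalg.norm(gram(anchor) - gram(positive))
    d_neg = np.linalg.norm(gram(anchor) - gram(negative))
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(3)
a, p, n = (rng.normal(size=(64, 49)) for _ in range(3))  # 64 channels, 7x7 map
print(triplet_gram_loss(a, p, n))
```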

Learned Video Compression via Joint Spatial-Temporal Correlation Exploration

Haojie Liu, Han Shen, Lichao Huang, Ming Lu, Tong Chen, Zhan Ma
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
We suggest a one-stage learning approach to encapsulate flow as quantized features from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors  ...  Efficient temporal information representation plays a key role in video coding.  ...  A one-stage unsupervised flow learning is applied with implicit flow representation using quantized features.  ... 
doi:10.1609/aaai.v34i07.6825 fatcat:naduixdarnfy3ebtcw55ht2h5e
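
The snippet does not detail how the quantized features remain trainable. The standard trick in learned compression (Ballé et al.), which may or may not be what this paper uses, is to round at test time and approximate rounding with additive uniform noise during training:

```python
import numpy as np

def quantize_features(y, training=True, rng=None):
    """At test time, round features to integers for entropy coding; during
    training, approximate rounding with additive uniform noise in
    [-0.5, 0.5) so gradients can flow through the quantizer."""
    if training:
        rng = rng or np.random.default_rng()
        return y + rng.uniform(-0.5, 0.5, size=y.shape)
    return np.round(y)

y = np.array([0.2, 1.7, -2.4])
print(quantize_features(y, training=False))  # [ 0.  2. -2.]
print(quantize_features(y, training=True, rng=np.random.default_rng(4)))
```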

Learned Video Compression via Joint Spatial-Temporal Correlation Exploration [article]

Haojie Liu, Han Shen, Lichao Huang, Ming Lu, Tong Chen, Zhan Ma
2019 arXiv   pre-print
We suggest a one-stage learning approach to encapsulate flow as quantized features from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors  ...  Efficient temporal information representation plays a key role in video coding.  ...  A one-stage unsupervised flow learning is applied with implicit flow representation using quantized features.  ... 
arXiv:1912.06348v1 fatcat:h6chbcl52nbwtbpx6hrrzj7fme

A One-Shot Texture-Perceiving Generative Adversarial Network for Unsupervised Surface Inspection [article]

Lingyun Gu, Lin Zhang, Zhaokui Wang
2021 arXiv   pre-print
To combat it, we propose a hierarchical texture-perceiving generative adversarial network (HTP-GAN) that is learned from the one-shot normal image in an unsupervised scheme.  ...  Visual surface inspection is a challenging task owing to the highly diverse appearance of target surfaces and defective regions.  ...  Then the network is able to learn internal distributions at different scales.  ... 
arXiv:2106.06792v1 fatcat:om4csq5h3bbxjf62kzq6tbreva

Learning Like a Toddler

Emre Yilmaz, Konstantinos Rematas, Tinne Tuytelaars, Hugo Van hamme
2014 Proceedings of the ACM International Conference on Multimedia - MM '14  
This paper presents the initial findings of our efforts to build an unsupervised multimodal vocabulary learning scheme in a realistic scenario.  ...  Moreover, we have performed experiments using different visual representations and time spans for combining the audiovisual information.  ...  The learned visual representation for shot-level DP features matches the reference DP features for 84% of the video objects.  ... 
doi:10.1145/2647868.2655036 dblp:conf/mm/YilmazRTh14 fatcat:wtvyub5fjjfy5eamikxxn3d3au

Unsupervised Low-Rank Representations for Speech Emotion Recognition

Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos
2019 Interspeech
Visualization of features in two dimensions provides insight into discriminatory abilities of reduced feature sets.  ...  Classification with low-dimensional representations yields performance improvement in a variety of settings.  ...  Interestingly, this unsupervised distribution is quite similar to the Valence-Arousal affective representation.  ... 
doi:10.21437/interspeech.2019-2769 dblp:conf/interspeech/Paraskevopoulos19 fatcat:iqnflmjb45g2zczeoh6buvgz54
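
A common way to produce the two-dimensional feature visualizations mentioned above is PCA. The paper evaluates several reduction methods; this sketch shows only the generic PCA case, with synthetic data standing in for the acoustic features.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for high-dimensional utterance-level acoustic features.
rng = np.random.default_rng(5)
feats = rng.normal(size=(200, 88))

# Reduce to two dimensions for a scatter plot of the emotion classes.
coords = PCA(n_components=2).fit_transform(feats)
print(coords.shape)  # (200, 2)
```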

Learning deep representation of multityped objects and tasks [article]

Truyen Tran, Dinh Phung, Svetha Venkatesh
2016 arXiv   pre-print
Our deep model takes as input multiple type-specific features, narrows the cross-modality semantic gaps, learns cross-type correlation, and produces a high-level homogeneous representation.  ...  For example, an image can be described by multiple visual views, which can be in the form of bag-of-words (counts) or color/texture histograms (real-valued).  ...  For the baseline (normalized feature concatenation), the MAP improves by 29% from 0.272 with BOW representation to 0.351 with all visual features combined.  ... 
arXiv:1603.01359v1 fatcat:wevv6e6qzvdgxp7ado6yex7iym
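
The baseline named in the snippet, normalized feature concatenation, is straightforward to sketch; the view dimensions and data below are made up.

```python
import numpy as np

def concat_normalized(views):
    """L2-normalize each type-specific view, then concatenate
    into a single feature vector per object."""
    normed = [v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12) for v in views]
    return np.concatenate(normed, axis=1)

rng = np.random.default_rng(6)
bow = rng.random(size=(50, 1000))   # bag-of-words counts
color = rng.random(size=(50, 64))   # color histogram
print(concat_normalized([bow, color]).shape)  # (50, 1064)
```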

Unsupervised Learning Based on Multiple Descriptors for WSIs Diagnosis

Taimoor Shakeel Sheikh, Jee-Yeon Kim, Jaesool Shim, Migyung Cho
2022 Diagnostics  
and local binary patterns along with the original image to fuse the heterogeneous features.  ...  The pre-trained latent vectors are extracted from each autoencoder, and these fused feature representations are utilized for classification.  ...  Given any learned latent representations, y(x), we can extract the multiple feature representations to retrain our classification model with fused latent representations connected to the fully connected  ... 
doi:10.3390/diagnostics12061480 pmid:35741289 pmcid:PMC9222016 fatcat:iqrvtuzszbcazk5ou3b5bf7hdy
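
A hedged sketch of the fusion step the snippet describes: latent vectors from per-descriptor autoencoders are concatenated and passed to a classifier. A logistic regression stands in for the paper's fully connected classification head, and the latents are random placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Latent vectors from two hypothetical pre-trained autoencoders
# (one per descriptor), here replaced by random data.
rng = np.random.default_rng(7)
z_image = rng.normal(size=(120, 64))  # latent of the raw-image autoencoder
z_lbp = rng.normal(size=(120, 64))    # latent of the local-binary-pattern autoencoder

# Fuse by concatenation and train the downstream classifier.
fused = np.concatenate([z_image, z_lbp], axis=1)
labels = rng.integers(0, 2, size=120)
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.score(fused, labels))
```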

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion [article]

Yang Wang
2020 arXiv   pre-print
This fact has motivated much research attention on fusing multi-modal feature spaces to comprehensively characterize the data objects.  ...  Most of the existing state of the art focuses on how to fuse the energy or information from multi-modal spaces to deliver superior performance over single-modal counterparts.  ...  The proposed method projected latent variables fused with multi-view features into multiple observations.  ... 
arXiv:2006.08159v1 fatcat:g4467zmutndglmy35n3eyfwxku

Deep multimodal representation learning: a survey

Wenzhong Guo, Jianwen Wang, Shiping Wang
2019 IEEE Access  
Due to the powerful representation ability with multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years.  ...  INDEX TERMS Multimodal representation learning, multimodal deep learning, deep multimodal fusion, multimodal translation, multimodal adversarial learning.  ...  Unsupervised learning has been widely used for dimensionality reduction and feature extraction on unlabeled datasets.  ... 
doi:10.1109/access.2019.2916887 fatcat:ms4wcgl5rncsbiywz27uss4ysq

Representation Learning for Remote Sensing: An Unsupervised Sensor Fusion Approach [article]

Aidan M. Swope, Xander H. Rudelis, Kyle T. Story
2021 arXiv   pre-print
These representations outperform fully supervised ImageNet weights on a remote sensing classification task and improve as more sensors are fused.  ...  In addition, most remote sensing applications currently use only a small subset of the multi-sensor, multi-channel information available, motivating the need for fused multi-sensor representations.  ...  Figure 1: Learned representations of out-of-sample image scenes, visualized with PCA followed by t-SNE and colored by OpenStreetMap category.  ... 
arXiv:2108.05094v1 fatcat:nah7ak6ehvaxzatv3vdcggfyj4
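
The figure caption describes PCA followed by t-SNE; below is a minimal reproduction of that two-stage pipeline, with synthetic data in place of the learned representations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in for learned scene representations.
rng = np.random.default_rng(8)
reps = rng.normal(size=(300, 512))

# PCA first reduces dimensionality (denoises and speeds up t-SNE),
# then t-SNE maps the result to 2-D for plotting.
mid = PCA(n_components=50).fit_transform(reps)
xy = TSNE(n_components=2, perplexity=30).fit_transform(mid)
print(xy.shape)  # (300, 2)
```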
Showing results 1 — 15 out of 10,978 results