Semantic Autoencoder for Zero-Shot Learning
[article]
2017
arXiv
pre-print
Semantic Autoencoder for Zero-Shot Learning
Elyor Kodirov ...
... Kodirov, T. Xiang, Z. Fu, and S. Gong. Unsupervised domain adaptation for zero-shot learning. ...
... Caltech-UCSD Birds-200-2011 dataset. In California Institute ...
arXiv:1704.08345v1
fatcat:3tlzperd2nc3vmdmmigqavdonm
Instance Cross Entropy for Deep Metric Learning
[article]
2019
arXiv
pre-print
Loss functions play a crucial role in deep metric learning, so a variety of them have been proposed. Some supervise the learning process by pairwise or triplet-wise similarity constraints, while others take advantage of structured similarity information among multiple data points. In this work, we approach deep metric learning from a novel perspective. We propose instance cross entropy (ICE), which measures the difference between an estimated instance-level matching distribution and its ground-truth one. ICE has three main appealing properties. First, similar to categorical cross entropy (CCE), ICE has a clear probabilistic interpretation and exploits structured semantic similarity information for learning supervision. Second, ICE is scalable to infinite training data, as it learns on mini-batches iteratively and is independent of the training set size. Third, motivated by our relative weight analysis, seamless sample reweighting is incorporated. It rescales samples' gradients to control the differentiation degree over training examples instead of truncating them by sample mining. In addition to its simplicity and intuitiveness, extensive experiments on three real-world benchmarks demonstrate the superiority of ICE.
arXiv:1911.09976v1
fatcat:ae3enq5rxzfa7dmxustmbmsa5a
ID-aware Quality for Set-based Person Re-identification
[article]
2019
arXiv
pre-print
Set-based person re-identification (SReID) is a matching problem that aims to verify whether two sets are of the same identity (ID). Existing SReID models typically generate a feature representation per image and aggregate them to represent the set as a single embedding. However, they can easily be perturbed by noise (perceptually or semantically low-quality images), which is inevitable due to imperfect tracking/detection systems, or overfit to trivial images. In this work, we present a novel and simple solution to this problem based on ID-aware quality, which measures the perceptual and semantic quality of images guided by their ID information. Specifically, we propose an ID-aware Embedding that consists of two key components: (1) Feature learning attention, which aims to learn robust image embeddings by focusing on 'medium' hard images. This prevents overfitting to trivial images and alleviates the influence of outliers. (2) Feature fusion attention, which fuses the image embeddings in the set to obtain the set-level embedding. It ignores noisy information and pays more attention to discriminative images to aggregate more discriminative information. Experimental results on four datasets show that our method outperforms state-of-the-art approaches despite its simplicity.
arXiv:1911.09143v1
fatcat:kvwq7qloq5brhlsa4smi4cinii
GAN-based Pose-aware Regulation for Video-based Person Re-identification
[article]
2019
arXiv
pre-print
Video-based person re-identification deals with the inherent difficulty of matching unregulated sequences of different lengths and with incomplete target pose/viewpoint structure. Common approaches operate either by reducing the problem to the still-image case, incurring a significant information loss, or by exploiting inter-sequence temporal dependencies as in Siamese Recurrent Neural Networks or in gait analysis. However, in all cases, the inter-sequence pose/viewpoint misalignment is not considered, and the existing spatial approaches are mostly limited to the still-image context. To this end, we propose a novel approach that can more effectively exploit the rich video information by accounting for the role that the changing pose/viewpoint factor plays in the sequence matching process. Specifically, our approach consists of two components. The first, Weighted Fusion (WF), attempts to complement the original pose-incomplete information carried by the sequences with synthetic GAN-generated images and fuses their feature vectors into a more discriminative viewpoint-insensitive embedding. The second, Weighted-Pose Regulation (WPR), performs an explicit pose-based alignment of sequence pairs to promote coherent feature matching. Extensive experiments on two large video-based benchmark datasets show that our approach considerably outperforms existing methods.
arXiv:1903.11552v1
fatcat:ypujostuwvaozaotf2t3hzoyze
Semantic Autoencoder for Zero-Shot Learning
2017
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Existing zero-shot learning (ZSL) models typically learn a projection function from a feature space to a semantic embedding space (e.g. attribute space). However, such a projection function is only concerned with predicting the training seen class semantic representation (e.g. attribute prediction) or classification. When applied to test data, which in the context of ZSL contains different (unseen) classes without training data, a ZSL model typically suffers from the project domain shift problem. In this work, we present a novel solution to ZSL based on learning a Semantic AutoEncoder (SAE). Taking the encoder-decoder paradigm, an encoder aims to project a visual feature vector into the semantic space as in the existing ZSL models. However, the decoder exerts an additional constraint, that is, the projection/code must be able to reconstruct the original visual feature. We show that with this additional reconstruction constraint, the learned projection function from the seen classes is able to generalise better to the new unseen classes. Importantly, the encoder and decoder are linear and symmetric, which enables us to develop an extremely efficient learning algorithm. Extensive experiments on six benchmark datasets demonstrate that the proposed SAE significantly outperforms the existing ZSL models, with the additional benefit of lower computational cost. Furthermore, when the SAE is applied to the supervised clustering problem, it also beats the state-of-the-art.
doi:10.1109/cvpr.2017.473
dblp:conf/cvpr/KodirovXG17
fatcat:sx5n4ldqingdndlejm43teebuu
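The abstract above says that a linear, symmetric encoder and decoder admit a very efficient learning algorithm, but it does not state the objective. The following is a minimal sketch under an assumed objective, min over W of ||X - WᵀS||² + λ||WX - S||², where the decoder is the transpose of the encoder W. Setting the gradient to zero gives a Sylvester equation AW + WB = C with A = SSᵀ, B = λXXᵀ and C = (1 + λ)SXᵀ. The function names, the matrix shapes, and the Kronecker-product solver (practical only for small dimensions) are illustrative choices, not details taken from this listing.

```python
import numpy as np


def solve_sae(X, S, lam):
    """Fit a tied-weight linear autoencoder in closed form (assumed objective).

    X:   (d, n) visual features, one column per training sample.
    S:   (k, n) semantic vectors (e.g. attributes), one column per sample.
    lam: weight of the encoder term ||W X - S||^2.
    Returns W of shape (k, d); the encoder is W, the decoder is W.T.
    """
    A = S @ S.T                    # (k, k)
    B = lam * (X @ X.T)            # (d, d)
    C = (1.0 + lam) * (S @ X.T)    # (k, d)
    k, d = C.shape
    # Column-major vectorisation turns A W + W B = C into a linear system:
    # (I_d kron A + B^T kron I_k) vec(W) = vec(C).
    K = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(K, C.flatten(order="F"))
    return w.reshape((k, d), order="F")


def sae_objective(W, X, S, lam):
    """Reconstruction error plus lam times the encoding error."""
    recon = np.linalg.norm(X - W.T @ S) ** 2
    encode = np.linalg.norm(W @ X - S) ** 2
    return recon + lam * encode


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 50))   # hypothetical 6-D visual features
    S = rng.standard_normal((3, 50))   # hypothetical 3-D semantic vectors
    W = solve_sae(X, S, lam=0.2)
    print("objective at solution:", sae_objective(W, X, S, 0.2))
```

For realistic feature dimensions a dedicated Sylvester solver would replace the Kronecker system, which grows as (kd) × (kd).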
Unsupervised Domain Adaptation for Zero-Shot Learning
2015
2015 IEEE International Conference on Computer Vision (ICCV)
Zero-shot learning (ZSL) can be considered as a special case of transfer learning where the source and target domains have different tasks/label spaces and the target domain is unlabelled, providing little guidance for the knowledge transfer. A ZSL method typically assumes that the two domains share a common semantic representation space, where a visual feature vector extracted from an image/video can be projected/embedded using a projection function. Existing approaches learn the projection function from the source domain and apply it without adaptation to the target domain. They are thus based on naive knowledge transfer, and the learned projections are prone to the domain shift problem. In this paper, a novel ZSL method is proposed based on unsupervised domain adaptation. Specifically, we formulate a novel regularised sparse coding framework which uses the target domain class labels' projections in the semantic space to regularise the learned target domain projection, thus effectively overcoming the projection domain shift problem. Extensive experiments on four object and action recognition benchmark datasets show that the proposed ZSL method significantly outperforms the state-of-the-art.
doi:10.1109/iccv.2015.282
dblp:conf/iccv/KodirovXFG15
fatcat:r2o6prdiabbileemh4dmtddcoq
Cross-View Discriminative Feature Learning for Person Re-Identification
2018
IEEE Transactions on Image Processing
Kodirov is with Anyvision, Belfast, UK. Email: elyor@anyvision.co N. M. Robertson is with EEECS/ECIT, Queen's University Belfast, UK. ...
doi:10.1109/tip.2018.2851098
pmid:29994678
fatcat:nxpidryun5erfg3cmglke7ob6a
Dictionary Learning with Iterative Laplacian Regularisation for Unsupervised Person Re-identification
2015
Proceedings of the British Machine Vision Conference 2015
Many existing approaches to person re-identification (Re-ID) are based on supervised learning, which requires hundreds of matching pairs to be labelled for each pair of cameras. This severely limits their scalability for real-world applications. This work aims to overcome this limitation by developing a novel unsupervised Re-ID approach. The approach is based on a new dictionary learning for sparse coding formulation with a graph Laplacian regularisation term whose value is set iteratively. As an unsupervised model, the dictionary learning model is well-suited to the unsupervised task, whilst the regularisation term enables the exploitation of cross-view identity-discriminative information ignored by existing unsupervised Re-ID methods. Importantly, this model is also flexible in utilising any labelled data if available. Experiments on two benchmark datasets demonstrate that the proposed approach significantly outperforms the state-of-the-art.
doi:10.5244/c.29.44
dblp:conf/bmvc/KodirovXG15
fatcat:igt3ogwj4rddnkoz5eiwv5gzxy
Ranked List Loss for Deep Metric Learning
2021
IEEE Transactions on Pattern Analysis and Machine Intelligence
Xinshao Yang Elyor ...
Kodirov received his Ph.D. degree in the School of Electronic Engineering and Computer Science, Queen Mary University of London, 2017 and the Master's degree in computer science from Chonnam National University ...
doi:10.1109/tpami.2021.3068449
pmid:33760730
fatcat:onbyudurbfhwjgk3lagd3azhqa
Zero-shot object recognition by semantic manifold distance
2015
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Object recognition by zero-shot learning (ZSL) aims to recognise objects without seeing any visual examples by learning knowledge transfer between seen and unseen object classes. This is typically achieved by exploring a semantic embedding space such as attribute space or semantic word vector space. In such a space, both seen and unseen class labels, as well as image features, can be embedded (projected), and the similarity between them can thus be measured directly. Existing works differ in what embedding space is used and how to project the visual data into the semantic embedding space. Yet, they all measure the similarity in the space using a conventional distance metric (e.g. cosine) that does not consider the rich intrinsic structure, i.e. semantic manifold, of the semantic categories in the embedding space. In this paper, we propose to model the semantic manifold in an embedding space using a semantic class label graph. The semantic manifold structure is used to redefine the distance metric in the semantic embedding space for more effective ZSL. The proposed semantic manifold distance is computed using a novel absorbing Markov chain process (AMP), which has a very efficient closed-form solution. The proposed new model improves upon and seamlessly unifies various existing ZSL algorithms. Extensive experiments on both the large-scale ImageNet dataset and the widely used Animal with Attribute (AwA) dataset show that our model significantly outperforms the state-of-the-art.
doi:10.1109/cvpr.2015.7298879
dblp:conf/cvpr/FuXKG15
fatcat:2ophdrj72vfcffimbzrjbighgi
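The abstract above contrasts its proposal with the conventional approach of matching an embedded image to class prototypes by a plain distance such as cosine. A minimal sketch of that conventional baseline follows; it is not the semantic manifold distance or the absorbing Markov chain process, whose details are not given in this listing. The function name and array layout are hypothetical.

```python
import numpy as np


def cosine_nearest_prototype(embedded, prototypes):
    """Assign each embedded sample to the most cosine-similar class prototype.

    embedded:   (n, k) image features already projected into the semantic space.
    prototypes: (c, k) semantic vectors of the candidate (unseen) classes.
    Returns an (n,) array of class indices into `prototypes`.
    """
    # Normalise rows so that the dot product equals the cosine similarity.
    e = embedded / np.linalg.norm(embedded, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    similarities = e @ p.T          # (n, c)
    return similarities.argmax(axis=1)
```

Because each sample is compared with each prototype independently, this baseline ignores any structure among the classes themselves, which is the limitation the abstract points to.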
Ranked List Loss for Deep Metric Learning
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Yang Elyor ...
Kodirov is currently a senior researcher at AnyVision, Belfast, UK. ...
doi:10.1109/cvpr.2019.00535
dblp:conf/cvpr/WangHKHGR19
fatcat:jrkbck4ljvfixii7dvlt2h3cwm
Zero-Shot Learning on Semantic Class Prototype Graph
2018
IEEE Transactions on Pattern Analysis and Machine Intelligence
However, we found that none of them, including SSE-ReLU [65] and Kodirov et al. ...
Compared to the inductive learning-based methods, our model beats the closest competitor Deep-SCoRe [43] by 8.2%. (2) As expected, the three transductive methods (Kodirov et al. ...
doi:10.1109/tpami.2017.2737007
pmid:28796607
fatcat:glg5peo2h5gytp6fyktg6j7bay
GAN-Based Pose-Aware Regulation for Video-Based Person Re-Identification
2019
2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
Video-based person re-identification deals with the inherent difficulty of matching unregulated sequences of different lengths and with incomplete target pose/viewpoint structure. Common approaches operate either by reducing the problem to the still-image case, incurring a significant information loss, or by exploiting inter-sequence temporal dependencies as in Siamese Recurrent Neural Networks or in gait analysis. However, in all cases, the inter-sequence pose/viewpoint misalignment is not considered, and the existing spatial approaches are mostly limited to the still-image context. To this end, we propose a novel approach that can more effectively exploit the rich video information by accounting for the role that the changing pose/viewpoint factor plays in the sequence matching process. Specifically, our approach consists of two components. The first, Weighted Fusion (WF), attempts to complement the original pose-incomplete information carried by the sequences with synthetic GAN-generated images and fuses their feature vectors into a more discriminative viewpoint-insensitive embedding. The second, Weighted-Pose Regulation (WPR), performs an explicit pose-based alignment of sequence pairs to promote coherent feature matching. Extensive experiments on two large video-based benchmark datasets show that our approach considerably outperforms existing methods.
doi:10.1109/wacv.2019.00130
dblp:conf/wacv/BorgiaHKR19
fatcat:s6yzanspdvcojgh7x2cjtusmoq
Deep Metric Learning by Online Soft Mining and Class-Aware Attention
[article]
2019
arXiv
pre-print
Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of the sample mining methods and provide solutions for both of them. First, previous mining methods assign one binary score to each sample, i.e., dropping or keeping it, so they only select a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intraclass variances by focusing on more similar positives. Second, the existing methods are easily influenced by outliers, as they are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA), which assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art.
arXiv:1811.01459v3
fatcat:ntm2v4vjcfdfhmlhhjjkle7y7u
Deep Metric Learning by Online Soft Mining and Class-Aware Attention
2019
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of the sample mining methods and provide solutions for both of them. First, previous mining methods assign one binary score to each sample, i.e., dropping or keeping it, so they only select a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intraclass variances by focusing on more similar positives. Second, the existing methods are easily influenced by outliers, as they are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA), which assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art.
doi:10.1609/aaai.v33i01.33015361
fatcat:wklvfundbjd25lb2pthegegzam