
Zero-Shot Learning Through Cross-Modal Transfer [article]

Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng
2013 arXiv   pre-print
Most previous zero-shot learning models can only differentiate between unseen classes.  ...  In our zero-shot framework distributional information in language can be seen as spanning a semantic basis for understanding what objects look like.  ...  However, our work is able to classify object categories without any training data due to the cross-modal knowledge transfer from natural language and at the same time obtain high performance on classes  ... 
arXiv:1301.3666v2 fatcat:cluvoeqj6zb2tfoimh7sshwdva
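
As a rough illustration of the cross-modal transfer idea described above, the sketch below (PyTorch; all names, dimensions, and the random projection are illustrative, not the paper's implementation) classifies images by projecting their features into a word-embedding space and picking the nearest class word vector.

```python
import torch

def zero_shot_classify(img_feats, proj, class_word_vecs):
    """Project image features into the word-embedding space and assign
    each image to the nearest class word vector (cosine similarity)."""
    z = img_feats @ proj                              # map into the semantic space
    z = z / z.norm(dim=-1, keepdim=True)
    w = class_word_vecs / class_word_vecs.norm(dim=-1, keepdim=True)
    return (z @ w.t()).argmax(dim=-1)                 # index of the closest class vector

# Toy usage: 4 images with 512-d features, 3 unseen classes with 300-d word vectors.
img_feats = torch.randn(4, 512)
proj = torch.randn(512, 300)            # in practice learned from seen classes
class_word_vecs = torch.randn(3, 300)   # e.g. word vectors of the class names
print(zero_shot_classify(img_feats, proj, class_word_vecs))
```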

xGQA: Cross-Lingual Visual Question Answering [article]

Jonas Pfeiffer and Gregor Geigle and Aishwarya Kamath and Jan-Martin O. Steitz and Stefan Roth and Ivan Vulić and Iryna Gurevych
2022 arXiv   pre-print
of around 38 accuracy points in target languages showcases the difficulty of zero-shot cross-lingual transfer for this task.  ...  Our proposed methods outperform current state-of-the-art multilingual multimodal models (e.g., M3P) in zero-shot cross-lingual settings, but the accuracy remains low across the board; a performance drop  ...  Few-Shot Cross-Lingual Transfer For few-shot cross-lingual scenarios we follow Lauscher et al. (2020) and start from the same finetuned model as for zero-shot transfer (see §5.3).  ... 
arXiv:2109.06082v2 fatcat:3wofbl56ffbujfutrxlkpjubfq

Zero-Shot Activity Recognition with Videos [article]

Evin Pinar Ornek
2020 arXiv   pre-print
The zero-shot recognition results are evaluated by top-n accuracy. Then, the manifold learning ability is measured by mean Nearest Neighbor Overlap.  ...  In this paper, we examined the zero-shot activity recognition task using videos.  ...  Background Zero-shot activity recognition The zero-shot object classification has been applied through cross-modal information transfer between the images and the language.  ... 
arXiv:2002.02265v1 fatcat:umbgctxyzzbvhcfrkg7dgfnciq
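
The snippet above mentions evaluation by top-n accuracy; below is a small generic sketch of that metric (the function name and array shapes are assumptions, not taken from the paper).

```python
import numpy as np

def top_n_accuracy(scores, labels, n=5):
    """Fraction of samples whose true label is among the n highest-scoring classes."""
    top_n = np.argsort(-scores, axis=1)[:, :n]       # indices of the n best classes per sample
    hits = (top_n == labels[:, None]).any(axis=1)
    return hits.mean()

scores = np.random.rand(100, 20)                     # 100 clips scored against 20 unseen activities
labels = np.random.randint(0, 20, size=100)
print(top_n_accuracy(scores, labels, n=5))
```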

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders [article]

Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata
2019 arXiv   pre-print
Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space.  ...  We evaluate our learned latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2, and establish a new state of the art on generalized zero-shot as well as on few-shot learning.  ...  Related Work In this section, we present related work on generalized zero-shot learning, few-shot learning and cross-modal reconstruction. Generalized Zero-and Few-Shot Learning.  ... 
arXiv:1812.01784v4 fatcat:px7zvsnsz5a5zfxt7vbvhxnh54
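
The abstract describes aligning the VAE latent spaces of image features and class embeddings. Below is a minimal sketch of two cross-aligned VAEs with a cross-reconstruction term and a simple latent-alignment term; the architecture sizes, loss weighting, and alignment choice are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SmallVAE(nn.Module):
    """Toy Gaussian VAE used only to illustrate cross-aligned latents."""
    def __init__(self, dim_in, dim_z=64):
        super().__init__()
        self.enc = nn.Linear(dim_in, 2 * dim_z)   # outputs mean and log-variance
        self.dec = nn.Linear(dim_z, dim_in)
    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar, z

img_vae, attr_vae = SmallVAE(2048), SmallVAE(312)       # image features vs. class attributes
x_img, x_attr = torch.randn(8, 2048), torch.randn(8, 312)

x_img_rec, mu_i, lv_i, z_i = img_vae(x_img)
x_attr_rec, mu_a, lv_a, z_a = attr_vae(x_attr)

recon = nn.functional.mse_loss(x_img_rec, x_img) + nn.functional.mse_loss(x_attr_rec, x_attr)
# Cross-reconstruction: decode each modality from the other modality's latent code.
cross = nn.functional.mse_loss(img_vae.dec(z_a), x_img) + nn.functional.mse_loss(attr_vae.dec(z_i), x_attr)
# Simple alignment of the latent means (a stand-in for the paper's distribution-alignment loss).
align = (mu_i - mu_a).pow(2).sum(dim=-1).mean()
loss = recon + cross + align
loss.backward()
```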

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment [article]

Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
2022 arXiv   pre-print
We first evaluate CLIP's zero-shot performance on a typical visual question answering task and demonstrate a zero-shot cross-modality transfer capability of CLIP on the visual entailment task.  ...  CLIP has shown a remarkable zero-shot capability on a wide range of vision tasks. Previously, CLIP was regarded only as a powerful visual encoder.  ...  We explore a zero-shot cross-modality (language and vision) transfer capability through the visual entailment task.  ... 
arXiv:2203.07190v1 fatcat:whf2ljh2mjfa5l4wsbr5dpvktq
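
CLIP-style zero-shot prediction scores an image embedding against a set of natural-language prompt embeddings. The sketch below uses random placeholder embeddings for clarity; in practice they would come from CLIP's pretrained image and text encoders, and the prompt wording here is purely hypothetical.

```python
import torch

def zero_shot_scores(image_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image embedding and the prompt embeddings."""
    i = image_emb / image_emb.norm()
    t = text_embs / text_embs.norm(dim=-1, keepdim=True)
    return torch.softmax(i @ t.t() / temperature, dim=-1)

# Hypothetical prompt set for a yes/no question rephrased as declarative statements.
prompts = ["a photo of a dog playing outdoors", "a photo without any dog"]
image_emb = torch.randn(512)                  # would come from the image encoder
text_embs = torch.randn(len(prompts), 512)    # would come from the text encoder
print(zero_shot_scores(image_emb, text_embs))
```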

Hierarchical Semantic Loss and Confidence Estimator for Visual-Semantic Embedding-Based Zero-Shot Learning

Sanghyun Seo, Juntae Kim
2019 Applied Sciences  
These methodologies improve the performance of zero-shot learning by adjusting the distance from a semantic vector to a visual vector when performing zero-shot cross-modal retrieval.  ...  One approach to zero-shot learning is to embed visual data such as images and rich semantic data related to the text labels of visual data into a common vector space to perform zero-shot cross-modal retrieval  ...  Acknowledgments: This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science,  ... 
doi:10.3390/app9153133 fatcat:qnlhu3d34reczb37zpue7ff34a
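
The entry describes adjusting distances between semantic and visual vectors in a common embedding space. A common way to train such an embedding is a margin-based ranking loss, sketched below as a generic stand-in (not the paper's hierarchical semantic loss); the margin and dimensions are illustrative.

```python
import torch

def ranking_loss(visual, semantic, margin=0.2):
    """Pull each visual vector toward its own class semantic vector and push it
    away from the other classes by at least `margin` (cosine similarity)."""
    v = visual / visual.norm(dim=-1, keepdim=True)
    s = semantic / semantic.norm(dim=-1, keepdim=True)
    sim = v @ s.t()                            # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)              # similarity to the matching class vector
    viol = (margin + sim - pos).clamp(min=0)   # margin violations against other classes
    off_diag = 1.0 - torch.eye(sim.size(0))    # ignore the positive pairs on the diagonal
    return (viol * off_diag).mean()

visual = torch.randn(16, 300, requires_grad=True)   # image features in the joint space
semantic = torch.randn(16, 300)                      # semantic vectors of their labels
ranking_loss(visual, semantic).backward()
```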

Obtaining referential word meanings from visual and distributional information: Experiments on object naming

Sina Zarrieß, David Schlangen
2017 Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
We show that this is particularly beneficial for zero-shot learning, as compared to projecting visual objects directly into the distributional space.  ...  Therefore, we investigate models of referential word meaning that link visual to lexical information which we assume to be given through distributional word embeddings.  ...  In a zero-shot setup of an object naming task, we find that combining lexical and visual information during training is most beneficial, outperforming variants of cross-modal transfer.  ... 
doi:10.18653/v1/p17-1023 dblp:conf/acl/ZarriessS17 fatcat:kjji6eznnrg7jp52gx5edofsla

CyCLIP: Cyclic Contrastive Language-Image Pretraining [article]

Shashank Goel, Hritik Bansal, Sumit Bhatia, Ryan A. Rossi, Vishwa Vinay, Aditya Grover
2022 arXiv   pre-print
Recent advances in contrastive representation learning over paired image-text data have led to models such as CLIP that achieve state-of-the-art performance for zero-shot classification and distributional  ...  In particular, we show that consistent representations can be learned by explicitly symmetrizing (a) the similarity between the two mismatched image-text pairs (cross-modal consistency); and (b) the similarity  ...  Zero-Shot Transfer We compare the zero-shot performance of CLIP and CYCLIP on standard image classification datasets: CIFAR-10, CIFAR-100 [29], and ImageNet1K [46].  ... 
arXiv:2205.14459v1 fatcat:jkpwi5xfunff3bqggqht256q4i
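
The snippet mentions symmetrizing the similarity between the two mismatched image-text pairs (cross-modal consistency). Below is a hedged sketch of such a penalty on top of CLIP-style embeddings; the batch layout and loss weighting are assumptions rather than the paper's exact formulation.

```python
import torch

def cross_modal_consistency(image_embs, text_embs):
    """Penalize asymmetry between the two mismatched pairings:
    sim(image_i, text_j) should equal sim(image_j, text_i)."""
    i = image_embs / image_embs.norm(dim=-1, keepdim=True)
    t = text_embs / text_embs.norm(dim=-1, keepdim=True)
    sim = i @ t.t()                      # sim[i, j] = image_i vs. text_j
    return (sim - sim.t()).pow(2).mean()

image_embs = torch.randn(32, 512, requires_grad=True)  # from an image encoder
text_embs = torch.randn(32, 512, requires_grad=True)   # from a text encoder
loss = cross_modal_consistency(image_embs, text_embs)  # added to the contrastive objective
loss.backward()
```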

Zero Shot on the Cold-Start Problem: Model-Agnostic Interest Learning for Recommender Systems [article]

Philip J. Feng, Pingjun Pan, Tingting Zhou, Hongxiang Chen, Chuanjiang Luo
2021 arXiv   pre-print
Specifically, the zero-shot tower first performs cross-modal reconstruction with dual auto-encoders to obtain virtual behavior data from highly aligned hidden features for new users; and the ranking tower  ...  can then output recommendations for users based on the data completed by the zero-shot tower.  ...  Through learning from the dense embedding vectors, the zero-shot tower obtains more effective training and extraction processes.  ... 
arXiv:2108.13592v1 fatcat:m7np66xmdngzjl5i2tabhrapde
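
The abstract describes a zero-shot tower built from dual auto-encoders that align user-attribute and user-behavior representations so that virtual behavior features can be decoded for brand-new users. The sketch below illustrates only that cross-modal reconstruction idea; the module sizes, names, and loss terms are assumptions.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """Toy auto-encoder; one instance per modality (user attributes, user behavior)."""
    def __init__(self, dim_in, dim_h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, dim_h), nn.ReLU())
        self.dec = nn.Linear(dim_h, dim_in)

attr_ae, behav_ae = Tower(128), Tower(256)
attrs, behavior = torch.randn(32, 128), torch.randn(32, 256)   # users seen during training

h_a, h_b = attr_ae.enc(attrs), behav_ae.enc(behavior)
# Reconstruct each modality and align the two hidden spaces.
loss = (nn.functional.mse_loss(attr_ae.dec(h_a), attrs)
        + nn.functional.mse_loss(behav_ae.dec(h_b), behavior)
        + nn.functional.mse_loss(h_a, h_b))
loss.backward()

# A new (cold-start) user has attributes only, so "virtual" behavior features
# are decoded from the attribute code.
virtual_behavior = behav_ae.dec(attr_ae.enc(torch.randn(1, 128)))
```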

Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval [article]

Xinxun Xu, Muli Yang, Yanhua Yang, Hao Wang
2022 arXiv   pre-print
Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task for searching natural images given free-hand sketches under the zero-shot scenario.  ...  However, such a low-dimensional projection destroys the completeness of semantic knowledge in the original semantic space, so that it cannot transfer useful knowledge well when learning semantics from different  ...  Zero-Shot Learning Existing zero-shot approaches can be classified into two categories: embedding-based and generative-based approaches.  ... 
arXiv:2003.09869v2 fatcat:nvl7igkeyncebihv44g7r376gy

Delving Deeper into Cross-lingual Visual Question Answering [article]

Chen Liu, Jonas Pfeiffer, Anna Korhonen, Ivan Vulic, Iryna Gurevych
2022 arXiv   pre-print
existing transfer methods. 2) We study and dissect cross-lingual VQA across different question types of varying complexity, across different multilingual multi-modal Transformers, and in zero-shot and  ...  Previous work on cross-lingual VQA has reported poor zero-shot transfer performance of current multilingual multimodal Transformers and large gaps to monolingual performance, attributed mostly to misalignment  ...  Table 8: Zero-shot cross-lingual transfer results with and without LayerNorm.  ... 
arXiv:2202.07630v1 fatcat:qeitqk3jmjb2lai3mz444l2kpu

Multimodal Disentanglement Variational AutoEncoders for Zero-Shot Cross-Modal Retrieval

Jialin Tian, Kai Wang, Xing Xu, Zuo Cao, Fumin Shen, Heng Tao Shen
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
Zero-Shot Cross-Modal Retrieval (ZS-CMR) has recently drawn increasing attention as it focuses on a practical retrieval scenario, i.e., the multimodal test set consists of unseen classes that are disjoint  ...  transfer.  ...  features still need to be improved for the zero-shot cross-modal retrieval task.  ... 
doi:10.1145/3477495.3532028 fatcat:axwwf2jxufbw5n75n4kooacery

Cross-View Policy Learning for Street Navigation [article]

Ang Li, Huiyi Hu, Piotr Mirowski, Mehrdad Farajtabar
2019 arXiv   pre-print
We further reformulate the transfer learning paradigm into three stages: 1) cross-modal training, when the agent is initially trained on multiple city regions, 2) aerial view-only adaptation to a new area  ...  Experimental results suggest that the proposed cross-view policy learning enables better generalization of the agent and allows for more effective transfer to unseen environments.  ...  The zero-shot reward is averaged over 350M steps. The proposed cross-view method achieves a zero-shot reward of 29, significantly higher than the reward of 5 obtained by the single-view method.  ... 
arXiv:1906.05930v2 fatcat:mwf5brjotzedrk2ll4ibypv5cq

Cross-View Policy Learning for Street Navigation

Ang Li, Huiyi Hu, Piotr Mirowski, Mehrdad Farajtabar
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
We further reformulate the transfer learning paradigm into three stages: 1) cross-modal training, when the agent is initially trained on multiple city regions, 2) aerial view-only adaptation to a new area  ...  Experimental results suggest that the proposed cross-view policy learning enables better generalization of the agent and allows for more effective transfer to unseen environments.  ...  The zero-shot reward is averaged over 350M steps. The proposed cross-view method achieves a zero-shot reward of 29, significantly higher than the reward of 5 obtained by the single-view method.  ... 
doi:10.1109/iccv.2019.00819 dblp:conf/iccv/LiHMF19 fatcat:ksrgdrjhpvcqpjgq7qtijofe4a

Attribute-Guided Network for Cross-Modal Zero-Shot Hashing [article]

Zhong Ji, Yuxin Sun, Yunlong Yu, Yanwei Pang, Jungong Han
2018 arXiv   pre-print
Zero-Shot Hashing aims at learning a hashing model that is trained only by instances from seen categories but can generalize well to those of unseen categories.  ...  Extensive experimental results on three benchmark datasets (AwA, SUN, and ImageNet) demonstrate the superiority of AgNet on both cross-modal and single-modal zero-shot image retrieval tasks.  ...  Cross-Modal Zero-Shot Hashing Under the cross-modal zero-shot retrieval setting, i.e., TBIR, the seen data are used for training the model.  ... 
arXiv:1802.01943v1 fatcat:jlodoui5cnarbbitg3l2hdn2la
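
The entry concerns binary hash codes that transfer to unseen categories across modalities. Below is a generic sketch of generating codes from projected features and ranking by Hamming distance; the projection here is random purely for illustration, whereas AgNet would learn it with attribute guidance.

```python
import numpy as np

def hash_codes(features, projection):
    """Binary codes from the sign of a linear projection of the features."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Indices of database items sorted by Hamming distance to the query code."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(0)
proj = rng.standard_normal((512, 64))                        # 64-bit codes; learned in practice
db = hash_codes(rng.standard_normal((1000, 512)), proj)      # image codes
query = hash_codes(rng.standard_normal((1, 512)), proj)[0]   # text/attribute query code
print(hamming_rank(query, db)[:5])                           # top-5 retrieved items
```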
Showing results 1 — 15 out of 7,389 results