9,658 Hits in 5.6 sec

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks [article]

Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
2017 arXiv   pre-print
In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning.  ...  In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings.  ...  To enhance inductive transfer, we propose sharing core vision and language representations across all tasks in a way that exploits the word-region alignment.  ... 
arXiv:1704.00260v2 fatcat:njqnfq7imvfpxhmsl5qq2lctsi

Unsupervised Cross-Lingual Representation Learning

Sebastian Ruder, Anders Søgaard, Ivan Vulić
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts  
Cross-lingual word representations offer an elegant and language-pair-independent way to represent content across different languages.  ...  Part V: Stochastic Dictionary Induction Improves Iterative Alignment. We will then discuss stochastic approaches to improve the iterative refinement of the dictionary.  ... 
doi:10.18653/v1/p19-4007 dblp:conf/acl/RuderSV19 fatcat:khz7rqq3kzaojjssfdvkiqv3ma
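The iterative dictionary refinement mentioned in this abstract is commonly built on the orthogonal Procrustes solution: fit an orthogonal map on the current dictionary, re-induce the dictionary by nearest neighbour, and repeat. A minimal sketch, not taken from the tutorial itself; the function names and synthetic setup are illustrative:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal mapping W minimizing ||X W - Y||_F.

    Closed-form solution: if X^T Y = U S V^T (SVD), then W = U V^T.
    Rows of X and Y are embeddings of translation pairs in the current dictionary.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce_dictionary(src_emb, tgt_emb, W):
    """Map source embeddings with W and return the nearest target index per word."""
    mapped = src_emb @ W
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return (mapped @ tgt.T).argmax(axis=1)

def refine(src_emb, tgt_emb, seed_pairs, iters=5):
    """Iterative refinement: re-fit W on the dictionary induced in the previous step."""
    s, t = zip(*seed_pairs)
    for _ in range(iters):
        W = procrustes_align(src_emb[list(s)], tgt_emb[list(t)])
        t = induce_dictionary(src_emb, tgt_emb, W)
        s = range(len(src_emb))
    return W
```

Starting from a small seed dictionary, each pass expands the dictionary to all words before re-fitting; the stochastic variants discussed in the tutorial replace the deterministic nearest-neighbour induction step with sampled dictionaries.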

Visual Bilingual Lexicon Induction with Transferred ConvNet Features

Douwe Kiela, Ivan Vulić, Stephen Clark
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
The CNN image-based approach is also compared with state-of-the-art linguistic approaches to bilingual lexicon induction, even outperforming these for one of three language pairs on another standard dataset.  ...  This paper is concerned with the task of bilingual lexicon induction using image-based features.  ...  Introduction: Bilingual lexicon induction is the task of finding words that share a common meaning across different languages.  ... 
doi:10.18653/v1/d15-1015 dblp:conf/emnlp/KielaVC15 fatcat:rzensydrxbdltcz4q5vhp4tmpi
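The task defined in this snippet, matching words across languages through their images, reduces in its simplest form to comparing aggregated CNN features. A hypothetical sketch with synthetic features; the paper's actual feature extractor and aggregation pipeline are not reproduced here:

```python
import numpy as np

def word_vector(image_features):
    """Mean-pool the CNN features of one word's images, then L2-normalize."""
    v = np.asarray(image_features, dtype=float).mean(axis=0)
    return v / np.linalg.norm(v)

def induce_lexicon(src, tgt):
    """Translate each source word to the target word whose image set is most
    similar under cosine similarity of the pooled feature vectors."""
    S = np.stack([word_vector(f) for f in src.values()])
    T = np.stack([word_vector(f) for f in tgt.values()])
    tgt_words = list(tgt)
    return {w: tgt_words[i] for w, i in zip(src, (S @ T.T).argmax(axis=1))}
```

Because the vectors are unit-normalized, the dot product in `S @ T.T` is exactly cosine similarity; the design choice of mean pooling over a word's images is one simple aggregation among several the literature considers.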

Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness [article]

Paul Pu Liang
2022 arXiv   pre-print
... images, audio, and sensations combined in representations that the CTM's processors use to communicate with each other.  ...  to derive holistic meaning across multimodal inputs, and (3) decoders to map multimodal representations into predictions (for fusion) or raw data (for translation or generation).  ...  tasks improve over separate models trained on each task alone. (2) In multimodal alignment, we investigate whether the joint model can retrieve semantically similar data across modalities. (3) In multimodal  ... 
arXiv:2205.00001v2 fatcat:hnqtq5cer5bxhetyuctifjt5za

VLGrammar: Grounded Grammar Induction of Vision and Language [article]

Yining Hong, Qing Li, Song-Chun Zhu, Siyuan Huang
2021 arXiv   pre-print
In this work, we study grounded grammar induction of vision and language in a joint learning framework.  ...  While grammar is an essential representation of natural language, it also exists ubiquitously in vision to represent the hierarchical part-whole structure.  ...  Grounded Vision and Language Learning In recent years, there have been lots of efforts and advances on exploiting the cross-modality alignment between vision and language for various tasks, such as image-text  ... 
arXiv:2103.12975v1 fatcat:mx6q5dm3hrbi7lrtsm3ned7pja

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision [article]

Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao
2022 arXiv   pre-print
With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.  ...  and generative vision-language benchmarks, including VQA (+3.74% vqa-score), NLVR2 (+1.17% accuracy), SNLI-VE (+1.37% accuracy) and image captioning tasks (+10.1% average CIDEr score).  ...  We also find it possible to transfer across different languages and modalities using SimVLM.  ... 
arXiv:2108.10904v3 fatcat:glozbeeytvdyvcgl7ersyz4i34

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models [article]

Feng Li, Hao Zhang, Yi-Fan Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, PengChuan Zhang, Lei Zhang
2022 arXiv   pre-print
After that, we show how recent work utilizes large-scale raw image-text data to learn language-aligned visual representations that generalize better on zero- or few-shot learning tasks.  ...  We summarize the development of this field into three time periods, namely task-specific methods, vision-language pre-training (VLP) methods, and larger models empowered by large-scale weakly-labeled data.  ...  "Language-aligned" emphasizes that vision features aligned with language can help in vision tasks.  ... 
arXiv:2203.01922v1 fatcat:vnjfetgkpzedpfhklufooqet7y

TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation [article]

Jinyu Yang, Jingjing Liu, Ning Xu, Junzhou Huang
2021 arXiv   pre-print
... attention mechanism and sequential image representation, which play an important role in knowledge transfer.  ...  Despite the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT to adapt cross-domain knowledge remains unexplored in the literature.  ...  to different vision tasks.  ... 
arXiv:2108.05988v2 fatcat:gzcejptsz5bb5pwpo3zsav3ekq

A Survey Of Cross-lingual Word Embedding Models [article]

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 arXiv   pre-print
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages.  ...  Ivan's work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no. 648909).  ... 
arXiv:1706.04902v3 fatcat:lts6uop77zaazhzlbygqmdsama

MLP Architectures for Vision-and-Language Modeling: An Empirical Study [article]

Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang
2021 arXiv   pre-print
We initiate the first empirical study on the use of MLP architectures for vision-and-language (VL) fusion.  ...  These results hint that MLPs can effectively learn to align vision and text features extracted from lower-level encoders without heavy reliance on self-attention.  ...  transferred to VL tasks like VQA.  ... 
arXiv:2112.04453v1 fatcat:pnr5aeiwlffzncin4vdi5wojsi

Learning Translations via Images with a Massively Multilingual Image Dataset

John Hewitt, Daphne Ippolito, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya, Chris Callison-Burch
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
In contrast, we have collected by far the largest available dataset for this task, with images for approximately 10,000 words in each of 100 languages.  ...  To improve image-based translation, we introduce a novel method of predicting word concreteness from images, which improves on a previous state-of-the-art unsupervised technique.  ...  Other NLP+Vision tasks that have been enabled by the availability of large datasets include caption generation for images, action recognition in videos, visual question answering, and others.  ... 
doi:10.18653/v1/p18-1239 dblp:conf/acl/Callison-BurchW18 fatcat:myfms7sguvhozatv3grrrotiae

Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop [article]

Afra Alishahi and Grzegorz Chrupała and Tal Linzen
2019 arXiv   pre-print
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner-workings and representations acquired by neural models of language.  ...  Dhar and Bisazza (2018) test whether syntactic representations in a language model can transfer across languages.  ... 
arXiv:1904.04063v1 fatcat:iwuio4l62jcpjindqpyzs6rmdy

A Survey of Cross-lingual Word Embedding Models

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 The Journal of Artificial Intelligence Research  
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages.  ...  Ivan's work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no. 648909). Sebastian is now affiliated with DeepMind.  ... 
doi:10.1613/jair.1.11640 fatcat:vwlgtzzmhfdlnlyaokx2whxgva

Multimodal Grounding for Language Processing [article]

Lisa Beinborn, Teresa Botschen, Iryna Gurevych
2019 arXiv   pre-print
Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise.  ...  We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations.  ...  We thank Faraz Saeedan for his assistance with the computation of the visual embeddings for the imSitu images. We thank the anonymous reviewers for their insightful comments.  ... 
arXiv:1806.06371v2 fatcat:ucqjg2uhabf3vfkgjdfoa5z5yy

Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages [article]

Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, Nanyun Peng
2019 arXiv   pre-print
One of the fundamental techniques for transfer across languages is learning language-agnostic representations, in the form of word embeddings or contextual encodings.  ...  Specifically, we explore adversarial training for learning contextual encoders that produce invariant representations across languages to facilitate cross-lingual transfer.  ...  • Do language-agnostic representations improve cross-lingual transfer?  ... 
arXiv:1909.09265v1 fatcat:ewdnnmnmenbtxc3b4xxf24ewtq
Showing results 1 — 15 out of 9,658 results