A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
[article]
2017
arXiv
pre-print
In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. ...
In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. ...
To enhance inductive transfer, we propose sharing core vision and language representations across all tasks in a way that exploits the word-region alignment. ...
arXiv:1704.00260v2
fatcat:njqnfq7imvfpxhmsl5qq2lctsi
Unsupervised Cross-Lingual Representation Learning
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Cross-lingual word representations offer an elegant and language-pair independent way to represent content across different languages. ...
Part V: Stochastic Dictionary Induction improves Iterative Alignment We will then discuss stochastic approaches to improve the iterative refinement of the dictionary. ...
doi:10.18653/v1/p19-4007
dblp:conf/acl/RuderSV19
fatcat:khz7rqq3kzaojjssfdvkiqv3ma
Visual Bilingual Lexicon Induction with Transferred ConvNet Features
2015
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
The CNN image-based approach is also compared with state-of-the-art linguistic approaches to bilingual lexicon induction, even outperforming these for one of three language pairs on another standard dataset ...
This paper is concerned with the task of bilingual lexicon induction using image-based features. ...
Introduction Bilingual lexicon induction is the task of finding words that share a common meaning across different languages. ...
doi:10.18653/v1/d15-1015
dblp:conf/emnlp/KielaVC15
fatcat:rzensydrxbdltcz4q5vhp4tmpi
Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness
[article]
2022
arXiv
pre-print
images, audio, and sensations combined in representations that the CTM's processors use to communicate with each other. ...
to derive holistic meaning across multimodal inputs, and (3) decoders to map multimodal representations into predictions (for fusion) or raw data (for translation or generation). ...
tasks improve over separate models trained on each task alone. (2) In multimodal alignment, we investigate whether the joint model can retrieve semantically similar data across modalities. (3) In multimodal ...
arXiv:2205.00001v2
fatcat:hnqtq5cer5bxhetyuctifjt5za
VLGrammar: Grounded Grammar Induction of Vision and Language
[article]
2021
arXiv
pre-print
In this work, we study grounded grammar induction of vision and language in a joint learning framework. ...
While grammar is an essential representation of natural language, it also exists ubiquitously in vision to represent the hierarchical part-whole structure. ...
Grounded Vision and Language Learning In recent years, there have been lots of efforts and advances on exploiting the cross-modality alignment between vision and language for various tasks, such as image-text ...
arXiv:2103.12975v1
fatcat:mx6q5dm3hrbi7lrtsm3ned7pja
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
[article]
2022
arXiv
pre-print
With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks. ...
and generative vision-language benchmarks, including VQA (+3.74% vqa-score), NLVR2 (+1.17% accuracy), SNLI-VE (+1.37% accuracy) and image captioning tasks (+10.1% average CIDEr score). ...
We also find it possible to transfer across different languages and modalities using SimVLM. ...
arXiv:2108.10904v3
fatcat:glozbeeytvdyvcgl7ersyz4i34
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
[article]
2022
arXiv
pre-print
After that, we show how recent work utilizes large-scale raw image-text data to learn language-aligned visual representations that generalize better on zero or few shot learning tasks. ...
We summarize the development in this field into three time periods, namely task-specific methods, vision-language pre-training (VLP) methods, and larger models empowered by large-scale weakly-labeled data ...
Language-aligned emphasizes that vision features aligned with language can help in vision tasks. ...
arXiv:2203.01922v1
fatcat:vnjfetgkpzedpfhklufooqet7y
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
[article]
2021
arXiv
pre-print
attention mechanism and sequential image representation) which play an important role in knowledge transfer. ...
With the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge, however, remains unexplored in the literature. ...
to different vision tasks. ...
arXiv:2108.05988v2
fatcat:gzcejptsz5bb5pwpo3zsav3ekq
A Survey Of Cross-lingual Word Embedding Models
[article]
2019
arXiv
pre-print
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. ...
Ivan's work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no 648909). ...
arXiv:1706.04902v3
fatcat:lts6uop77zaazhzlbygqmdsama
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
[article]
2021
arXiv
pre-print
We initiate the first empirical study on the use of MLP architectures for vision-and-language (VL) fusion. ...
These results hint that MLPs can effectively learn to align vision and text features extracted from lower-level encoders without heavy reliance on self-attention. ...
transferred to VL tasks like VQA. ...
arXiv:2112.04453v1
fatcat:pnr5aeiwlffzncin4vdi5wojsi
Learning Translations via Images with a Massively Multilingual Image Dataset
2018
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In contrast, we have collected by far the largest available dataset for this task, with images for approximately 10,000 words in each of 100 languages. ...
To improve image-based translation, we introduce a novel method of predicting word concreteness from images, which improves on a previous state-of-the-art unsupervised technique. ...
Other NLP+Vision tasks that have been enabled by the availability of large datasets include caption generation for images, action recognition in videos, visual question answering, and others. ...
doi:10.18653/v1/p18-1239
dblp:conf/acl/Callison-BurchW18
fatcat:myfms7sguvhozatv3grrrotiae
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
[article]
2019
arXiv
pre-print
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner-workings and representations acquired by neural models of language. ...
Dhar and Bisazza (2018) test whether syntactic representations in a language model can transfer across languages. ...
arXiv:1904.04063v1
fatcat:iwuio4l62jcpjindqpyzs6rmdy
A Survey of Cross-lingual Word Embedding Models
2019
The Journal of Artificial Intelligence Research
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. ...
Ivan's work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no 648909). Sebastian is now affiliated with DeepMind. ...
doi:10.1613/jair.1.11640
fatcat:vwlgtzzmhfdlnlyaokx2whxgva
Multimodal Grounding for Language Processing
[article]
2019
arXiv
pre-print
Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. ...
We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. ...
We thank Faraz Saeedan for his assistance with the computation of the visual embeddings for the imSitu images. We thank the anonymous reviewers for their insightful comments. ...
arXiv:1806.06371v2
fatcat:ucqjg2uhabf3vfkgjdfoa5z5yy
Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages
[article]
2019
arXiv
pre-print
One of the fundamental techniques to transfer across languages is learning language-agnostic representations, in the form of word embeddings or contextual encodings. ...
Specifically, we explore adversarial training for learning contextual encoders that produce invariant representations across languages to facilitate cross-lingual transfer. ...
• Do language-agnostic representations improve cross-language transfer? ...
arXiv:1909.09265v1
fatcat:ewdnnmnmenbtxc3b4xxf24ewtq
Showing results 1 — 15 out of 9,658 results