A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
Multimodal Learning with Transformers: A Survey [article]
2022, arXiv pre-print
The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific ...
We reviewed the landscape by introducing Transformer designs and training in multimodal contexts. We summarized the key challenges and solutions for this emerging and exciting field. ...
arXiv:2206.06488v1
fatcat:6aoaczzbtvc43my2kmobo7glvy
A Roadmap for Big Model [article]
2022, arXiv pre-print
We introduce 16 specific BM-related topics in those four parts; they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability ...
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. ...
ERNIE-ViLG [717] formulates the text-to-image generation task as an autoregressive generative task and achieves a new state-of-the-art result on MS-COCO. ...
arXiv:2203.14101v4
fatcat:rdikzudoezak5b36cf6hhne5u4