Multimodal Learning with Transformers: A Survey [article]

Peng Xu, Xiatian Zhu, David A. Clifton
2022 arXiv pre-print
The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer ..., and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific ... We reviewed the landscape by introducing the Transformer designs and training in the multimodal contexts. We summarized the key challenges and solutions for this emerging and exciting field. ...
arXiv:2206.06488v1 fatcat:6aoaczzbtvc43my2kmobo7glvy

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv pre-print
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. ... We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability ... ERNIE-ViLG [717] formulates the text-to-image generation task as an autoregressive generative task and achieves a new state-of-the-art result on MS-COCO. ...
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4