A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2021.
Inspired by this, we propose a novel self-supervised learning method, named Text-enhanced Visual Deep InfoMax (TVDIM), to learn better visual representations by fully utilizing the naturally existing multimodal ... Experimental results show that TVDIM significantly outperforms previous visual self-supervised methods when processing the same set of images. ... The first stage is self-supervised training of the image encoder via the proposed TVDIM. ...
arXiv:2106.01797v2 fatcat:ut3dcxos7bhelb6vllwo5dicti
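Deep InfoMax-style methods learn representations by maximizing an estimate of the mutual information between paired views of the data. As a minimal sketch, assuming the paired views are an image embedding and the embedding of its accompanying text, the InfoNCE estimator below turns a batch of N matched pairs into a lower bound on mutual information: log N minus the contrastive loss. InfoNCE is one estimator used in the Deep InfoMax literature; the snippet above does not say which estimator, critic, or architecture TVDIM actually uses. The function names `infonce_loss` and `infonce_mi_lower_bound`, the cosine-similarity critic, and the `temperature` value are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def infonce_loss(image_feats: Sequence[Sequence[float]],
                 text_feats: Sequence[Sequence[float]],
                 temperature: float = 0.1) -> float:
    """Image-to-text InfoNCE loss over a batch of matched pairs.

    Assumption (not from the source): row i of image_feats and row i of
    text_feats come from the same image-text pair; every other text in the
    batch serves as a negative. The critic is cosine similarity / temperature.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        # Critic scores of image i against every text in the batch.
        scores = [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy of picking the matching text i.
        total += log_sum_exp - scores[i]
    return total / n


def infonce_mi_lower_bound(image_feats: Sequence[Sequence[float]],
                           text_feats: Sequence[Sequence[float]],
                           temperature: float = 0.1) -> float:
    """InfoNCE lower bound on mutual information: I(X; Y) >= log N - loss."""
    return math.log(len(image_feats)) - infonce_loss(
        image_feats, text_feats, temperature)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(infonce_loss(images, aligned_texts))
    print(infonce_loss(images, swapped_texts))
    print(infonce_mi_lower_bound(images, aligned_texts))
```

Correctly matched pairs yield a small loss and a bound close to log N, while mismatched pairs yield a large loss. Training an image encoder to minimize such a loss is one way to let paired text shape visual features, which is the general idea the abstract describes.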