A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is
Recent work on contrastive losses for learning joint embeddings over multimodal data has been successful at downstream tasks such as retrieval and classification. On the other hand, work on joint representation learning for 3D shapes and text has thus far mostly focused on improving embeddings through modeling of complex attention between representations , or multi-task learning . We show that with large batch contrastive learning we achieve SoTA on text-shape retrieval without complexarXiv:2201.07366v1 fatcat:ff2frrehjffivbr27tgndoddmu