A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is
How to jointly represent images and texts is important for tasks involving both modalities. Visual-semantic embedding models have been recently proposed and shown to be effective. The key idea is that by learning a mapping from images into a semantic text space, the algorithm is able to learn a compact and effective joint representation. However, existing approaches simply map each text concept to a single point in the semantic space. Mapping instead to a density distribution provides manydoi:10.1145/2964284.2967212 dblp:conf/mm/RenJLFY16 fatcat:i3pc7yh64raerdyndksx4vvaee