A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
Scene retrieval from input descriptions has been one of the most important applications with the increasing number of videos on the Web. However, this is still a challenging task since semantic gaps between features of texts and videos exist. In this paper, we try to solve this problem by utilizing a textto-image Generative Adversarial Network (GAN), which has become one of the most attractive research topics in recent years. The text-to-image GAN is a deep learning model that can generatedoi:10.1109/access.2019.2947409 fatcat:dktmrwmvxbbhpebpdhig7r444u