Multimodal Story Generation on Plural Images [article]

Jing Jiang
2021 arXiv pre-print
Traditionally, text generation models take a sequence of text as input and iteratively generate the next most probable word using pre-trained parameters. In this work, we propose StoryGen, an architecture that uses images instead of text as the input to a text generation model. Within this architecture, we design a Relational Text Data Generator algorithm that relates different features extracted from multiple images. Output samples from the model demonstrate its ability to generate meaningful paragraphs of text containing the features extracted from the input images. This is an undergraduate project report, completed Dec. 2019 at the Cooper Union.
arXiv:2001.10980v2 fatcat:qu3ygu3fgbhz7gnl3wv23v34gm