Text as Neural Operator: Image Manipulation by Text Instruction

Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa
2021 Proceedings of the 29th ACM International Conference on Multimedia  
Figure 1: Image manipulation by text instruction. The input is multimodal, consisting of a reference image and a text instruction. The results are images synthesized by our model. [Figure shows three input/result pairs, with the instructions "make middle-left small gray object large", "remove bottom-center large yellow sphere", and "(add) small pine tree placed left side with left side cut off a bit".]
doi:10.1145/3474085.3475343