Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding [article]

Yidan Sun, Qin Chao, Boyang Li
2022 arXiv   pre-print
Despite recent advances of AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives(SyMoN), containing 5,193 video summaries of popular movies and TV series. SyMoN captures naturalistic storytelling videos for human audience made by human creators, and has higher story coverage and more frequent mental-state references than similar video-language story datasets. Differing
more » ... most existing video-text datasets, SyMoN features large semantic gaps between the visual and the textual modalities due to the prevalence of reporting bias and mental state descriptions. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos. With SyMoN, we hope to lay the groundwork for progress in multimodal story understanding.
arXiv:2203.05711v1 fatcat:y4lqpr6hjzhv5fo5vbfaootnrq