Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation

Gayoung Jung, Jonghun Lee, Incheol Kim
2021 Sensors  
Video scene graph generation (ViDSGG), the creation of video scene graphs that helps in deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window based methods have been proposed to perform this task. However, they all have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding window scheme to detect object tracklets of various lengths throughout the entire
more » ... video. In particular, the proposed model presents a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network as well as high-level semantic context reasoning. To improve the detection performance for sparse relationships, the proposed model applies a class weighting technique that adjusts the weight of sparse relationships to a higher level. This study demonstrates the positive effect and high performance of the proposed model through experiments using the benchmark dataset VidOR and VidVRD.
doi:10.3390/s21093164 pmid:34063299 pmcid:PMC8124611 fatcat:vtl2wizi5jfjdbqgxngtelwhwy