Leveraging Context for Multi-Label Action Recognition and Detection in Video

Joao Antunes Martins
This thesis addresses video-based multi-person, multi-label, spatiotemporal action detection and recognition. This is a challenging problem because each person can be performing several actions at the same time (e.g., talking and walking), while other actors simultaneously perform different actions. We claim that these are problems where the use of contextual information (e.g., semantic descriptions of the scene) may lead to significant performance improvements. In this work, we develop several approaches to tackle this problem and validate them on challenging datasets. We propose a framework to integrate and test multiple sources of contextual information in video-based multi-person, multi-label, spatiotemporal action detection and recognition. We highlight six contributions, which are collected in three publications (at different stages of publication at the time of this writing). The first contribution is a proposed Multisource Video Classification (MVC) framework that allows the combination of several sources of context information, of which we consider four types: actor-centric input filtering (a way to focus attention on the actor under analysis while still gathering appearance information from the neighborhood), semantic neighbor context (a way to inform the model of the actions performed by nearby agents), object detection (how objects interacting with the actor can inform about its action), and pose data (how high-level features extracted from the actor can help the classification process). The second contribution is a foveated approach to actor-centric filtering for input selection that weights the appearance information in a decreasing way, from the center to the periphery of the actor bounding box. The third contribution is a novel encoding for the semantic neighbor context and its custom classifier with spatial and temporal dependence. The fourth is a custom Hybrid Sigmoid-Softmax loss function for the multi-class/multi-label case that combines the cross-entropy loss typical of multi-class problems [...]
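As a rough illustration of the foveated actor-centric filtering mentioned above, the sketch below builds a weight map that equals 1 at the center of the actor bounding box and decays toward the periphery, then multiplies it into a frame. The abstract does not specify the falloff; the Gaussian profile, the function names `foveated_mask` and `apply_foveation`, and the parameters `sigma_scale` and `floor` are assumptions made only for this example and should not be read as the thesis's actual formulation.

```python
import numpy as np

def foveated_mask(height, width, box, sigma_scale=1.0, floor=0.0):
    """Weight map: 1 at the box center, decreasing toward the periphery.

    box is (x1, y1, x2, y2) in pixels. The Gaussian falloff is an
    assumption; the source only states that weights decrease outward.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Assumption: the spread of the falloff is tied to the box half-size.
    sx = max((x2 - x1) / 2.0, 1.0) * sigma_scale
    sy = max((y2 - y1) / 2.0, 1.0) * sigma_scale
    ys, xs = np.mgrid[0:height, 0:width]
    w = np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))
    # A non-zero floor keeps some appearance information from the neighborhood.
    return np.maximum(w, floor)

def apply_foveation(frame, box, **kwargs):
    """Multiply an H x W x C frame by the foveated weight map."""
    mask = foveated_mask(frame.shape[0], frame.shape[1], box, **kwargs)
    return frame * mask[..., None]
```

With `floor` above zero, pixels far from the actor are attenuated rather than removed, which matches the stated goal of focusing on the actor while still gathering appearance information from the surroundings.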
doi:10.1184/r1/13198112.v1 fatcat:epxy5xmpgzhk7prr6a6o6vwf6y