TimeGate: Conditional Gating of Segments in Long-range Activities [article]

Noureldien Hussein, Mihir Jain, Babak Ehteshami Bejnordi
<span title="2020-04-03">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
When recognizing a long-range activity, exploring the entire video is exhaustive and computationally expensive, as the video can span up to a few minutes. Thus, it is of great importance to sample only the salient parts of the video. We propose TimeGate, along with a novel conditional gating module, for sampling the most representative segments from the long-range activity. TimeGate has two novelties that address the shortcomings of previous sampling methods, such as SCSampler. First, it enables a differentiable sampling of segments. Thus, TimeGate can be fitted with modern CNNs and trained end-to-end as a single and unified model. Second, the sampling is conditioned on both the segments and their context. Consequently, TimeGate is better suited for long-range activities, where the importance of a segment heavily depends on the video context. TimeGate reduces the computation of existing CNNs on three benchmarks for long-range activities: Charades, Breakfast and MultiThumos. In particular, TimeGate reduces the computation of I3D by 50% while maintaining the classification accuracy.
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2004.01808v1">arXiv:2004.01808v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/bk3osbs4x5gjtb7alg4qbpppwm">fatcat:bk3osbs4x5gjtb7alg4qbpppwm</a> </span>
