The file type is application/pdf.
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
[article]
2021
arXiv
pre-print
Visual events are a composition of temporal actions involving actors spatially interacting with objects. When developing computer vision models that can reason about compositional spatio-temporal events, we need benchmarks that can analyze progress and uncover shortcomings. Existing video question answering benchmarks are useful, but they often conflate multiple sources of error into one accuracy metric and have strong biases that models can exploit, making it difficult to pinpoint model
arXiv:2103.16002v1
fatcat:vkcqfxgssvb5bjwp7zvqetbpti