A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
[article]
2022
arXiv
pre-print
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We build VALSE using methods
arXiv:2112.07566v2
fatcat:ei7bo6bg3rgahp44oznsftnvje