MassiveSumm: a very large-scale, very multilingual, news summarisation dataset
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
unpublished
Current research in automatic summarisation is unapologetically anglo-centered, a persistent state of affairs that also predates neural net approaches. High-quality automatic summarisation datasets are notoriously expensive to create, posing a challenge for any language. However, with digitalisation, archiving, and social media advertising of newswire articles, recent work has shown how, with careful methodology application, large-scale datasets can now be simply gathered instead of written.
doi:10.18653/v1/2021.emnlp-main.797
fatcat:bogchhf3jfc43kgknomhx4pueq