The amount of news published and read online has increased tremendously in recent years, making news data an interesting resource for many research disciplines, such as the social sciences and linguistics. However, large-scale collection of news data is cumbersome due to a lack of generic tools for crawling and extracting such data. We present news-please, a generic, multilanguage, open-source crawler and extractor for news that works out of the box for a large variety of news websites. Our [...]
doi:10.18452/1447 dblp:conf/isiwi/HamborgMBG17 fatcat:763h7ckq6rf2hlyqp6t46s4pku