A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
A Reproducible IT-Blog Corpus
2021
Journal of Open Humanities Data
The dataset comprises text and metadata extracted from several hundred IT-blogs and websites, along with a method to duplicate the data by updating its contents and downloading it to the user's local machine. The targets have been hand-picked with the intention to represent the discourse on blogs and websites dedicated to questions at the intersection of technology and society from Germany and the United States of America. The texts have been retrieved by web crawling techniques. The resulting
doi:10.5334/johd.35
fatcat:juant26ygbepfew7oht47imbku