Metadata Statistics for a Large Web Corpus

Peter Mika, Tim Potter
2012 The Web Conference  
We provide an analysis of the adoption of metadata standards on the Web based on a large crawl of the Web. In particular, we look at which syntaxes and vocabularies publishers are using to mark up data inside HTML pages. We also describe the process we followed and the difficulties involved in web data extraction.
dblp:conf/www/MikaP12 fatcat:ank3izdywretrgtyl3bspqf62i