A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
This paper presents and describes TeMex, a site-level web template extractor. TeMex is fully automatic, and it can work with online webpages without any preprocessing stage (no information about the template or the associated webpages is needed) and, more importantly, it does not need a predefined set of webpages to perform the analysis. TeMex only needs a URL. Contrarily to previous approaches, it includes a mechanism to identify webpage candidates that share the same template. This mechanismdoi:10.1145/2740908.2742835 dblp:conf/www/AlarteIST15 fatcat:3eqfrewcgbghbmd2o2bjqj4erm