Structuring Domain-Specific Text Archives by Deriving a Probabilistic XML DTD
[chapter]
2002
Lecture Notes in Computer Science
Domain-specific documents often share an inherent, though undocumented, structure. This structure should be made explicit to facilitate efficient, structure-based search in archives as well as information integration. Inferring a semantically structured XML DTD for an archive and subsequently transforming its texts into XML documents is a promising method to reach these objectives. Based on the KDD-driven DIAsDEM framework, we propose a new method to derive an archive-specific structured XML
doi:10.1007/3-540-45681-3_38
fatcat:n2mxpwd2wvfl7ghdqgfuybzlga