A pilot investigation of information extraction in the semantic annotation of archaeological reports

Andreas Vlachidis, Douglas Tudhope
2012 International Journal of Metadata, Semantics and Ontologies  
Andreas Vlachidis is a Research Assistant in the Hypermedia Research Unit at the University of Glamorgan and is currently writing up his PhD in the area of natural language processing, producing semantic annotations conforming to the CIDOC CRM ontology. He was responsible for producing the semantic annotations of grey literature in the AHRC-funded STAR project (Semantic Tools for Archaeological Resources). He has previously worked as Division Leader for Computing Courses at North College in ... saloniki and as ICT Developments Manager in the eCommerce domain.

Douglas Tudhope is Professor in the Faculty of Advanced Technology, University of Glamorgan, and leads the Hypermedia Research Unit. His main current research interests are the intersecting areas of information science, digital libraries, hypermedia and the semantic web. He was PI on the AHRC-funded STAR and STELLAR projects and on the EPSRC-funded FACET project, which investigated thesaurus-based query expansion. Since 1977, he has been Editor of the journal New Review of Hypermedia and Multimedia. He serves as a reviewer for various journals and international programme committees and is active in the Networked Knowledge Organisation Systems/Services (NKOS) network.

Abstract. The paper discusses a prototype investigation of semantic annotation, a form of metadata that assigns conceptual entities to textual instances, in this case in archaeological grey literature. Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while Knowledge Organization System (KOS) resources are explored as a means of associating semantic annotations with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension, and English Heritage thesauri and glossaries. Results from an initial evaluation are reported and suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed, drawing on the evaluation and on the characteristics of the archaeology domain.
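As a rough illustration of the kind of vocabulary-driven annotation the abstract describes, the following Python sketch matches terms from a small glossary against text and attaches a CIDOC CRM class label to each match. It is not the authors' GATE pipeline: the glossary entries, the term-to-class mapping, and the names `GAZETTEER`, `Annotation` and `annotate` are all assumptions made for illustration, and the terms are not taken from the English Heritage thesauri.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-glossary mapping terms to CIDOC CRM class labels.
# Both the terms and the mapping are illustrative assumptions.
GAZETTEER = {
    "pit": "E53 Place",
    "ditch": "E53 Place",
    "pottery": "E19 Physical Object",
    "coin": "E19 Physical Object",
    "roman": "E49 Time Appellation",
    "medieval": "E49 Time Appellation",
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # the matched surface form
    concept: str  # the ontology class label assigned to it


def annotate(text: str) -> list[Annotation]:
    """Return one annotation per whole-word glossary match, in text order."""
    # Longest terms first, so longer entries win over their prefixes.
    terms = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), GAZETTEER[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for ann in annotate("Roman pottery was found in the ditch."):
        print(ann)
```

A real rule-based system would go beyond plain lookup, for example by applying contextual patterns over tokens and part-of-speech tags; this sketch covers only the glossary-matching step.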
doi:10.1504/ijmso.2012.050183 fatcat:aznw5gifqvhp5fonsxdi6c4wpy