Text Mining in Archaeology: Extracting Information from Archaeological Reports [chapter]

Julian Richards, Douglas Tudhope, Andreas Vlachidis
2015 Mathematics and Archaeology  
Introduction
Archaeologists generate large quantities of text, ranging from unpublished technical fieldwork reports (the 'grey literature') to synthetic journal articles. However, indexing and analysing these documents by hand is time-consuming and inconsistent. Such work is also rarely integrated with the wider archaeological information domain: bibliographic searches, for example, have to be undertaken independently of database queries. Text mining offers a means of extracting information from large volumes of text, giving researchers an easy way of locating relevant texts and of identifying patterns in the literature. In recent years techniques from Natural Language Processing (NLP) and its subfield, Information Extraction (IE), have been adopted to allow researchers to find, compare and analyse relevant documents, and to link them to other types of data. This chapter introduces the underpinning mathematics and briefly presents the algorithms and distance measures used, from the point of view of artificial intelligence and computational logic. It describes the different NLP schools of thought and compares the pros and cons of rule-based and machine learning approaches to information extraction. It discusses the role of ontologies and named entity recognition, and demonstrates how IE can provide the basis for semantic annotation and contribute to the construction of a semantic web for archaeology. The authors have worked on a number of projects that have employed NLP and IE techniques in archaeology, including Archaeotools, STAR and STELLAR. The chapter describes archaeological user needs, drawing examples from several countries, and presents examples from the authors' own projects and from previous work by others of how NLP and IE can help address those needs. It also discusses the problems and challenges of employing text mining in the archaeological domain, as well as the potential benefits.
doi:10.1201/b18530-15 fatcat:ndf5usto4zhg3cswe2ypfqnnw4