FunGramKB term extractor: A tool for building terminological ontologies from specialised corpora [chapter]

Ángel Felices-Lago, Pedro Ureña Gómez-Moreno
2014 Studies in Language Companion Series  
In this article we collect a corpus of texts written in a controlled language (ASD Simplified Technical English) in order to facilitate the development of a new domain-specific ontology (aircraft structure) based on a technical discipline (aeronautical engineering) within the so-called "hard" sciences. This new repository should be compatible with the Core Ontology and the corresponding English Lexicon in FunGramKB (a multipurpose lexico-conceptual knowledge base for natural language processing (NLP)) and should eventually support aircraft maintenance management systems. By contrast, in previous work we applied a stepwise methodology to build a FunGramKB-compatible domain-specific subontology for criminal law. There, however, the frequent terminological banalisation and the scarcity of specific terms, both due to the social nature of the discipline, compounded the most common NLP difficulties (polysemy and ambiguity). Taking into consideration these previous results and the complexity of the task, here we only take the first step towards modelling the aircraft ontology: the development of its taxonomic hierarchy. This hierarchy starts with the whole system (i.e., an aircraft) and follows the traditional decomposition of the system down to its elementary components (top-down approach). In parallel, we have collected a corpus of 2,480 files of aircraft maintenance instructions, courtesy of Airbus in Seville. For the bottom-up approach (under construction), we consult specialised references and explore the corpus by identifying and extracting term candidates with DEXTER, an online multilingual workbench designed specifically for the discovery and extraction of terms.
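The abstract does not describe how DEXTER identifies term candidates, so the following is only a minimal sketch of one common, generic technique: counting contiguous word n-grams that neither begin nor end with a stopword, and keeping those above a frequency threshold. It is not DEXTER's algorithm. The names `term_candidates`, `tokenize` and `STOPWORDS`, the thresholds, and the example sentences are all hypothetical and were invented for illustration; they are not drawn from the Airbus corpus.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stopword list; real extractors use fuller lists.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "on", "at",
             "for", "with", "is", "are", "be", "it", "this", "that", "from"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into (optionally hyphenated) word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def term_candidates(texts, max_n=3, min_freq=2):
    """Count contiguous n-grams (1..max_n words) whose first and last words are
    not stopwords, and return (term, frequency) pairs seen at least min_freq
    times, most frequent first."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip n-grams with a function word at either edge.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


if __name__ == "__main__":
    # Invented sentences in the style of maintenance instructions.
    docs = [
        "Remove the access panel from the wing.",
        "Install the access panel on the wing.",
        "Examine the wing spar for corrosion.",
    ]
    print(term_candidates(docs))
    # -> [('wing', 3), ('access', 2), ('panel', 2), ('access panel', 2)]
```

Raw frequency is the simplest ranking. Dedicated term extractors usually add linguistic filters (e.g. part-of-speech patterns) or statistical termhood measures, which this sketch omits.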
doi:10.1075/slcs.150.10fel fatcat:ckfmlrgkvrb2xiq3v2sgnuievy