BNOSA: A Bayesian Network and Ontology Based Semantic Annotation Framework

Quratulain Rajput, Sajjad Haider
2011 Social Science Research Network  
The paper presents a semantic annotation framework that is capable of extracting relevant information from unstructured, ungrammatical and incoherent data sources. The framework, named BNOSA, uses ontology to conceptualize a problem domain and to extract data from the given corpora, and Bayesian networks to resolve conflicts and to predict missing data. The framework is extensible as it is capable of dynamically extracting data from any problem domain given a pre-defined ontology and a
more » ... ding Bayesian network. Experiments have been conducted to analyze the performance of BNOSA on several problem domains. The sets of corpora used in the experiments belong to selling-purchasing websites where product information is entered by ordinary web users in a structure-free format. The results show that BNOSA performs reasonably well to find location of the data of interest using context keywords provided as part of the domain ontology. In case of more than one value being extracted for an attribute or if the value is missing, Bayesian networks identify the most appropriate value for that attribute. P(SP=clean) = 0.96 P(FU=yes) = 0.98 P(ST=yes| FU=yes, SP=clean) = 0.99 P(ST=yes| FU=yes, SP=dirty) = 0.01 P(ST=yes| FU=no, SP=clean) = 0 P(ST=yes| FU=no, SP=dirty) =0 P(FG=empty | FU=yes) = 0.01 P(FG=empty | FU=no) =0.99 Abstract The paper presents a semantic annotation framework that is capable of extracting relevant information from unstructured, ungrammatical and incoherent data sources. The framework, named BNOSA, uses ontology to conceptualize a problem domain and to extract data from the given corpora, and Bayesian networks to resolve conflicts and to predict missing data. The framework is extensible as it is capable of dynamically extracting data from any problem domain given a pre-defined ontology and a corresponding Bayesian network. Experiments have been conducted to analyze the performance of BNOSA on several problem domains. The sets of corpora used in the experiments belong to selling-purchasing websites where product information is entered by ordinary web users in a structure-free format. The results show that BNOSA performs reasonably well to find location of the data of interest using context keywords provided as part of the domain ontology. In case of more than one value being extracted for an attribute or if the value is missing, Bayesian networks identify the most appropriate value for that attribute.
doi:10.2139/ssrn.3199510 fatcat:hlnzlupah5hr3b5npsnontbp3m