NLP Data Cleansing Based on Linguistic Ontology Constraints [chapter]

Dimitris Kontokostas, Martin Brümmer, Sebastian Hellmann, Jens Lehmann, Lazaros Ioannidis
2014 Lecture Notes in Computer Science  
NLP is, compared to other domains such as biology, a late Linked Data adopter. However, it has seen a steep rise of activity in the creation of data and ontologies.  ...  Linked Data comprises an unprecedented volume of structured data on the Web and is adopted by an increasing number of domains.  ...  A Test Auto Generator reuses the RDFS and OWL modelling of a knowledge base to verify data quality. In particular, a TAG, based on a DQTP, takes a schema as input and returns test cases.  ... 
doi:10.1007/978-3-319-07443-6_16 fatcat:vavqfcbgczdvnaw3h4y4ez6c2i

An Integrated Set of Web Mining Tools for Research

D Aravind
2018 Zenodo  
Validation uses model-based data extraction to validate the data correctness.  ...  IE will use a hybrid extraction way to select portions from a web page and give data into a database. Generalization will clean data and use database techniques to analyze collected data.  ...  The difference between wrapper induction tools and those with NLP is that they format features to implicitly delineate the structure of data, while NLP relies on linguistic constraints.  ... 
doi:10.5281/zenodo.1410994 fatcat:fgeuwa7mlnfbnhmlewdkrmh3x4

An Empirical Assessment of Semantic Interpretation

Martin Romacker, Udo Hahn
2000 Applied Natural Language Processing Conference  
We introduce a framework for semantic interpretation in which dependency structures are mapped to conceptual representations based on a parsimonious set of interpretation schemata.  ...  Our focus is on the empirical evaluation of this approach to semantic interpretation, i.e., its quality in terms of recall and precision.  ...  Whether these are relevant or not for a particular application has to be determined by subsequent data/knowledge cleansing.  ... 
dblp:conf/anlp/RomackerH00 fatcat:xh2uzkjnendebbdf555ngjtsom

Linked Open Data Validity -- A Technical Report from ISWS 2018 [article]

Tayeb Abderrahmani Ghor, Esha Agrawal, Mehwish Alam, Omar Alqawasmeh, Claudia D'amato, Amina Annane, Amr Azzam, Andrew Berezovskyi, Russa Biswas, Mathias Bonduel, Quentin Brabant, Cristina-iulia Bucur, Elena Camossi, Valentina Anita Carriero, Shruthi Chari (+48 others)
2019 arXiv   pre-print
Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, the reuse of such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data  ...  Linked Open Data (LOD) is the publicly available RDF data on the Web. Each LOD entity is identified by a URI and accessible via HTTP.  ...  data extracted with NLP, and reliable textual sources could be processed with NLP techniques to be used as a reference knowledge base to validate Linked Data sets.  ... 
arXiv:1903.12554v1 fatcat:e25yvjsucvghzol4uo3omflh7i

Inductive Entity Typing Alignment

Giuseppe Rizzo, Marieke van Erp, Raphaël Troncy
2014 International Semantic Web Conference  
Our inductive data-driven approach recasts the alignment problem as a classification problem.  ...  We present experiments on two named entity recognition benchmark datasets, namely the CoNLL2003 newswire dataset and the MSM2013 microposts dataset.  ...  For instance, Cupid [7] combines a number of techniques such as linguistic matching, structure-based matching, constraint-based matching, and context-based matching at the schema element level and related  ... 
dblp:conf/semweb/RizzoET14 fatcat:e7s7s7gehrctpeo7i3xz56lw3e

Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale

Astrid Rheinländer, Mario Lehmann, Anja Kunkel, Jörg Meier, Ulf Leser
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
The system combines a focused crawler, applying shallow text analysis and classification to maintain focus, with a sophisticated text analytic engine inside the Big Data processing system Stratosphere.  ...  In many domains, a plethora of textual information is available on the web as news reports, blog posts, community portals, etc.  ...  ., general purpose (BASE), information extraction (IE), web analytics (WA), and data cleansing (DC).  ... 
doi:10.1145/2882903.2903736 dblp:conf/sigmod/RheinlanderLKML16 fatcat:i745iiaicjan3lnfyybachvahy

A Semi-Automated System for Recognizing Prior Knowledge

Joaquim Moré, Jordi Conesa, David Baneres, Montserrat Junyent
2015 International Journal of Emerging Technologies in Learning (iJET)  
In addition, the recognition of external courses is a process that all institutions, on-site and online learning organization, must perform during the access of new students, since it can be greatly useful  ...  These files contain the teaching plans previously scanned in plain-text format. (4) Data cleansing: possible misinterpretations are corrected by using dictionaries and edit distance algorithms that calculate  ...  ONTOLOGY MODEL This section describes the ontology model proposed for the semi-automated system. Note that the design has been performed from scratch based on the necessary knowledge to be stored.  ... 
doi:10.3991/ijet.v10i7.4610 fatcat:gwhkcw63k5citd4jvvgxf3u4zi

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

2019 KSII Transactions on Internet and Information Systems  
In previous studies, NER systems have been employed to identify named-entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they  ...  Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences.  ...  Sentence Analysis In data mining applications, data cleansing is essential to improve the quality of the final result.  ... 
doi:10.3837/tiis.2019.03.030 fatcat:xsskqcplevb5bjpjohdzdja5vy

Relational Learning Analysis of Social Politics using Knowledge Graph Embedding [article]

Bilal Abu-Salih, Marwan Al-Tawil, Ibrahim Aljarah, Hossam Faris, Pornpit Wongthongtham
2020 arXiv   pre-print
This framework involves capturing a fusion of data obtained from heterogeneous resources into a formal KG representation depicted by a domain ontology.  ...  This paper presents a novel credibility domain-based KG Embedding framework.  ...  Natural Language Processing (NLP) and linguistic approaches as well as other statistical techniques [22].  ... 
arXiv:2006.01626v1 fatcat:235ajzglfvdl7pdluysk6d2nua

A Spell Checking Web Service API for Smart City Communication Platforms

Vita S. Barletta, Danilo Caivano, Antonella Nannavecchia, Michele Scalera
2019 Open Journal of Applied Sciences  
on the semantic structure of the specific textual data.  ...  providing a Spell Checking Web Service API for Smart City communication platforms able to automatically select, among the large availability of open source spell checking tools, the most suitable tool based  ...  NLP data quality assessment has become an important need for NLP datasets (NLP Data Cleansing Based on Linguistic Ontology Constraints).  ... 
doi:10.4236/ojapps.2019.912066 fatcat:syqnosawpfh5fbq4qazepaoz2m

PhenoMiner: from text to a database of phenotypes associated with OMIM diseases

Nigel Collier, Tudor Groza, Damian Smedley, Peter N. Robinson, Anika Oellrich, Dietrich Rebholz-Schuhmann
2015 Database: The Journal of Biological Databases and Curation  
The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can  ...  Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder  ...  with it based on NCBO Annotator data (18).  ... 
doi:10.1093/database/bav104 pmid:26507285 pmcid:PMC4622021 fatcat:27zs4ue3tzbtvi67db2ukl54yq

Using Data Crawlers and Semantic Web to Build Financial XBRL Data Generators: The SONAR Extension Approach

Miguel Ángel Rodríguez-García, Alejandro Rodríguez-González, Ricardo Colomo-Palacios, Rafael Valencia-García, Juan Miguel Gómez-Berbís, Francisco García-Sánchez
2014 The Scientific World Journal  
Since the Web has become the most significant data source, intelligent crawlers based on Semantic Technologies have become trailblazers in the search of knowledge combining natural language processing  ...  and ontology engineering techniques.  ...  To resolve this issue, a constraint is imposed on the design of the ontology.  ... 
doi:10.1155/2014/506740 pmid:24587726 pmcid:PMC3920815 fatcat:x7nbgrpb3zb25f5esbt42csmee

Pairing Conceptual Modeling with Machine Learning [article]

Wolfgang Maass, Veda C. Storey
2021 arXiv   pre-print
With the increasing emphasis on digitizing and processing large amounts of data for business and other applications, it would be helpful to consider how these areas of research can complement each other  ...  We then examine how conceptual modeling can be applied to machine learning and propose a framework for incorporating conceptual modeling into data science projects.  ...  Acknowledgements This paper was based on a keynote presentation given by the first author at the International Conference on Conceptual Modeling.  ... 
arXiv:2106.14251v1 fatcat:n4kujuzttja67jqjs3vz3bdiba

Towards the 5th Industrial Revolution: A literature review and a framework for Process Optimization Based on Big Data Analytics and Semantics

Dimitris Mourtzis
2021 Journal of Machine Engineering  
based on the integration of semantics.  ...  Consequently, albeit engineers currently can monitor the factory level, optimization is cut off of the data acquisition, and is based on data related methodologies.  ...  In this scenario, NLP techniques integrate statistical and linguistic techniques with graph-based AI.  ... 
doi:10.36897/jme/141834 fatcat:tsp4zeziebg5hcxwguwx6dgsgi

Identifying Food-related Word Association and Topic Model Processing using LDA

Yu-Chin Li, Tsung-Chih Hu, Kuo-En Chang
2018 Journal of Library and Information Studies  
The empirical results were analyzed on two levels: (1) by the expert word association classification: taxonomic and script proposed by Ross and Murphy (1999); (2) followed by the associative hierarchy  ...  To test the similarity of the output of the topic model and human word association, the "Time-limited Multiple Divergent Thinking Test of Word Associative Strategy" (TLM-DTTWAS) was used to collect data  ...  Data cleansing: Filtration was carried out before processing to remove invalid data.  ... 
doi:10.6182/jlis.201806_16(1).023 doaj:210ef6d9a5304323b1ed458e4cc0e2fa fatcat:ddlnji2yfjaulcywf2gljwrmie