NLP Data Cleansing Based on Linguistic Ontology Constraints
[chapter]
2014
Lecture Notes in Computer Science
NLP is, compared to other domains such as biology, a late Linked Data adopter. However, it has seen a steep rise of activity in the creation of data and ontologies. ...
Linked Data comprises an unprecedented volume of structured data on the Web and is adopted by an increasing number of domains. ...
A Test Auto Generator reuses the RDFS and OWL modelling of a knowledge base to verify data quality. In particular, a TAG, based on a DQTP, takes a schema as input and returns test cases. ...
doi:10.1007/978-3-319-07443-6_16
fatcat:vavqfcbgczdvnaw3h4y4ez6c2i
An Integrated Set of Web Mining Tools for Research
2018
Zenodo
Validation uses model-based data extraction to validate the data correctness. ...
IE will use a hybrid extraction way to select portions from a web page and give data into a database. Generalization will clean data and use database techniques to analyze collected data. ...
The difference between wrapper induction tools and those with NLP is that they format features to implicitly delineate the structure of data, while NLP relies on linguistic constraints. ...
doi:10.5281/zenodo.1410994
fatcat:fgeuwa7mlnfbnhmlewdkrmh3x4
An Empirical Assessment of Semantic Interpretation
2000
Applied Natural Language Processing Conference
We introduce a framework for semantic interpretation in which dependency structures are mapped to conceptual representations based on a parsimonious set of interpretation schemata. ...
Our focus is on the empirical evaluation of this approach to semantic interpretation, i.e., its quality in terms of recall and precision. ...
Whether these are relevant or not for a particular application has to be determined by subsequent data/knowledge cleansing. ...
dblp:conf/anlp/RomackerH00
fatcat:xh2uzkjnendebbdf555ngjtsom
Linked Open Data Validity -- A Technical Report from ISWS 2018
[article]
2019
arXiv
pre-print
Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, the reuse of such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data ...
Linked Open Data (LOD) is the publicly available RDF data on the Web. Each LOD entity is identified by a URI and accessible via HTTP. ...
data extracted with NLP, and reliable textual sources could be processed with NLP techniques to be used as a reference knowledge base to validate Linked Data sets. ...
arXiv:1903.12554v1
fatcat:e25yvjsucvghzol4uo3omflh7i
Inductive Entity Typing Alignment
2014
International Semantic Web Conference
Our inductive data-driven approach recasts the alignment problem as a classification problem. ...
We present experiments on two named entity recognition benchmark datasets, namely the CoNLL2003 newswire dataset and the MSM2013 microposts dataset. ...
For instance, Cupid [7] combines a number of techniques such as linguistic matching, structure-based matching, constraint-based matching, and context-based matching at the schema element level and related ...
dblp:conf/semweb/RizzoET14
fatcat:e7s7s7gehrctpeo7i3xz56lw3e
Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale
2016
Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16
The system combines a focused crawler, applying shallow text analysis and classification to maintain focus, with a sophisticated text analytic engine inside the Big Data processing system Stratosphere. ...
In many domains, a plethora of textual information is available on the web as news reports, blog posts, community portals, etc. ...
., general purpose (BASE), information extraction (IE), web analytics (WA), and data cleansing (DC). ...
doi:10.1145/2882903.2903736
dblp:conf/sigmod/RheinlanderLKML16
fatcat:i745iiaicjan3lnfyybachvahy
A Semi-Automated System for Recognizing Prior Knowledge
2015
International Journal of Emerging Technologies in Learning (iJET)
In addition, the recognition of external courses is a process that all institutions, on-site and online learning organization, must perform during the access of new students, since it can be greatly useful ...
These files contain the teaching plans previously scanned in plain-text format. (4) Data cleansing: possible misinterpretations are corrected by using dictionaries and edit distance algorithms that calculate ...
ONTOLOGY MODEL This section describes the ontology model proposed for the semi-automated system. Note that the design has been performed from scratch based on the necessary knowledge to be stored. ...
doi:10.3991/ijet.v10i7.4610
fatcat:gwhkcw63k5citd4jvvgxf3u4zi
Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features
2019
KSII Transactions on Internet and Information Systems
In previous studies, NER systems have been employed to identify named-entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they ...
Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences. ...
Sentence Analysis In data mining applications, data cleansing is essential to improve the quality of the final result. ...
doi:10.3837/tiis.2019.03.030
fatcat:xsskqcplevb5bjpjohdzdja5vy
Relational Learning Analysis of Social Politics using Knowledge Graph Embedding
[article]
2020
arXiv
pre-print
This framework involves capturing a fusion of data obtained from heterogeneous resources into a formal KG representation depicted by a domain ontology. ...
This paper presents a novel credibility domain-based KG Embedding framework. ...
Natural Language Processing (NLP) and linguistic approaches as well as other statistical techniques [22]. ...
arXiv:2006.01626v1
fatcat:235ajzglfvdl7pdluysk6d2nua
A Spell Checking Web Service API for Smart City Communication Platforms
2019
Open Journal of Applied Sciences
on the semantic structure of the specific textual data. ...
providing a Spell Checking Web Service API for Smart City communication platforms able to automatically select, among the large availability of open source spell checking tools, the most suitable tool based ...
NLP data quality assessment has become an important need for NLP datasets (NLP Data Cleansing Based on Linguistic Ontology Constraints). ...
doi:10.4236/ojapps.2019.912066
fatcat:syqnosawpfh5fbq4qazepaoz2m
PhenoMiner: from text to a database of phenotypes associated with OMIM diseases
2015
Database: The Journal of Biological Databases and Curation
The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can ...
Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder ...
with it based on NCBO Annotator data (18). ...
doi:10.1093/database/bav104
pmid:26507285
pmcid:PMC4622021
fatcat:27zs4ue3tzbtvi67db2ukl54yq
Using Data Crawlers and Semantic Web to Build Financial XBRL Data Generators: The SONAR Extension Approach
2014
The Scientific World Journal
Since the Web has become the most significant data source, intelligent crawlers based on Semantic Technologies have become trailblazers in the search of knowledge combining natural language processing ...
and ontology engineering techniques. ...
To resolve this issue, a constraint is imposed on the design of the ontology. ...
doi:10.1155/2014/506740
pmid:24587726
pmcid:PMC3920815
fatcat:x7nbgrpb3zb25f5esbt42csmee
Pairing Conceptual Modeling with Machine Learning
[article]
2021
arXiv
pre-print
With the increasing emphasis on digitizing and processing large amounts of data for business and other applications, it would be helpful to consider how these areas of research can complement each other ...
We then examine how conceptual modeling can be applied to machine learning and propose a framework for incorporating conceptual modeling into data science projects. ...
Acknowledgements This paper was based on a keynote presentation given by the first author at the International Conference on Conceptual Modeling. ...
arXiv:2106.14251v1
fatcat:n4kujuzttja67jqjs3vz3bdiba
Towards the 5th Industrial Revolution: A literature review and a framework for Process Optimization Based on Big Data Analytics and Semantics
2021
Journal of Machine Engineering
based on the integration of semantics. ...
Consequently, albeit engineers currently can monitor the factory level, optimization is cut off of the data acquisition, and is based on data related methodologies. ...
In this scenario, NLP techniques integrate statistical and linguistic techniques with graph-based AI. ...
doi:10.36897/jme/141834
fatcat:tsp4zeziebg5hcxwguwx6dgsgi
Identifying Food-related Word Association and Topic Model Processing using LDA
2018
Journal of Library and Information Studies
The empirical results were analyzed on two levels: (1) by the expert word association classification: taxonomic and script proposed by Ross and Murphy (1999); (2) followed by the associative hierarchy ...
To test the similarity of the output of the topic model and human word association, the "Time-limited Multiple Divergent Thinking Test of Word Associative Strategy" (TLM-DTTWAS) was used to collect data ...
Data cleansing: Filtration was carried out before processing to remove invalid data. ...
doi:10.6182/jlis.201806_16(1).023
doaj:210ef6d9a5304323b1ed458e4cc0e2fa
fatcat:ddlnji2yfjaulcywf2gljwrmie