Data integration using similarity joins and a word-based information representation language
2000
ACM Transactions on Information Systems
We describe WHIRL, a "soft" database management system which supports "similarity joins," based on certain robust, general-purpose similarity metrics for text. ...
A solution to this problem is proposed for databases that contain informal, natural-language "names" for objects; most Web-based databases satisfy this requirement, since they usually present their information ...
ACKNOWLEDGMENTS The author is grateful to Alon Levy for numerous helpful discussions while I was formulating this problem, and for comments on a draft of the paper; to Jaewoo Kang and Sheila Tejada, for ...
doi:10.1145/352595.352598
fatcat:ixdwdbhzrfcddn4ipb7kt5re4e
Part-of-Speech Tagging Using Multiview Learning
2020
IEEE Access
[25] also uses two embeddings in a similar way, we use the word-based character embedding instead of the simple word embedding. ...
subword information. • LM representation: A deep contextualized representation that integrates richly supplied contexts gathered from external resources (raw texts) by using ELMo [8] and BERT [10] ...
doi:10.1109/access.2020.3033979
fatcat:lfso5kfveffk5ncuneitaigwkq
Model-driven development of Web APIs to access integrated tabular open data
2020
IEEE Access
Our APIfication approach has two parts: (i) a word embeddings-based approach that uses column similarity to determine which datasets can be integrated by using union and join operators; and (ii) a model-driven ...
WORD EMBEDDINGS AND TABULAR DATA INTEGRATION In recent years, word embeddings have enjoyed widespread use in a variety of semantic tasks in the field of Natural Language Processing, such as sentiment analysis ...
doi:10.1109/access.2020.3036462
fatcat:vlsd3taii5el7eynucq7qefoii
Vector Space Models for Phrase-based Machine Translation
2014
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
VSMs are models based on continuous word representations embedded in a vector space. We exploit word vectors to augment the phrase table with new inferred phrase pairs. ...
The phrase vectors are then used to provide additional scoring of phrase pairs, which fits into the standard log-linear framework of phrase-based statistical machine translation. ...
Acknowledgments This material is partially based upon work supported by the DARPA BOLT project under Contract No. HR0011-12-C-0015. ...
doi:10.3115/v1/w14-4001
dblp:conf/ssst/AlkhouliGN14
fatcat:dp2h4ist5vg2ho2p3mei2m2tpm
Text Document Clustering: Wordnet vs. TF-IDF vs. Word Embeddings
2021
Global WordNet Conference
Our goal is to compare the modern approaches based on language modeling (doc2vec and BERT) with the classical ones, i.e., TF-IDF and wordnet-based. ...
The experiments' results showed that wordnet-based similarity measures could compete and even outperform modern embedding-based approaches. ...
Acknowledgements The paper was prepared as part of the project "Operating and Developing of the Integrated Register of Qualifications" implemented by the Educational Research Institute as commissioned ...
dblp:conf/wordnet/MarcinczukGWB21
fatcat:4tsq2v7tqnbmngje7mz2257ixi
Integrating relational database schemas using a standardized dictionary
2001
Proceedings of the 2001 ACM symposium on Applied computing - SAC '01
We propose that integration can be increasingly automated by capturing data semantics using a standardized dictionary. ...
The architecture automatically integrates and transparently queries relational data sources, and its application of standardization to the integration problem is unique. ...
TSIMMIS [7] and Infomaster [2] construct integrated views using designer-based approaches which are mapped using a query language or logical rules into views or queries on the individual data sources ...
doi:10.1145/372202.372327
dblp:conf/sac/LawrenceB01
fatcat:gn7r7szgbnajvcbxpchnedwdwa
DTL's DataSpot: Database Exploration Using Plain Language
1998
Very Large Data Bases Conference
DataSpot is based on a novel representation of data in the form of a schema-less semi-structured graph called a hyperbase. ...
DTL's DataSpot is a database publishing tool that enables non-technical end users to explore a database using free-form plain language queries combined with hypertext navigation. ...
We are also grateful to Divesh Srivastava and S. Sudarshan for providing us with valuable feedback on this paper. ...
dblp:conf/vldb/DarEGP98
fatcat:kqwlzcbenngi3oktfeguummndq
Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning
[article]
2020
arXiv
pre-print
In this work, we propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data to make the model ...
Data is the king in the age of AI. However data integration is often a laborious task that is hard to automate. ...
We are mainly targeting at open data in CSV, JSON and text format and choose to use a super cell based representation. ...
arXiv:2010.07586v1
fatcat:kaux2y3uvzfivon7cusggsyudm
Adding Semantics to Business Intelligence: Towards a Smarter Generation of Analytical Tools
[chapter]
2012
Business Intelligence - Solution for Business Development
All syntactic patterns and heuristics along with a list of stop-words used by OLAP translator must be configured according to the language in the knowledge base. ...
For instance, an inverted index structure is used just to find the papers ids produced by students, and these ids are used to join with other information about such students stored in a data mart. ...
data using XML, density-based clustering and anomaly detection, data mining based on neural networks. ...
doi:10.5772/35572
fatcat:pcy3f2ymyjclrkwx2ficgosvri
Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks
[article]
2017
arXiv
pre-print
Learning and inference models can directly operate on this relational representation and augment it with new data and knowledge that, in turn, is integrated seamlessly into the relational structure to ...
In this work, we propose a unified framework consisting of a data model (a graph with a first-order schema) along with a declarative language for constructing, querying and manipulating such networks in ...
This material is based on research sponsored by DARPA under agreement number FA8750-13-2-0008. The U.S. ...
arXiv:1707.07794v1
fatcat:puoapsp2vjehri27newnvtcrdi
Termite: A System for Tunneling Through Heterogeneous Data
[article]
2019
arXiv
pre-print
On top of Termite, we have implemented a Termite-Join operator, which allows people to identify related concepts, even when these are stored in databases with different schemas and in unstructured data ...
Because the best representation is learned, this allows Termite to avoid much of the human effort associated with traditional data integration tasks. ...
In [2], the authors propose a method to learn a vector representation of data items from relational data based on word embeddings [17], and then use those vectors to augment traditional SQL queries ...
arXiv:1903.05008v1
fatcat:2nukbx3y3vh5nlw72ppl3l27s4
Network representation learning method embedding linear and nonlinear network structures
2022
Semantic Web Journal
Thus, the nodal characteristics of nonlinear and linear structures are explored in this paper, and an unsupervised representation method based on HGCN that joins learning of shallow and deep models is ...
to effectively apply learned network representations to various graph-based analytical tasks. ...
The DeepWalk [16] method uses a DFS-based random walk and uses second-order similarity to capture local community information, which can be used to observe the entire network in the absence of structural ...
doi:10.3233/sw-212968
fatcat:trmtlpbqp5dvha7mniks3yqn44
Ontology-based Data Integration
2019
International Conference on Data Analytics and Management in Data Intensive Domains
It is important that the considered data model is extensible and we use a computationally complete language to support the data integration concept. ...
The reasoning rules are based on an algebra of integrable data and formalized by an XML DTD. The data translation mechanisms are non-sensitive to extension of the considered algebra. ...
APPENDIX A. An XML DTD for Modeling the Reasoning Rules of the Data Integration Concept ...
dblp:conf/rcdl/Manukyan19
fatcat:maxvvycnhrd47diowotkrtpmiq
XML
2009
SIGMOD record
Hence, this paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema evolution. It then summarizes some (some!) ...
XML has been explored by both research and industry communities. More than 5500 papers were published on different aspects of XML. ...
This work was partially supported by CNPq, CAPES, FAPERJ, and FAPEMIG, Brazil. ...
doi:10.1145/1815918.1815924
fatcat:td2eaynvwjcg5lwsqacix5nsa4
Querying Relational Databases without Explicit Joins
[chapter]
2002
Lecture Notes in Computer Science
Database semantics are described using a global dictionary and semantic specifications that are combined to form an integrated, context view. ...
Despite its benefits and wide-spread acceptance, SQL [5] is not a perfect query language. ...
and appropriate join conditions. ...
doi:10.1007/3-540-46140-x_22
fatcat:twae5vsmnnclbey3z5am5vagxm