Data integration using similarity joins and a word-based information representation language

William W. Cohen
2000 ACM Transactions on Information Systems  
We describe WHIRL, a "soft" database management system which supports "similarity joins," based on certain robust, general-purpose similarity metrics for text.  ...  A solution to this problem is proposed for databases that contain informal, natural-language "names" for objects; most Web-based databases satisfy this requirement, since they usually present their information  ...  ACKNOWLEDGMENTS The author is grateful to Alon Levy for numerous helpful discussions while I was formulating this problem, and for comments on a draft of the paper; to Jaewoo Kang and Sheila Tejada, for  ... 
doi:10.1145/352595.352598 fatcat:ixdwdbhzrfcddn4ipb7kt5re4e

Part-of-Speech Tagging Using Multiview Learning

KyungTae Lim, Jungyeul Park
2020 IEEE Access  
[25] also uses two embeddings in a similar way, we use the word-based character embedding instead of the simple word embedding.  ...  subword information. • LM representation: A deep contextualized representation that integrates richly supplied contexts gathered from external resources (raw texts) by using ELMo [8] and BERT [10]  ... 
doi:10.1109/access.2020.3033979 fatcat:lfso5kfveffk5ncuneitaigwkq

Model-driven development of Web APIs to access integrated tabular open data

Cesar Gonzalez-Mora, David Tomas, Irene Garrigos, Jose Jacobo Zubcoff, Jose-Norberto Mazon
2020 IEEE Access  
Our APIfication approach has two parts: (i) a word embeddings-based approach that uses column similarity to determine which datasets can be integrated by using union and join operators; and (ii) a model-driven  ...  WORD EMBEDDINGS AND TABULAR DATA INTEGRATION In recent years, word embeddings have enjoyed widespread use in a variety of semantic tasks in the field of Natural Language Processing, such as sentiment analysis  ... 
doi:10.1109/access.2020.3036462 fatcat:vlsd3taii5el7eynucq7qefoii

Vector Space Models for Phrase-based Machine Translation

Tamer Alkhouli, Andreas Guta, Hermann Ney
2014 Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation  
VSMs are models based on continuous word representations embedded in a vector space. We exploit word vectors to augment the phrase table with new inferred phrase pairs.  ...  The phrase vectors are then used to provide additional scoring of phrase pairs, which fits into the standard log-linear framework of phrase-based statistical machine translation.  ...  Acknowledgments This material is partially based upon work supported by the DARPA BOLT project under Contract No. HR0011-12-C-0015.  ... 
doi:10.3115/v1/w14-4001 dblp:conf/ssst/AlkhouliGN14 fatcat:dp2h4ist5vg2ho2p3mei2m2tpm

Text Document Clustering: Wordnet vs. TF-IDF vs. Word Embeddings

Michal Marcinczuk, Mateusz Gniewkowski, Tomasz Walkowiak, Marcin Bedkowski
2021 Global WordNet Conference  
Our goal is to compare the modern approaches based on language modeling (doc2vec and BERT) with the classical ones, i.e., TF-IDF and wordnet-based.  ...  The experiments' results showed that wordnet-based similarity measures could compete and even outperform modern embedding-based approaches.  ...  Acknowledgements The paper was prepared as part of the project "Operating and Developing of the Integrated Register of Qualifications" implemented by the Educational Research Institute as commissioned  ... 
dblp:conf/wordnet/MarcinczukGWB21 fatcat:4tsq2v7tqnbmngje7mz2257ixi

Integrating relational database schemas using a standardized dictionary

Ramon Lawrence, Ken Barker
2001 Proceedings of the 2001 ACM symposium on Applied computing - SAC '01  
We propose that integration can be increasingly automated by capturing data semantics using a standardized dictionary.  ...  The architecture automatically integrates and transparently queries relational data sources, and its application of standardization to the integration problem is unique.  ...  TSIMMIS [7] and Infomaster [2] construct integrated views using designer-based approaches which are mapped using a query language or logical rules into views or queries on the individual data sources  ... 
doi:10.1145/372202.372327 dblp:conf/sac/LawrenceB01 fatcat:gn7r7szgbnajvcbxpchnedwdwa

DTL's DataSpot: Database Exploration Using Plain Language

Shaul Dar, Gadi Entin, Shai Geva, Eran Palmon
1998 Very Large Data Bases Conference  
DataSpot is based on a novel representation of data in the form of a schema-less semi-structured graph called a hyperbase.  ...  DTL's DataSpot is a database publishing tool that enables non-technical end users to explore a database using free-form plain language queries combined with hypertext navigation.  ...  We are also grateful to Divesh Srivastava and S. Sudarshan for providing us with valuable feedback on this paper.  ... 
dblp:conf/vldb/DarEGP98 fatcat:kqwlzcbenngi3oktfeguummndq

Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning [article]

Zijie Wang, Lixi Zhou, Amitabh Das, Valay Dave, Zhanpeng Jin, Jia Zou
2020 arXiv   pre-print
In this work, we propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data to make the model  ...  Data is the king in the age of AI. However data integration is often a laborious task that is hard to automate.  ...  We are mainly targeting at open data in CSV, JSON and text format and choose to use a super cell based representation.  ... 
arXiv:2010.07586v1 fatcat:kaux2y3uvzfivon7cusggsyudm

Adding Semantics to Business Intelligence: Towards a Smarter Generation of Analytical Tools [chapter]

Denilson Sell, Dhiogo Cardoso da Silva, Fernando Benedet, Márcio Napoli, José Leomar
2012 Business Intelligence - Solution for Business Development  
All syntactic patterns and heuristics along with a list of stop-words used by OLAP translator must be configured according to the language in the knowledge base.  ...  For instance, an inverted index structure is used just to find the papers ids produced by students, and these ids are used to join with other information about such students stored in a data mart.  ...  data using XML, density-based clustering and anomaly detection, data mining based on neural networks.  ... 
doi:10.5772/35572 fatcat:pcy3f2ymyjclrkwx2ficgosvri

Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks [article]

Parisa Kordjamshidi, Sameer Singh, Daniel Khashabi, Christos Christodoulopoulos, Mark Summons, Saurabh Sinha, Dan Roth
2017 arXiv   pre-print
Learning and inference models can directly operate on this relational representation and augment it with new data and knowledge that, in turn, is integrated seamlessly into the relational structure to  ...  In this work, we propose a unified framework consisting of a data model - a graph with a first order schema along with a declarative language for constructing, querying and manipulating such networks in  ...  This material is based on research sponsored by DARPA under agreement number FA8750-13-2-0008. The U.S.  ... 
arXiv:1707.07794v1 fatcat:puoapsp2vjehri27newnvtcrdi

Termite: A System for Tunneling Through Heterogeneous Data [article]

Raul Castro Fernandez, Samuel Madden
2019 arXiv   pre-print
On top of Termite, we have implemented a Termite-Join operator, which allows people to identify related concepts, even when these are stored in databases with different schemas and in unstructured data  ...  Because the best representation is learned, this allows Termite to avoid much of the human effort associated with traditional data integration tasks.  ...  In [2], the authors propose a method to learn a vector representation of data items from relational data based on word embeddings [17], and then use those vectors to augment traditional SQL queries  ... 
arXiv:1903.05008v1 fatcat:2nukbx3y3vh5nlw72ppl3l27s4

Network representation learning method embedding linear and nonlinear network structures

Hu Zhang, Jingjing Zhou, Ru Li, Yue Fan, Mehwish Alam, Davide Buscaldi, Michael Cochez, Francesco Osborne, Diego Reforgiato Recupero, Harald Sack
2022 Semantic Web Journal  
Thus, the nodal characteristics of nonlinear and linear structures are explored in this paper, and an unsupervised representation method based on HGCN that joins learning of shallow and deep models is  ...  to effectively apply learned network representations to various graph-based analytical tasks.  ...  The DeepWalk [16] method uses a DFS-based random walk and uses second-order similarity to capture local community information, which can be used to observe the entire network in the absence of structural  ... 
doi:10.3233/sw-212968 fatcat:trmtlpbqp5dvha7mniks3yqn44

Ontology-based Data Integration

Manuk Manukyan
2019 International Conference on Data Analytics and Management in Data Intensive Domains  
It is important that the considered data model is extensible and we use a computationally complete language to support the data integration concept.  ...  The reasoning rules are based on an algebra of integrable data and formalized by an XML DTD. The data translation mechanisms are non-sensitive to extension of the considered algebra.  ...  APPENDIX A. An XML DTD for Modeling the Reasoning Rules of the Data Integration Concept  ... 
dblp:conf/rcdl/Manukyan19 fatcat:maxvvycnhrd47diowotkrtpmiq

XML: some papers in a haystack

Mirella M. Moro, Vanessa Braganholo, Carina F. Dorneles, Denio Duarte, Renata Galante, Ronaldo S. Mello
2009 SIGMOD record  
Hence, this paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema evolution. It then summarizes some (some!)  ...  XML has been explored by both research and industry communities. More than 5500 papers were published on different aspects of XML.  ...  This work was partially supported by CNPq, CAPES, FAPERJ, and FAPEMIG, Brazil.  ... 
doi:10.1145/1815918.1815924 fatcat:td2eaynvwjcg5lwsqacix5nsa4

Querying Relational Databases without Explicit Joins [chapter]

Ramon Lawrence, Ken Barker
2002 Lecture Notes in Computer Science  
Database semantics are described using a global dictionary and semantic specifications that are combined to form an integrated, context view.  ...  Despite its benefits and wide-spread acceptance, SQL [5] is not a perfect query language.  ...  and appropriate join conditions.  ... 
doi:10.1007/3-540-46140-x_22 fatcat:twae5vsmnnclbey3z5am5vagxm