
Combining schema and instance information for integrating heterogeneous data sources

Huimin Zhao, Sudha Ram
2007 Data & Knowledge Engineering  
Determining the correspondences among heterogeneous data sources, which is critical to integration of the data sources, is a complex and resource-consuming task that demands automated support.  ...  We have performed empirical evaluation using real-world heterogeneous data sources and report in this paper some promising results (i.e., incremental improvement in identified correspondences) that demonstrate  ...  Acknowledgements The authors are grateful to the anonymous referees for their valuable suggestions.  ... 
doi:10.1016/j.datak.2006.06.004 fatcat:l4yux53lfbh33nzomjosptjuya

Entity Identification in XML Documents

Leonardo Ribeiro, Theo Härder
2006 Workshop Grundlagen von Datenbanken  
As a natural result of the dissemination of a large variety of XML databases, the well-known problem of data integration must be faced from the XML viewpoint. One of the basic functions of an integration  ...  system is the record linkage, the task of comparing records to determine those that are differently represented, but relate to the same entity.  ...  The most common application of record linkage techniques is discovering duplicates within a unique database during data cleaning procedures or identifying overlapping entities across multiple databases  ... 
dblp:conf/gvd/RibeiroH06 fatcat:qppcemygabgzllwyhgb52yrjfy

An integrated framework for de-identifying unstructured medical data

James Gardner, Li Xiong
2009 Data & Knowledge Engineering  
We empirically study a simple Bayesian classifier, a Bayesian classifier with a sampling based technique, and a conditional random field based classifier for extracting identifying attributes from unstructured  ...  We present a set of preliminary evaluations showing the effectiveness of our approach.  ...  We thank the guest editors and anonymous reviewers for their valuable comments that improved this paper.  ... 
doi:10.1016/j.datak.2009.07.006 fatcat:4eq3oldagrfmjnddkylluuhliy

Data Integration and Record Matching: An Austrian Contribution to Research in Official Statistics

Michaela Denk, Peter Hackl
2016 Austrian Journal of Statistics  
Data integration techniques are one of the core elements of DIECOFIS, an EU-funded international research project that aims at developing a methodology for the construction of a system  ...  Data integration is also of major interest for official statistics agencies as a means of using available information more efficiently and improving the quality of the agency's products.  ...  Technological heterogeneity encompasses differences in hardware, operating systems, and in database management systems.  ... 
doi:10.17713/ajs.v32i4.464 fatcat:jiuvvra245etxicpohsyj3b7iq

Entity reconciliation in big data sources: A systematic mapping study

J.G. Enríquez, F.J. Domínguez-Mayo, M.J. Escalona, M. Ross, G. Staples
2017 Expert systems with applications  
The entity reconciliation (ER) problem aroused much interest as a research topic in today's Big Data era, full of big and open heterogeneous data sources.  ...  Besides, the complexity that the heterogeneity of data sources involves, the large number of records and differences among languages, for instance, must be added.  ...  Competitiveness and Fujitsu Laboratories of Europe (FLE).  ... 
doi:10.1016/j.eswa.2017.03.010 fatcat:5laaclfscvh3ld7niksbxrgorm

Finding human gene-disease associations using a Network Enhanced Similarity Search (NESS) of multi-species heterogeneous functional genomics data [article]

Timothy Reynolds, Jason A Bubier, Michael A Langston, Elissa J Chesler, Erich J Baker
2020 bioRxiv   pre-print
The evaluation of large-scale genomic experimental datasets is a compelling approach to refining the classification of biological concepts, such as disease.  ...  NESS employs a random walk with restart algorithm across harmonized multi-species data, effectively compensating for sparsely populated and noisy genomic studies.  ...  Acknowledgments The authors thank Stephen Krasinski for his helpful review and comments. Supporting information  ... 
doi:10.1101/2020.03.11.987552 fatcat:cxiva6iliffergyrhnltvzrg24

Polyflow: a Polystore-compliant Mechanism to Provide Interoperability to Heterogeneous Provenance Graphs

Yan Mendes, Daniel De Oliveira, Victor Ströele
2021 Journal of Information and Data Management  
With a clearer understanding of the state of the art, we propose a tool called Polyflow, which is based on the concept of Polystore systems, integrating several databases of heterogeneous origin by adopting  ...  These graphs allow scientists to analyze and evaluate results produced by a workflow. However, each WfMS has a proprietary format for provenance and does so at different granularity levels.  ...  This extended version provides new empirical evidence regarding the proposed approach, an evaluation of the approach with experts and a broader discussion on related work.  ... 
doi:10.5753/jidm.2020.2017 fatcat:r4evleyx7vaobixao7lc3ejmke

Exploring Attribute Correspondences Across Heterogeneous Databases by Mutual Information

2006 Journal of Management Information Systems  
Entity identification for heterogeneous database integration: A multiple classifier system approach and empirical evaluation. Information Systems, 30, 2 (2005), 119-132.  ...  University Property Example. We will use a real-world case of heterogeneous databases for both illustrative and empirical evaluation purposes.  ... 
doi:10.2753/mis0742-1222220411 fatcat:wh7xwemgt5cgdghmj5ezpvcehe

Using an Emotion-based Model and Sentiment Analysis Techniques to Classify Polarity for Reputation

Jorge Carrillo de Albornoz, Irina Chugur, Enrique Amigó
2012 Conference and Labs of the Evaluation Forum  
The original system has been extended to work with texts in English and in Spanish, and to include a module for filtering tweets according to their relevance to each company.  ...  The experimental results prove that sentiment analysis techniques are a good starting point for creating systems for automatic detection of polarity for reputation.  ...  The MCR is an open source database that integrates WordNet versions for five different languages: English, Spanish, Catalan, Basque and Galician.  ... 
dblp:conf/clef/Carrillo-de-AlbornozCA12 fatcat:u5ltqnnih5bubkls7fpwk6ksga

A Multi-View Learning based Clustering Method for Health Care System

Neha Garg, Sunidhi Shrivastava
2022 International Journal for Research in Applied Science and Engineering Technology  
Second, proposed a Genetic-K-means based clustering algorithm based on Collective Matrix Factorization for heterogeneous clinical records.  ...  Collective Matrix Factorization to combine the extracted features from multiple views and gives a low dimensional representation of combined clinical data.  ...  Text Preprocessing and Feature Extraction for an extrapolative system Fig. 2 . A general approach for building a text clustering model using heterogeneous clinical notes.  ... 
doi:10.22214/ijraset.2022.43243 fatcat:ukxh2gdz6belxlj4fkskztbefe

A knowledge-based system for patient image pre-fetching in heterogeneous database environments - modeling, design, and evaluation

Chih-Ping Wei, Paul Jen-Hwa Hu, O.R.L. Sheng
2001 IEEE Transactions on Information Technology in Biomedicine  
Moreover, the system demands an extensible and maintainable architecture design capable of effectively adapting to a dynamic environment characterized by heterogeneous and autonomous data source systems  ...  In this paper, we developed a synthesized object-oriented entity-relationship model, a conceptual model appropriate for representing radiologists' prior image reference heuristics that are heuristic oriented  ...  ACKNOWLEDGMENT The authors wish to thank radiologists at the UMC, University of Arizona, Tucson, for their cooperation during our requirement analysis and case collection.  ... 
doi:10.1109/4233.908382 pmid:11300215 fatcat:zauw4h7ot5advatamf73ursuua

GeneWeaver: finding consilience in heterogeneous cross-species functional genomics data

Jason A. Bubier, Charles A. Phillips, Michael A. Langston, Erich J. Baker, Elissa J. Chesler
2015 Mammalian Genome  
The system provides a platform for cross-species integration and interrogation of heterogeneous curated and experimentally derived functional genomics data.  ...  substrates of related diseases, classifying experiments and the biological concepts they represent from empirical data, and applying patterns of genomic evidence to implicate novel genes in disease.  ...  Acknowledgments The original development of the GeneWeaver system was supported by U01AA13499 and U24AA13513. It is currently supported by NIH R01 AA18776, jointly funded by NIAAA and NIDA.  ... 
doi:10.1007/s00335-015-9575-x pmid:26092690 pmcid:PMC4602068 fatcat:ig3gxodwzjezzevoyqy2dwsrvu

Exploiting semantic structure for mapping user-specified form terms to SNOMED CT concepts

Ritu Khare, Yuan An, Jiexun Li, Il-Yeol Song, Xiaohua Hu
2012 Proceedings of the 2nd ACM SIGHIT symposium on International health informatics - IHI '12  
This term diversity makes future database integration and analysis a huge challenge.  ...  For a given form term, this approach (i) exploits the semantic structure of the form to derive the term's context, and (ii) maps the term to a linguistically-matching SNOMED CT concept that is compatible  ...  ACKNOWLEDGEMENTS We sincerely thank the anonymous reviewers for providing valuable suggestions for revising this paper.  ... 
doi:10.1145/2110363.2110397 dblp:conf/ihi/KhareALSH12 fatcat:c7ped2bdd5bp7a2pwoesa3d5cm

Challenges in integrating Escherichia coli molecular biology data

A. Lourenco, S. Carneiro, M. Rocha, E. C. Ferreira, I. Rocha
2010 Briefings in Bioinformatics  
One key challenge in Systems Biology is to provide mechanisms to collect and integrate the necessary data to be able to meet multiple analysis requirements.  ...  Typically, biological contents are scattered over multiple data sources and there is no easy way of comparing heterogeneous data contents.  ...  Acknowledgements Both A.L. and S.C. have made equal substantive contributions to this manuscript. A.L. and S.C. designed the experiments and defined the integration strategies to be studied.  ... 
doi:10.1093/bib/bbq067 pmid:21059604 fatcat:zvkzcvojgrevtc4ehbof6tz5ty

Instance-based attribute identification in database integration

Cecil Eng H. Chua, Roger H. L. Chiang, Ee-Peng Lim
2003 The VLDB journal  
Most research on attribute identification in database integration has focused on integrating attributes using schema and summary information derived from the attribute values.  ...  Unlike other attribute identification methods that match only single attributes, our method matches attribute groups for integration.  ...  and instances from heterogeneous databases. Schema integration: The process of matching and integrating schema elements from different heterogeneous databases. Attribute identification: Attribute identification  ... 
doi:10.1007/s00778-003-0088-y fatcat:ofqncr2irnazllmmkks4tc4f4u