
Knowledge Translation: Extended Technical Report [article]

Bahar Ghadiri Bashardoost, Renée J. Miller, Kelly Lyons, Fatemeh Nargesian
2020 arXiv   pre-print
Also, for each node n_j, j ≥ 0, we will express each of concept(n_j)'s attributes q using statements like ?v_j q ?attr_j, where attr_j is var(n_j, q) and v_j is var(n_j).  ...  We say i has higher Path priority than j if it has a lower cost. Example 3.3.  ... 
arXiv:2008.01208v1 fatcat:uuj7gmk6e5bw5mbxzvhtfeqela


Periklis Andritsos, Ariel Fuxman, Anastasios Kementsietsidis, Renée J. Miller, Yannis Velegrakis
2004 SIGMOD record  
In Toronto's Kanata project, we are investigating the integration and exchange of data and metadata in dynamic, autonomous environments. Our focus is on the development and maintenance of semantic mappings that permit runtime sharing of information.
doi:10.1145/1041410.1041416 fatcat:bnqnuqwnwrdbtgbszs3jc5ojki

Open data integration

Renée J. Miller
2018 Proceedings of the VLDB Endowment  
Open data plays a major role in supporting both governmental and organizational transparency. Many organizations are adopting Open  ...  called query discovery where the main task is to discover a query (or transformation) that translates data from one form into another. The goal is to find the right operators to join, nest, group, link, and twist data into a desired form. We introduce a new paradigm for thinking about integration where the focus is on data discovery, but highly efficient internet-scale discovery that is driven by data analysis needs. We describe a research agenda and recent progress in developing scalable data-analysis or query-aware data discovery algorithms that provide high recall and accuracy over massive data repositories.
doi:10.14778/3229863.3240491 fatcat:i7hcebup6bhrrjidunk4zh7shq

A Collective, Probabilistic Approach to Schema Mapping: Appendix [article]

Angelika Kimmig, Alex Memory, Renee J. Miller, Lise Getoor
2017 arXiv   pre-print
$F(M) = w_1 \cdot \sum_{t \in J} [1 - \mathrm{explains}_{\mathrm{full}}(M, t)] + w_2 \cdot \sum_{t \in K_C - J} [\mathrm{error}_{\mathrm{full}}(M, t)] + w_3 \cdot \sum_{\theta \in M} \mathrm{size}(\theta)$  ...  ., a non-certain error tuple if deleted from J), or only by C−M G (i.e., a non-certain unexplained tuple if added to J).  ... 
arXiv:1702.03447v1 fatcat:zmdaoqfzyzfkddljpiaik2346e

Microarrays in Parkinson's disease: A systematic approach

Renee M. Miller, Howard J. Federoff
2006 NeuroRx  
Our lab has recently examined gene expression in three different transgenic mouse lines (Miller et al., unpublished data).  ...  Significantly differentially expressed genes were identified and validated using quantitative RT-PCR (Miller and Federoff, unpublished data).  ... 
doi:10.1016/j.nurx.2006.05.008 pmid:16815215 pmcid:PMC3593377 fatcat:u3xm3rshqrgh3epddcg4gwyr7e

Microarrays in Parkinson's disease: A systematic approach

Renee M. Miller, Howard J. Federoff
2006 Neurotherapeutics  
Our lab has recently examined gene expression in three different transgenic mouse lines (Miller et al., unpublished data).  ...  Significantly differentially expressed genes were identified and validated using quantitative RT-PCR (Miller and Federoff, unpublished data).  ... 
doi:10.1007/bf03206655 fatcat:h7e5ewf7bbbhtm5ctyk4sdo6cu

The hyperion project

Marcelo Arenas, Vasiliki Kantere, Anastasios Kementsietsidis, Iluju Kiringa, Renée J. Miller, John Mylopoulos
2003 SIGMOD record  
We present an architecture and a set of challenges for peer database management systems. These systems team up to build a network of nodes (peers) that coordinate at run time most of the typical DBMS tasks such as the querying, updating, and sharing of data. Such a network works in a way similar to conventional multidatabases. Conventional multidatabase systems are founded on key concepts such as those of a global schema, central administrative authority, data integration, global access to multiple databases, permanent participation of databases, etc. Instead, our proposal assumes total absence of any central authority or control, no global schema, transient participation of peer databases, and constantly evolving coordination rules among databases. In this work, we describe the status of the Hyperion project, present our current solutions, and outline remaining research issues.
doi:10.1145/945721.945733 fatcat:o5lzhvlqxvcd3j5m6vgznhcksy

Discovering data quality rules

Fei Chiang, Renée J. Miller
2008 Proceedings of the VLDB Endowment  
Dirty data is a serious problem for businesses leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. Dirty data often arises when domain constraints and business rules, meant to preserve data consistency and accuracy, are enforced incompletely or not at all in application code. In this work, we propose a new data-driven tool that can be used within an organization's data quality management process to suggest possible rules, and to identify conformant and non-conformant records. Data quality rules are known to be contextual, so we focus on the discovery of context-dependent rules. Specifically, we search for conditional functional dependencies (CFDs), that is, functional dependencies that hold only over a portion of the data. The output of our tool is a set of functional dependencies together with the context in which they hold (for example, a rule that states for CS graduate courses, the course number and term functionally determines the room and instructor). Since the input to our tool will likely be a dirty database, we also search for CFDs that almost hold. We return these rules together with the non-conformant records (as these are potentially dirty records). We present effective algorithms for discovering CFDs and dirty values in a data instance. Our discovery algorithm searches for minimal CFDs among the data values and prunes redundant candidates. No universal objective measures of data quality or data quality rules are known. Hence, to avoid returning an unnecessarily large number of CFDs and only those that are most interesting, we evaluate a set of interest metrics and present comparative results using real datasets. We also present an experimental study showing the scalability of our techniques.
doi:10.14778/1453856.1453980 fatcat:kqsmykm3nffxzbo4x224cfc3bi
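
The abstract's example rule (for CS graduate courses, course number and term determine room and instructor) can be illustrated with a small check. This is only a sketch of testing one given rule against a table, not the discovery algorithm the paper describes; the function name `cfd_violations`, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def cfd_violations(rows, condition, lhs, rhs):
    """Return rows that break the dependency lhs -> rhs within the
    context selected by `condition` (attribute -> required value)."""
    groups = defaultdict(list)
    for row in rows:
        # Only rows inside the context are subject to the rule.
        if all(row.get(attr) == value for attr, value in condition.items()):
            groups[tuple(row[a] for a in lhs)].append(row)
    violations = []
    for group in groups.values():
        # More than one distinct rhs value for the same lhs value is a conflict.
        if len({tuple(r[a] for a in rhs) for r in group}) > 1:
            violations.extend(group)
    return violations

# Hypothetical sample data.
rows = [
    {"dept": "CS", "level": "grad", "course": "2410", "term": "F08", "room": "BA1", "instructor": "Lee"},
    {"dept": "CS", "level": "grad", "course": "2410", "term": "F08", "room": "BA2", "instructor": "Lee"},
    {"dept": "CS", "level": "grad", "course": "2420", "term": "F08", "room": "BA3", "instructor": "Kim"},
    {"dept": "CS", "level": "undergrad", "course": "108", "term": "F08", "room": "BA4", "instructor": "Roy"},
    {"dept": "CS", "level": "undergrad", "course": "108", "term": "F08", "room": "BA5", "instructor": "Roy"},
]

bad = cfd_violations(
    rows,
    condition={"dept": "CS", "level": "grad"},
    lhs=["course", "term"],
    rhs=["room", "instructor"],
)
```

Here the two graduate rows for course 2410 disagree on the room and are flagged, while the undergraduate rows fall outside the context and are ignored even though they also disagree.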


Boris Glavic, Gustavo Alonso, Renée J. Miller, Laura M. Haas
2010 Proceedings of the VLDB Endowment  
For a given instance I of the source schema, an instance J of the target schema is called a solution of a schema mapping M if J satisfies all the constraints in Σt and I, J satisfy all the constraints  ...  Extracts dates from techreport self J. Returns all author, co-author combinations Denorm. Merges dates and publications Aggr.  ... 
doi:10.14778/1920841.1921003 fatcat:743tpdx62rh77hznvtyprrdyae

LinkedCT: A Linked Data Space for Clinical Trials [article]

Oktie Hassanzadeh, Anastasios Kementsietsidis, Lipyeow Lim, Renee J. Miller, Min Wang
2009 arXiv   pre-print
The Linked Clinical Trials (LinkedCT) project aims at publishing the first open semantic web data source for clinical trials data. The database exposed by LinkedCT is generated by (1) transforming existing data sources of clinical trials into RDF, and (2) discovering semantic links between the records in the trials data and several other data sources. In this paper, we discuss several challenges involved in these two steps and present the methodology used in LinkedCT to overcome these challenges. Our approach for semantic link discovery involves using state-of-the-art approximate string matching techniques combined with ontology-based semantic matching of the records, all performed in a declarative and easy-to-use framework. We present an evaluation of the performance of our proposed techniques in several link discovery scenarios in LinkedCT.
arXiv:0908.0567v1 fatcat:i5if25dtvbfwbnrecypdg2ny7y

Creating probabilistic databases from duplicated data

Oktie Hassanzadeh, Renée J. Miller
2009 The VLDB journal  
of $r_i$ and $r_j$ are in $f(c_l)$, $p_{t \in c}(r_i) \le p_{t \in c}(r_j)$ and $r_i$ appears before $r_j$ in $L$, i.e., $|(r_i, r_j) \mid r_i, r_j \in L,\ i \le j,\ p_t(r_i \in c_l) \le p_t(r_j \in c_l)|$ − e k 2 − e e = |(r i , r j  ...  = ∅ for all i, j.  ...  ., $p_a(r_i) \le p_a(r_j)$ iff $i \le j$ where $p_a(r)$ is the probability value assigned to the record r.  ... 
doi:10.1007/s00778-009-0161-2 fatcat:zdqo7b5lrjdudmzhd255i2nx3q

"You ought to write. You need to probe the heart of life": art dealer and diarist René Gimpel and the interwar transatlantic art trade (1918-1939) . Review of: The Journal of a Transatlantic Art Dealer. René Gimpel, 1918-1939, by Diana J. Kostyrko, London, Turnhout: Harvey Miller/ Brepols Publishers, 2017

Marie Tavinor
2020 Journal of Art Historiography  
Using the diary as a tool, albeit acknowledging its sometimes problematic nature as literary object, Diana J.  ...  A prominent art dealer operating between Paris and New York during the interwar years, René Gimpel (1881-1945) kept a diary which was first published in 1963 and then republished in an extended version  ...  René's father Ernest Gimpel Diana J. Kostyrko, The Journal of a Transatlantic Art Dealer.  ... 
doaj:d191b4f3b2684141823776df32f814fe fatcat:oluezhjcj5bkjfjptmbz574iiq

Mapping Adaptation under Evolving Schemas [chapter]

Yannis Velegrakis, Renée J. Miller, Lucian Popa
2003 Proceedings 2003 VLDB Conference  
n.sponsor→private, c in S.contacts o in S.companies, e in S.persons where p.project.source=g.grant.gid and and and exists j  ...  For each pair <A^S_i, A^T_j> a new mapping m_ij of the form foreach A^S_i exists A^T_j with D' is created, where D' includes the conditions D, plus the conditions of all the correspondences that are covered  ... 
doi:10.1016/b978-012722442-8/50058-6 dblp:conf/vldb/VelegrakisMP03 fatcat:yhvo3ywn4baupcxaciaeu7bxbe

LSH ensemble

Erkang Zhu, Fatemeh Nargesian, Ken Q. Pu, Renée J. Miller
2016 Proceedings of the VLDB Endowment  
We study the problem of domain search where a domain is a set of distinct values from an unspecified universe. We use Jaccard set containment score, defined as |Q ∩ X|/|Q|, as the measure of relevance of a domain X to a query domain Q. Our choice of Jaccard set containment over Jaccard similarity as a measure of relevance makes our work particularly suitable for searching Open Data and data on the web, as Jaccard similarity is known to have poor performance over sets with large differences in their domain sizes. We demonstrate that the domains found in several real-life Open Data and web data repositories show a power-law distribution over their domain sizes. We present a new index structure, Locality Sensitive Hashing (LSH) Ensemble, that solves the domain search problem using set containment at Internet scale. Our index structure and search algorithm cope with the data volume and skew by means of data sketches using Minwise Hashing and domain partitioning. Our index structure does not assume a prescribed set of data values. We construct a cost model that describes the accuracy of LSH Ensemble with any given partitioning. This allows us to formulate the data partitioning for LSH Ensemble as an optimization problem. We prove that there exists an optimal partitioning for any data distribution. Furthermore, for datasets following a power-law distribution, as observed in Open Data and Web data corpora, we show that the optimal partitioning can be approximated using equi-depth, making it particularly efficient to use in practice. We evaluate our algorithm using real data (Canadian Open Data and WDC Web Tables) containing over 262 million domains. The experiments demonstrate that our index consistently outperforms other leading alternatives in accuracy and performance. The improvements are most dramatic for data with large skew in the domain sizes. Even at 262 million domains, our index sustains query performance with under 3 seconds response time.
doi:10.14778/2994509.2994534 fatcat:53svufrhifcbhabiobvkfgaz5q
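
To make the abstract's distinction concrete, the exact (unsketched) scores can be compared on toy sets. This is a minimal illustration of the two measures only, not of the LSH Ensemble index or of Minwise Hashing; the function names and the sample domains are assumptions made for the example.

```python
def jaccard_similarity(q, x):
    """|Q ∩ X| / |Q ∪ X|: symmetric, shrinks as either set grows."""
    q, x = set(q), set(x)
    union = q | x
    return len(q & x) / len(union) if union else 0.0

def jaccard_containment(q, x):
    """|Q ∩ X| / |Q|: fraction of the query domain covered by X."""
    q, x = set(q), set(x)
    return len(q & x) / len(q) if q else 0.0

# Hypothetical domains: both candidates fully contain the query,
# but `large` has 100 distinct values while `small` has 4.
query = {"ON", "QC", "BC"}
small = {"ON", "QC", "BC", "AB"}
large = small | {f"v{i}" for i in range(96)}

sim_small = jaccard_similarity(query, small)    # 3 / 4
sim_large = jaccard_similarity(query, large)    # 3 / 100
con_small = jaccard_containment(query, small)   # 3 / 3
con_large = jaccard_containment(query, large)   # 3 / 3
```

Containment rates both candidates equally because each covers the whole query, whereas similarity penalizes the larger domain purely for its size, which is the behavior the abstract cites as a problem for skewed domain sizes.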


Ariel Fuxman, Elham Fazli, Renée J. Miller
2005 Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05  
Although integrity constraints have long been used to maintain data consistency, there are situations in which they may not be enforced or satisfied. In this paper, we present ConQuer, a system for efficient and scalable answering of SQL queries on databases that may violate a set of constraints. ConQuer permits users to postulate a set of key constraints together with their queries. The system rewrites the queries to retrieve all (and only) data that is consistent with respect to the constraints. The rewriting is into SQL, so the rewritten queries can be efficiently optimized and executed by commercial database systems. We study the overhead of resolving inconsistencies dynamically (at query time). In particular, we present a set of performance experiments that compare the efficiency of the rewriting strategies used by ConQuer. The experiments use queries taken from the TPC-H workload. We show that the overhead is not onerous, and the consistent query answers can often be computed within twice the time required to obtain the answers to the original (non-rewritten) query.
doi:10.1145/1066157.1066176 dblp:conf/sigmod/FuxmanFM05 fatcat:lbugnrr6l5cw5eo3nok6wckxku