Optimized union of non-disjoint distributed data sets

2009
*
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09
*

In a variety

doi:10.1145/1516360.1516364
dblp:conf/edbt/DarMV09
fatcat:d2nzkx7rzzcsloeui3bh5qaoou
of applications, ranging from data integration to distributed query evaluation, there is a need to obtain sets of data items from several sources (peers) and compute their union. ... A challenge in the design of optimal plans is the lack of a complete map of the distribution of the data items among peers. ... of non-disjoint data sets residing on distinct peers. ...##
###
Massively scalable density based clustering (DBSCAN) on the HPCC systems big data platform

2021
*
IAES International Journal of Artificial Intelligence (IJ-AI)
*

The algorithm seeks to fully parallelize the algorithm implementation by making use

doi:10.11591/ijai.v10.i1.pp207-214
fatcat:pwk5mmh3i5ac5gdgmzsbdol3da
of HPCC Systems optimal distributed architecture and performing a tree-based union to merge local clusters. ... The proposed approach was tested both on synthetic as well as standard datasets (MFCCs Data Set) and found to be completely accurate. ... College of Engineering for actively supporting this research work. The interfacing documentation and the set of clusters used to test the system were provided by Lexis-Nexis. R.V. ...##
###
Fast algorithms for hierarchical range histogram construction

2002
*
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '02
*

We complement our analytical contributions with an experimental evaluation using real

doi:10.1145/543613.543637
dblp:conf/pods/GuhaKS02
fatcat:3wace2gn4bh4tcnogue3jcwcym
data sets, demonstrating the superiority of our approach. ... Accurately estimating the distribution of measure attributes, under hierarchical selections, is important in a variety of scenarios, including approximate query evaluation and cost-based optimization ... in 0; n can be expressed as a disjoint union of two intervals from the sparse set. ...##
###
Logical Aspects of Massively Parallel and Distributed Systems

2016
*
Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems - PODS '16
*

Database research has witnessed a renewed interest for

doi:10.1145/2902251.2902307
dblp:conf/pods/Neven16
fatcat:n2ikormeufe7bmy3g5gkbrfcsi
data processing in distributed and parallel settings. ... The second part of the paper addresses a distributed asynchronous setting where eventual consistency comes into play. ... I thank Tom Ameloot, Gaetano Geck, Bas Ketsman, Thomas Schwentick, Dan Suciu, and Tony Tan for helpful comments on a previous version of this paper. ...##
###
Supporting OLAP operations over imperfectly integrated taxonomies

2008
*
Proceedings of the 2008 ACM SIGMOD international conference on Management of data - SIGMOD '08
*

Experiments over synthetic and real-life

doi:10.1145/1376616.1376703
dblp:conf/sigmod/QiCTCL08
fatcat:wdqbsjis2ndl7auqnjvmuaesde
data verified the effectiveness and efficiency of our approach. ... In this paper, we first note that when multiple heterogeneous data sources are involved in the gathering of the data and the associated domain knowledge, the integrated knowledge-base, constructed by combining ... from a correction of a data union as follows: Thus, the goal of the concept graph to taxonomy conversion process is to find an optimal correction of the given data set: In the rest of the section, we ...##
###
Distributed Estimation Using Non-regular Quantized Data

2017
*
Journal of information and communication convergence engineering
*

We consider a

doi:10.6109/jicce.2017.15.1.7
dblp:journals/jicce/Kim17
fatcat:liv7fivbwnaixpsmwgrsdgslqu
distributed estimation where many nodes remotely placed at known locations collect the measurements of the parameter of interest, quantize these measurements, and transmit the quantized data ... Otherwise, tremendous complexity will be inevitable due to multiple codewords or partitions interpreted from non-regular quantized data. ... This partition can be defined as a single j-th codeword, a set of multiple codewords, or a union of disjoint partitions, depending on the quantization algorithms used at the node. ...##
###
Data-Parallel Mesh Connected Components Labeling and Analysis
[article]

2011
*
Eurographics Symposium on Parallel Graphics and Visualization
*

The identification task is challenging in a

doi:10.2312/egpgv/egpgv11/131-140
fatcat:67lbtrvqorgd5gncnrowpkylqu
distributed-memory parallel setting because connectivity is transitive and the cells composing each sub-mesh may span many or all processors. ... Our algorithm employs a multi-stage application of the Union-find algorithm and a spatial partitioning scheme to efficiently merge information across processors and produce a global labeling of connected ... Acknowledgments This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. ...##
###
Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms
[article]

2018
*
arXiv
*
pre-print

For any fixed dimension, our algorithm has

arXiv:1802.08513v1
fatcat:657zih37lrdmzjksdd2h2vdwna
optimal sample complexity, up to logarithmic factors, and runs in near-linear time. ... We study the problem of robustly learning multi-dimensional histograms. ... Clearly Z is the disjoint union of Z 1 = Z ∩ ∪ R∈F R and Z 2 = Z ∩ ∪ R∈J . ...##
###
Modelling topological spatial relations: Strategies for query processing

1994
*
Computers & graphics
*

Estimates

doi:10.1016/0097-8493(94)90007-8
fatcat:muxxtvcaebdkvkmcqiozookgxi
of the distribution are important, because the application often determines what relations are feasible. ... This paper focuses on the processing and algebraic optimization of spatial queries with topological constraints. ... This section presents an algorithm to build a near optimal decision tree that considers the expected frequency of occurrence of relations for a particular data set. ...##
###
Minimum energy disjoint path routing in wireless ad-hoc networks

2003
*
Proceedings of the 9th annual international conference on Mobile computing and networking - MobiCom '03
*

Finally, we discuss issues regarding

doi:10.1145/938998.938999
fatcat:ybifnh5fefbz5pawmexqccbxky
distributed implementation and present distributed versions of the optimal centralized algorithms presented in the paper. ... Our major results include a novel polynomial time algorithm that optimally solves the minimum energy 2 link-disjoint paths problem, as well as a polynomial time algorithm for the minimum energy k node-disjoint ... Figure 5: Example of a pair of link-disjoint paths expressed as the union of a set of node-disjoint path pairs. ordered set of common nodes, C(P ) = S-D path in Transformed Graph (i.e. ...##
###
A Jensen-Shannon Kernel for Hypergraphs
[chapter]

2012
*
Lecture Notes in Computer Science
*

The Shannon entropies required to construct the Jensen-Shannon divergence for pairs

doi:10.1007/978-3-642-34166-3_20
fatcat:bklgjqsv4zdibm4w2n7nexnxje
of hypergraphs are obtained from steady state probability distributions of the random walk. ... We commence by calculating probability distribution over the steady state random walk on a hypergraph. ... Definitions and Notations Hypergraph Fundamentals A hypergraph is a generalization of an undirected graph, it is usually denoted by a pair set G(V, E) where V is a set of vertices and E is a set of non-empty ...##
###
A new scalable parallel DBSCAN algorithm using the disjoint-set data structure

2012
*
2012 International Conference for High Performance Computing, Networking, Storage and Analysis
*

More specifically, we employ the

doi:10.1109/sc.2012.9
dblp:conf/sc/PatwaryPALMC12
fatcat:hse2oi3m5fcp5a4yaxkzkozmfy
disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. ... However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. ... The Disjoint-Set Data Structure The disjoint-set data structure defines a mechanism to maintain a dynamic collection of non-overlapping sets [21], [27]. ...##
###
Network Coded Gossip with Correlated Data
[article]

2012
*
arXiv
*
pre-print

We give a clean framework for oblivious network models that applies to a multitude

arXiv:1202.1801v1
fatcat:jo2gsxhyvvghznnkmdse4zol7y
of network and communication scenarios, specify a general setting for distributed correlated data, and give tight bounds ... The main figure of merit in this setting is the stopping time -- the time required until nodes can successfully decode. ... + log k rounds are sufficient to get this for non-disjoint paths. ...##
###
Fast Compaction Algorithms for NoSQL Databases

2015
*
2015 IEEE 35th International Conference on Distributed Computing Systems
*

We then propose a

doi:10.1109/icdcs.2015.53
dblp:conf/icdcs/GhoshGGK15
fatcat:xkonhco3ozebrhsv4kpg35jnnq
set of algorithms and mathematically analyze upper bounds on worst-case cost. We evaluate the proposed algorithms on real-life workloads. ... In this work, we formally define compaction as an optimization problem that attempts to minimize disk I/O. We prove this problem to be NP-Hard. ... A ⌫ is union of some initial sets, and A 0 ⌫ is the union of corresponding modified initial sets which are disjoint. It follows that |A ⌫ | |A 0 ⌫ |. Summing we get, Cost Cost 0 = OPT 0 . ...
