Optimized union of non-disjoint distributed data sets

Itay Dar, Tova Milo, Elad Verbin
2009 Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09  
In a variety of applications, ranging from data integration to distributed query evaluation, there is a need to obtain sets of data items from several sources (peers) and compute their union.  ...  A challenge in the design of optimal plans is the lack of a complete map of the distribution of the data items among peers.  ...  of non-disjoint data sets residing on distinct peers.  ... 
doi:10.1145/1516360.1516364 dblp:conf/edbt/DarMV09 fatcat:d2nzkx7rzzcsloeui3bh5qaoou

Massively scalable density based clustering (DBSCAN) on the HPCC systems big data platform

Yatish H. R., Shubham Milind Phal, Tanmay Sanjay Hukkeri, Lili Xu, Shobha G, Jyoti Shetty, Arjuna Chala
2021 IAES International Journal of Artificial Intelligence (IJ-AI)  
The algorithm seeks to fully parallelize the algorithm implementation by making use of HPCC Systems optimal distributed architecture and performing a tree-based union to merge local clusters.  ...  The proposed approach was tested both on synthetic as well as standard datasets (MFCCs Data Set) and found to be completely accurate.  ...  College of Engineering for actively supporting this research work. The interfacing documentation and the set of clusters used to test the system were provided by Lexis-Nexis. R.V.  ... 
doi:10.11591/ijai.v10.i1.pp207-214 fatcat:pwk5mmh3i5ac5gdgmzsbdol3da

Fast algorithms for hierarchical range histogram construction

Sudipto Guha, Nick Koudas, Divesh Srivastava
2002 Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '02  
We complement our analytical contributions with an experimental evaluation using real data sets, demonstrating the superiority of our approach.  ...  Accurately estimating the distribution of measure attributes, under hierarchical selections, is important in a variety of scenarios, including approximate query evaluation and cost-based optimization  ...  in 0; n can be expressed as a disjoint union of two intervals from the sparse set.  ... 
doi:10.1145/543613.543637 dblp:conf/pods/GuhaKS02 fatcat:3wace2gn4bh4tcnogue3jcwcym

Logical Aspects of Massively Parallel and Distributed Systems

Frank Neven
2016 Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems - PODS '16  
Database research has witnessed a renewed interest for data processing in distributed and parallel settings.  ...  The second part of the paper addresses a distributed asynchronous setting where eventual consistency comes into play.  ...  I thank Tom Ameloot, Gaetano Geck, Bas Ketsman, Thomas Schwentick, Dan Suciu, and Tony Tan for helpful comments on a previous version of this paper.  ... 
doi:10.1145/2902251.2902307 dblp:conf/pods/Neven16 fatcat:n2ikormeufe7bmy3g5gkbrfcsi

Supporting OLAP operations over imperfectly integrated taxonomies

Yan Qi, K. Selçuk Candan, Junichi Tatemura, Songting Chen, Fenglin Liao
2008 Proceedings of the 2008 ACM SIGMOD international conference on Management of data - SIGMOD '08  
Experiments over synthetic and real-life data verified the effectiveness and efficiency of our approach.  ...  In this paper, we first note that when multiple heterogeneous data sources are involved in the gathering of the data and the associated domain knowledge, the integrated knowledge-base, constructed by combining  ...  from a correction of a data union as follows: Thus, the goal of the concept graph to taxonomy conversion process is to find an optimal correction of the given data set: In the rest of the section, we  ... 
doi:10.1145/1376616.1376703 dblp:conf/sigmod/QiCTCL08 fatcat:wdqbsjis2ndl7auqnjvmuaesde

Distributed Estimation Using Non-regular Quantized Data

Yoon Hak Kim
2017 Journal of information and communication convergence engineering  
We consider a distributed estimation where many nodes remotely placed at known locations collect the measurements of the parameter of interest, quantize these measurements, and transmit the quantized data  ...  Otherwise, tremendous complexity will be inevitable due to multiple codewords or partitions interpreted from non-regular quantized data.  ...  This partition can be defined as a single j-th codeword, a set of multiple codewords, or a union of disjoint partitions, depending on the quantization algorithms used at the node.  ... 
doi:10.6109/jicce.2017.15.1.7 dblp:journals/jicce/Kim17 fatcat:liv7fivbwnaixpsmwgrsdgslqu

Data-Parallel Mesh Connected Components Labeling and Analysis [article]

Cyrus Harrison, Hank Childs, Kelly P. Gaither
2011 Eurographics Symposium on Parallel Graphics and Visualization  
The identification task is challenging in a distributed-memory parallel setting because connectivity is transitive and the cells composing each sub-mesh may span many or all processors.  ...  Our algorithm employs a multi-stage application of the Union-find algorithm and a spatial partitioning scheme to efficiently merge information across processors and produce a global labeling of connected  ...  Acknowledgments This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.  ... 
doi:10.2312/egpgv/egpgv11/131-140 fatcat:67lbtrvqorgd5gncnrowpkylqu

Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms [article]

Ilias Diakonikolas and Jerry Li and Ludwig Schmidt
2018 arXiv   pre-print
For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time.  ...  We study the problem of robustly learning multi-dimensional histograms.  ...  Clearly $Z$ is the disjoint union of $Z_1 = Z \cap \bigcup_{R \in F} R$ and $Z_2 = Z \cap \bigcup_{R \in J}$ .  ... 
arXiv:1802.08513v1 fatcat:657zih37lrdmzjksdd2h2vdwna

Modelling topological spatial relations: Strategies for query processing

Eliseo Clementini, Jayant Sharma, Max J. Egenhofer
1994 Computers & graphics  
Estimates of the distribution are important, because the application often determines what relations are feasible.  ...  This paper focuses on the processing and algebraic optimization of spatial queries with topological constraints.  ...  This section presents an algorithm to build a near optimal decision tree that considers the expected frequency of occurrence of relations for a particular data set.  ... 
doi:10.1016/0097-8493(94)90007-8 fatcat:muxxtvcaebdkvkmcqiozookgxi

Minimum energy disjoint path routing in wireless ad-hoc networks

Anand Srinivas, Eytan Modiano
2003 Proceedings of the 9th annual international conference on Mobile computing and networking - MobiCom '03  
Finally, we discuss issues regarding distributed implementation and present distributed versions of the optimal centralized algorithms presented in the paper.  ...  Our major results include a novel polynomial time algorithm that optimally solves the minimum energy 2 link-disjoint paths problem, as well as a polynomial time algorithm for the minimum energy k node-disjoint  ...  Figure 5: Example of a pair of link-disjoint paths expressed as the union of a set of node-disjoint path pairs. ordered set of common nodes, C(P) = S-D path in Transformed Graph (i.e.  ... 
doi:10.1145/938998.938999 fatcat:ybifnh5fefbz5pawmexqccbxky

Minimum energy disjoint path routing in wireless ad-hoc networks

Anand Srinivas, Eytan Modiano
2003 Proceedings of the 9th annual international conference on Mobile computing and networking - MobiCom '03  
Finally, we discuss issues regarding distributed implementation and present distributed versions of the optimal centralized algorithms presented in the paper.  ...  Our major results include a novel polynomial time algorithm that optimally solves the minimum energy 2 link-disjoint paths problem, as well as a polynomial time algorithm for the minimum energy k node-disjoint  ...  Figure 5: Example of a pair of link-disjoint paths expressed as the union of a set of node-disjoint path pairs. ordered set of common nodes, C(P) = S-D path in Transformed Graph (i.e.  ... 
doi:10.1145/938985.938999 dblp:conf/mobicom/SrinivasM03 fatcat:nwuk4x3hl5edlkn7c3abf64ww4

A Jensen-Shannon Kernel for Hypergraphs [chapter]

Lu Bai, Edwin R. Hancock, Peng Ren
2012 Lecture Notes in Computer Science  
The Shannon entropies required to construct the Jensen-Shannon divergence for pairs of hypergraphs are obtained from steady state probability distributions of the random walk.  ...  We commence by calculating probability distribution over the steady state random walk on a hypergraph.  ...  Definitions and Notations Hypergraph Fundamentals A hypergraph is a generalization of an undirected graph; it is usually denoted by a pair set G(V, E) where V is a set of vertices and E is a set of non-empty  ... 
doi:10.1007/978-3-642-34166-3_20 fatcat:bklgjqsv4zdibm4w2n7nexnxje

A new scalable parallel DBSCAN algorithm using the disjoint-set data structure

Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, Alok Choudhary
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters.  ...  However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order.  ...  The Disjoint-Set Data Structure The disjoint-set data structure defines a mechanism to maintain a dynamic collection of non-overlapping sets [21], [27].  ... 
doi:10.1109/sc.2012.9 dblp:conf/sc/PatwaryPALMC12 fatcat:hse2oi3m5fcp5a4yaxkzkozmfy

Network Coded Gossip with Correlated Data [article]

Bernhard Haeupler, Asaf Cohen, Chen Avin, Muriel Médard
2012 arXiv   pre-print
We give a clean framework for oblivious network models that applies to a multitude of network and communication scenarios, specify a general setting for distributed correlated data, and give tight bounds  ...  The main figure of merit in this setting is the stopping time -- the time required until nodes can successfully decode.  ...  + log k rounds are sufficient to get this for non-disjoint paths.  ... 
arXiv:1202.1801v1 fatcat:jo2gsxhyvvghznnkmdse4zol7y

Fast Compaction Algorithms for NoSQL Databases

Mainak Ghosh, Indranil Gupta, Shalmoli Gupta, Nirman Kumar
2015 2015 IEEE 35th International Conference on Distributed Computing Systems  
We then propose a set of algorithms and mathematically analyze upper bounds on worst-case cost. We evaluate the proposed algorithms on real-life workloads.  ...  In this work, we formally define compaction as an optimization problem that attempts to minimize disk I/O. We prove this problem to be NP-Hard.  ...  $A_\nu$ is union of some initial sets, and $A'_\nu$ is the union of corresponding modified initial sets which are disjoint. It follows that $|A_\nu| \le |A'_\nu|$. Summing we get, $\mathrm{Cost} \le \mathrm{Cost}' = \mathrm{OPT}'$.  ... 
doi:10.1109/icdcs.2015.53 dblp:conf/icdcs/GhoshGGK15 fatcat:xkonhco3ozebrhsv4kpg35jnnq