Balance-aware distributed string similarity-based query processing system

Ji Sun, Zeyuan Shang, Guoliang Li, Dong Deng, Zhifeng Bao
2019 Proceedings of the VLDB Endowment  
In this paper, we develop a distributed in-memory similarity-based query processing system called Dima.  ...  To the best of our knowledge, this is the first full-fledged distributed in-memory system that can support complex similarity-based query processing on large-scale datasets.  ...  String similarity-based query processing is indispensable in data integration systems (e.g. the state-of-the-art data integration system relies on string similarity to find candidate pairs from datasets  ... 
doi:10.14778/3329772.3329774 fatcat:sorbhypaijg6rjoevbvdhxyjbm

Similarity Queries on Structured Data in Structured Overlays

M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt
2006 22nd International Conference on Data Engineering Workshops (ICDEW'06)  
Structured P2P systems based on distributed hash tables are a popular choice for building large-scaled data management systems.  ...  In this work, we suggest a vertical data organization, which allows for efficient processing of similarity queries on instance as well as on schema level, and we introduce corresponding physical similarity  ...  Beside P2P-based IR systems and peer data management systems (PDMS) as an extension of federated database systems, structured P2P systems based on distributed hash tables (DHT) are a very promising approach  ... 
doi:10.1109/icdew.2006.137 dblp:conf/icde/KarnstedtSHS06 fatcat:7vxqyiqgrrfwpd2vio5ymo74z4

Scalable Indexing and Adaptive Querying of RDF Data in the cloud

Nikolaos Papailiou, Dimitrios Tsoumakos, Ioannis Konstantinou, Panagiotis Karras, Nectarios Koziris
2014 Proceedings of Semantic Web Information Management on Semantic Web Information Management - SWIM'14  
H2RDF+ can also adaptively process both complex and selective queries by adaptively choosing the amount of resources allocated for each join, based on join complexity estimated through index statistics  ...  Yet, these systems prove highly inflexible in adjusting their behavior relative to the query in hand. Queries over triple data include multiple joins with varying degrees of selectivity and cost.  ...  In addition, the system is not aware of query pattern selectivity and thus relies only on joins that process large amounts of data even for selective queries.  ... 
doi:10.1145/2630602.2630603 dblp:conf/sigmod/PapailiouTKKK14a fatcat:7asq5xr43fbazklagzowxyynue

PIRD: P2P-Based Intelligent Resource Discovery in Internet-Based Distributed Systems

Haiying Shen, Ze Li, Ting Li, Yingwu Zhu
2008 2008 The 28th International Conference on Distributed Computing Systems  
Internet-based distributed systems enable globally scattered resources to be collectively pooled and used in a cooperative manner to achieve unprecedented petascale supercomputing capabilities.  ...  Furthermore, few approaches are able to locate resources geographically close to the requesters, which is critical to system performance.  ...  [Figures 7 and 8: Proximity-aware performance.] Internet-based Distributed Systems (Grids, P2P, etc.)  ... 
doi:10.1109/icdcs.2008.9 dblp:conf/icdcs/ShenLLZ08 fatcat:cy372qoqjzelpffp36udc7ybiy

ICDE conference 2015 detailed author index

2015 2015 IEEE 31st International Conference on Data Engineering  
of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines Shah, Shetal 1468 The XDa-TA System for Automated Grading of SQL Query Assignments Shahabi, Cyrus 1404 PrivGeoCrowd:  ...  471 Bi-Temporal Timeline Index: A Data Structure for Processing Queries on Bi-Temporal Data Kourtellis, Nicolas 137 The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing  ... 
doi:10.1109/icde.2015.7113260 fatcat:ep7pomkm55f45j33tkpoc5asim

Data-parallel query processing on non-uniform data

Henning Funke, Jens Teubner
2020 Proceedings of the VLDB Endowment  
We observe shorter execution times for TPC-H benchmark queries by factors up to 4.51x compared with existing GPU query compilers and by factors up to 4.54x compared with CPU-based systems.  ...  By balancing divergence effects, our approach is able to restore processing efficiency even when pipelines contain heavily skewed operations.  ...  ACKNOWLEDGEMENTS We would like to thank Florian Lüdiger for the experimental work on string pattern matching and the anonymous reviewers for their helpful comments and suggestions.  ... 
doi:10.14778/3380750.3380758 fatcat:mwkhe6nzp5epnparlimxo5tg2a

On the Expressiveness and Trade-Offs of Large Scale Tuple Stores [chapter]

Ricardo Vilaça, Francisco Cruz, Rui Oliveira
2010 Lecture Notes in Computer Science  
Having all started from similar requirements, these systems ended up providing a similar service: A simple tuple store interface, that allows applications to insert, query, and remove individual elements  ...  Massive-scale distributed computing is a challenge at our doorstep. The current exponential growth of data calls for massive-scale capabilities of storage and processing.  ...  Dynamo assembles several distributed systems concepts (data partitioning and replication, Merkle trees, load balancing, etc.) in a production system.  ... 
doi:10.1007/978-3-642-16949-6_5 fatcat:ppzhaemf7bblpngo32omsa54te

Information retrieval in schema-based P2P systems using one-dimensional semantic space

Tao Gu, Hung Keng Pung, Daqing Zhang
2007 Computer Networks  
In this paper, we present Dynamic Semantic Space, a schema-based peer-to-peer overlay network that facilitates efficient lookup for RDF-based information in dynamic environments.  ...  The widespread use of RDF-based information necessitates efficient information retrieval techniques in wide-area networks.  ...  However, data placement in these systems is tightly controlled based on distributed hash functions.  ... 
doi:10.1016/j.comnet.2007.06.019 fatcat:6ehlnhpz5bbo3ggsbpr6bofz4e

Cube-Based Analysis for Maintaining XML Data Partition for Holistic Twig Joins

Imam MACHDI, Toshiyuki AMAGASA, Hiroyuki KITAGAWA
2008 Information and Media Technologies  
Distributing XML documents to a cluster system has raised a problematic issue that leads to combinatorial optimization solution in order to achieve good workload balance and good performance of query processing  ...  Inspired by our previous works [3, 11], we also adopt a cost-based approach for measuring a query processing cost. Every partition is associated with an accumulated cost of query processing.  ... 
doi:10.11185/imt.3.552 fatcat:njtwpvobtrc3xdqomcouhznmla

Scalable distributed indexing and query processing over Linked Data

Marcel Karnstedt, Kai-Uwe Sattler, Manfred Hauswirth
2012 Journal of Web Semantics  
A number of efficient local RDF stores exist already, while distributed indexing and distributed query processing over Linked Data with similar efficiency and data management features as known from traditional  ...  Our system is based on a layered architecture that makes use of the advantages of decentralized indexing and query processing approaches, which have been researched and matured over the last decade.  ...  Query processing is supported by optimized, cost-based, distributed query execution strategies and fully utilizes the inherent parallelism of distributed systems.  ... 
doi:10.1016/j.websem.2011.11.010 fatcat:4fpikppnz5gbrk23rtcf5robuy

Scalable Distributed Indexing and Query Processing Over Linked Data

Marcel Karnstedt, Kai-Uwe Sattler, Manfred Hauswirth
2012 Social Science Research Network  
A number of efficient local RDF stores exist already, while distributed indexing and distributed query processing over Linked Data with similar efficiency and data management features as known from traditional  ...  Our system is based on a layered architecture that makes use of the advantages of decentralized indexing and query processing approaches, which have been researched and matured over the last decade.  ...  Query processing is supported by optimized, cost-based, distributed query execution strategies and fully utilizes the inherent parallelism of distributed systems.  ... 
doi:10.2139/ssrn.3198930 fatcat:k6ilnvk2fffzdgc2mhpryahoju

A computational theory of awareness and decision making

Nikhil R. Devanur, Lance Fortnow
2009 Proceedings of the 11th Conference on Theoretical Aspects of Rationality and Knowledge - TARK '09  
We also give a formal process-independent definition of awareness based on Levin's universal enumeration.  ...  We exhibit a new computational-based definition of awareness, informally that our level of unawareness of an object is the amount of time needed to generate that object within a certain environment.  ...  Their paper [CF08] and subsequent discussions exhibited the need for a new computational-based definition of awareness that eventually led to the model described here.  ... 
doi:10.1145/1562814.1562830 dblp:conf/tark/DevanurF09 fatcat:zjdehbvnwrecpplx32cht5h5ca

Scalable Interactive Middleware Components for Ubiquitous Fashionable Computers [chapter]

Gyudong Shim, Kyu Ho Park
2009 Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering  
We optimized the query processing by efficient node traversing and data-aware interval skipping. The tuple matching process is performed in bounded time up to 100,000 objects.  ...  In addition the system handles events caused by user commands. We developed efficient tuple indexing and query mechanism by composite keys.  ...  Fan search is optimized in tree traversing by path stack and data-aware interval skipping. The paper focuses only on the indexing schemes and query processing.  ... 
doi:10.1007/978-3-642-01802-2_11 fatcat:ilkik27h6re7dcbbqomyqbbaby

Scalable Querying of Nested Data [article]

Jaclyn Smith, Michael Benedikt, Milos Nikolic, Amir Shaikhha
2020 arXiv pre-print
While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections.  ...  These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions.  ...  Beyond the application to Spark, these results should be useful for further implementations of automated nested query processing on distributed systems.  ... 
arXiv:2011.06381v1 fatcat:7fulntcavrdgrl2zmmmyiqc52q

A Correlation-Aware Data Placement Strategy for Key-Value Stores [chapter]

Ricardo Vilaça, Rui Oliveira, José Pereira
2011 Lecture Notes in Computer Science  
Moreover, existing key-value stores have only random or order based placement strategies.  ...  Their scalability and availability requirements often outweigh sacrificing richer data and processing models, and even elementary data consistency.  ...  Our novel data placement strategy that allows to dynamically correlate items is based on Space Filling Curves (SFCs). SFCs had been used to process multidimensional queries in P2P systems.  ... 
doi:10.1007/978-3-642-21387-8_17 fatcat:fxxij34iz5ampcnwfmw3zum7vi