Time and space optimization for processing groups of multi-dimensional scientific queries

Suresh Aryangat, Henrique Andrade, Alan Sussman
2004 Proceedings of the 18th annual international conference on Supercomputing - ICS '04  
The last two steps effectively reduce both the time and space to execute query groups, as shown in the experimental results.  ...  This paper addresses the optimizations performed by a high performance database system that processes groups of data analysis requests for these applications, which we call queries.  ...  A RAG query typically has both spatial and temporal predicates, namely a multi-dimensional bounding box in the underlying multi-dimensional attribute space of the dataset.  ... 
doi:10.1145/1006209.1006224 dblp:conf/ics/AryangatAS04 fatcat:2jduaq4ntrcg5jmzrisimklfy4

A Black-Box Approach to Query Cardinality Estimation

Tanu Malik, Randal C. Burns, Nitesh V. Chawla
2007 Conference on Innovative Data Systems Research  
It does so by grouping queries into syntactic families and learning the cardinality distribution of that group directly from points in a high-dimensional input space constructed from the query's attributes  ...  We envision an increasing need for such an approach in applications in which query cardinality is required for resource optimization and decision-making at locations that are remote from the data sources  ...  We also thank Xiaodan Wang for his help with the experimental section and for explaining the gory details of the Open SkyQuery source code.  ... 
dblp:conf/cidr/MalikBC07 fatcat:fascpj2j3jatjo4jnm4dsa2ija

Exploiting Massive Parallelism for Indexing Multi-Dimensional Datasets on the GPU

Jinwoong Kim, Won-Ki Jeong, Beomseok Nam
2015 IEEE Transactions on Parallel and Distributed Systems  
Index Terms: Parallel multi-dimensional indexing; Multi-dimensional range query; GPGPU. The authors are with the School of Electrical and Computer Engineering,  ...  Inherently multi-dimensional n-ary indexing structures such as R-trees are not well suited for the GPU because of their irregular memory access patterns and recursive back-tracking function calls.  ...  R-Tree [9] and its variants are most commonly used for multi-dimensional range query processing as of today.  ... 
doi:10.1109/tpds.2014.2347041 fatcat:lhmnon4mtzfz3hlxnygpkpgxnm

Enabling scientific data storage and processing on big-data systems

Saman Biookaghazadeh, Yiqi Xu, Shujia Zhou, Ming Zhao
2015 2015 IEEE International Conference on Big Data (Big Data)  
This paper presents a solution to this problem by enabling big-data systems to directly store and process scientific data.  ...  The results show that the proposed approach achieves substantial speedup (up to 20 times) and space saving (83% reduction), compared to the traditional approach which has to convert NetCDF data to CSV  ...  This research is sponsored by the National Science Foundation CAREER award CNS-1253944, the Department of Defense award W911NF-13-1-0157, and a gift from VMware Inc. S.  ... 
doi:10.1109/bigdata.2015.7363978 dblp:conf/bigdataconf/BiookaghazadehX15 fatcat:cgp433manne55daz6n4ll2jqdi

ArrayBridge: Interweaving declarative array processing with high-performance computing [article]

Haoyuan Xing, Suren Byna; The Ohio State University
2017 arXiv   pre-print
Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries.  ...  ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency.  ...  The evaluation used resources of the National Energy Research Scientific Computing Center (NERSC).  ... 
arXiv:1702.08327v1 fatcat:mj3taabp5vcgdlcficikjno6li

A Survey on Array Storage, Query Languages, and Systems [article]

Florin Rusu, Yu Cheng
2013 arXiv   pre-print
The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals.  ...  Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data.  ...  across multi-dimensional spaces when the number of disks is large.  ... 
arXiv:1302.0103v2 fatcat:hopvki5yfjcbhek46n7ye7f2be

Accelerating Queries on Very Large Datasets [chapter]

Ekow Otoo, Kesheng Wu
2009 Scientific Data Management  
In this chapter, we explore ways to answer queries on large multi-dimensional data efficiently. Given a large dataset, a user often wants to access only a relatively small number of the records.  ...  Among the known indexing methods, bitmap indexes are particularly well suited for answering such queries on large scientific data.  ...  They are relatively compact compared to common implementations of B-Trees, and they scale well for high-dimensional data and multi-dimensional queries.  ... 
doi:10.1201/9781420069815-c6 fatcat:qa5gbbtrgjd7lchyzeylmwaqee

Optimizing multiple queries on scientific datasets with partial replicas

Li Weng, Umit Catalyurek, Tahsin Kurc, Gagan Agrawal, Joel Saltz
2007 2007 8th IEEE/ACM International Conference on Grid Computing  
If we think of the attributes of a dataset forming multi-dimensional space, where each attribute corresponds to one of the dimensions, a range query defines a bounding box in this multi-dimensional space  ...  Our results using queries for subsetting and analysis of medical image datasets show that effective use of partial replicas can result in reduction in query execution times.  ...  query, and process multi-dimensional datasets.  ... 
doi:10.1109/grid.2007.4354141 dblp:conf/grid/WengCKAS07 fatcat:fd42agtjhvcypmiyaplo7aieqa

Spatially clustered join on heterogeneous scientific data sets

Bin Dong, Surendra Byna, Kesheng Wu
2015 2015 IEEE International Conference on Big Data (Big Data)  
Together, these techniques allow scientific data files to be used for query processing with less I/O cost and fast query response time without the extra cost to perform file format conversion and data  ...  known as Multi-Dimensional Binning (MDBin), and a join processing algorithm known as Spatially Clustered Join (SCJoin).  ...  : process rank in MPI group s : MPI process group size 1.  ... 
doi:10.1109/bigdata.2015.7363778 dblp:conf/bigdataconf/DongBW15 fatcat:v5xdukg2mfb4hkafsn5winjnq4


J. Gong, H. Wu, P. Yue, X. Zhu, W. Gao
2012 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Web-based multi-dimensional space-time data analysis: Space-time data analysis models and methods need to be developed.  ...  The concept of Web GeoIntelligence: The accomplishment of Web GeoIntelligence will build a new generation of multi-level, multi-granularity, multi-dimensional space-time data management and visualization  ... 
doi:10.5194/isprsarchives-xxxviii-4-w25-19-2011 fatcat:gzksevb2ane2fc2o6o5g7vbzie

Estimating Query Result Sizes for Proxy Caching in Scientific Database Federations

Tanu Malik, Randal Burns, Nitesh Chawla, Alex Szalay
2006 ACM/IEEE SC 2006 Conference (SC'06)  
In a proxy cache for federations of scientific databases it is important to estimate the size of a query before making a caching decision.  ...  We present classification and regression over templates (CAROT), a general method for estimating query result sizes, which is suited to the resource-limited environment of proxy caches and the distributed  ...  For yield estimation, an algorithm learns results sizes based on the multi-dimensional parameter space of the template variables.  ... 
doi:10.1109/sc.2006.27 fatcat:yzq67pfs7za3ndj736tqdljapy

A Multi-Dimensional Data Storage Using Quad-Tree and Z-Ordering

Fang Hou, Cheng Hui Huang, Ji Yuan Lu
2013 Applied Mechanics and Materials  
Multi-dimensional applications use tree structures to store data and space-filling curves to traverse data. The most frequently used of these, the Quad-tree and the Z-ordering curve, are analyzed.  ...  By importing these to an HDF5 file format, a multi-dimensional data storage subsystem is constructed.  ...  Introduction: A multi-dimensional data processing procedure includes data generation, data (In-Situ) [1] compression and decompression, data storage, data query, data transport, data simulation, etc.  ... 
doi:10.4028/ fatcat:kwtbnzmz2zgf3mkqqxsoodzub4

Squid: Enabling search in DHT-based systems

Cristina Schmidt, Manish Parashar
2008 Journal of Parallel and Distributed Computing  
The fundamental concept underlying the approach is the definition of multi-dimensional information spaces and the maintenance of locality in these spaces.  ...  The key innovation is a dimensionality reducing indexing scheme that effectively maps the multi-dimensional information space to physical peers while preserving lexical locality.  ...  Acknowledgments: This work is supported in part by the National Science Foundation through grant numbers ACI 9984357, EIA 0103674, EIA 0120934, ANI 0335244, CNS 0305495, CNS 0426354 and IIS 0430826.  ... 
doi:10.1016/j.jpdc.2008.02.003 fatcat:av2rsam2uvgg3gv7bwghtpmyf4

Auto-tuning Similarity Search Algorithms on Multi-core Architectures

Buğra Gedik
2013 International Journal of Parallel Programming  
multi-NN search algorithms: linear scan and tree traversal, and (2) an offline auto-tuner for setting these knobs by iteratively measuring actual query execution times for a given workload and dataset  ...  Another common query is the retrieval of multiple sets of nearest neighbors, i.e., multi k-NN, for different query items on the same data.  ...  in high-dimensional spaces).  ... 
doi:10.1007/s10766-013-0239-8 fatcat:2gpdx6e62zfazl2eeykobwsrhe

Design and analysis of a multi-dimensional data sampling service for large scale data analysis applications

Xi Zhang, T. Kurc, J. Saltz, S. Parthasarathy
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
This new implementation is then bootstrapped on top of our previous implementation, which efficiently samples large datasets along a single dimension (e.g., time), thereby realizing a service for spatio-temporal  ...  In this paper we present a scalable sampling implementation that supports efficient, multi-dimensional spatio-temporal sample generation on dynamic, large scale datasets stored on a storage cluster.  ...  This index structure facilitates fast retrieval of multi-dimensional samples that encompass constraints over time and space.  ... 
doi:10.1109/ipdps.2006.1639315 dblp:conf/ipps/ZhangKSP06 fatcat:emeykbxpijdtxlqbwdfsid36s4