
GrayWulf: Scalable Clustered Architecture for Data Intensive Computing

Alexander S. Szalay, Gordon Bell, Jan vandenBerg, Alainna Wonders, Randal C. Burns, Dan Fay, Jim Heasley, Tony Hey, María A. Nieto-Santisteban, Ani Thakar, Catharine van Ingen, Richard Wilton
2009 2009 42nd Hawaii International Conference on System Sciences  
Data-intensive computing presents novel challenges for traditional computing architectures that have focused on FLOPS.  ...  We present the architecture of a database cluster targeted at data-intensive computations with petascale data sets.  ...  Financial support for the GrayWulf cluster hardware was provided by the Gordon and Betty Moore Foundation, Microsoft Research and the Pan-STARRS project.  ... 
doi:10.1109/hicss.2009.234 dblp:conf/hicss/SzalayBvWBFHHNTIW09 fatcat:uff3uxfsfvhblmdj6jbqtyeudq

An overview of the Open Science Data Cloud

Robert L. Grossman, Yunhong Gu, Joe Mambretti, Michal Sabala, Alex Szalay, Kevin White
2010 Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10  
The Open Science Data Cloud is a distributed cloud based infrastructure for managing, analyzing, archiving and sharing scientific datasets.  ...  We introduce the Open Science Data Cloud, give an overview of its architecture, provide an update on its current status, and briefly describe some research areas of relevance.  ...  The Open Cloud Consortium is managed by the Center for Computational Science Research, Inc., which is a 501(c)(3) not-for-profit corporation.  ... 
doi:10.1145/1851476.1851533 dblp:conf/hpdc/GrossmanGMSSW10 fatcat:ozq3i6bqmjcelmfinxvdxqgoqy

Large Science Databases – Are Cloud Services Ready for Them?

Ani Thakar, Alex Szalay, Ken Church, Andreas Terzis
2011 Scientific Programming  
We describe a powerful new computational instrument that we are developing in the interim – the Data-Scope – that will enable fast and efficient analysis of the largest (petabyte scale) scientific datasets  ...  needs to occur between cloud service providers and their potential clients before science databases – not just large ones but even smaller databases that make extensive use of advanced database features for  ...  Also thanks to Roger Barga (Microsoft Research) for his help with migrating data to SQL Azure, and pointing us to the SQL Azure Migration Wizard.  ... 
doi:10.1155/2011/591536 fatcat:ruu3xrke7jcypfhya4ljczqp6y

Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz
2013 Proceedings of the VLDB Endowment  
, and emerging scientific applications that are increasingly data- and compute-intensive.  ...  Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and that it outperforms SDBMS for compute-intensive queries.  ...  IBM provides an academic license for DB2. David Adler and Susan Malaika from IBM provided many insightful suggestions.  ... 
pmid:24187650 pmcid:PMC3814183 fatcat:v5ov5f6vhzcklbkoyztq7zovda

Hadoop-GIS

Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz
2013 Proceedings of the VLDB Endowment  
, and emerging scientific applications that are increasingly data- and compute-intensive.  ...  Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and that it outperforms SDBMS for compute-intensive queries.  ...  A major requirement for data-intensive spatial applications is fast query response, which requires a scalable architecture that can query spatial data on a large scale.  ... 
doi:10.14778/2536222.2536227 fatcat:z7w7hmd23na4flngnhyfvapbe4

Micro-level Modularity of Computation-intensive Programs in Big Data Platforms: A Case Study with Image Data [article]

Amit Kumar Mondal, Banani Roy, Chanchal K. Roy, Kevin A. Schneider
2019 arXiv   pre-print
To that end, we synthesize image data-processing patterns and propose a unified modular model for the effective implementation of computation-intensive tasks on data-parallel frameworks considering reproducibility  ...  One approach to better support interactivity and reusability is the use of micro-level modularisation for computation-intensive tasks, which splits data operations into independent, composable modules.  ...  How to modularize data- and computation-intensive programs to provide a unified abstract framework for developing interactive tools? RQ2.  ... 
arXiv:1910.11125v1 fatcat:ipnhfncvcreadpudovktvxh6ue

Data-intensive spatial filtering in large numerical simulation datasets

Kalin Kanov, Randal Burns, Greg Eyink, Charles Meneveau, Alexander Szalay
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
We present a query processing framework for the efficient evaluation of spatial filters on large numerical simulation datasets stored in a data-intensive cluster.  ...  We present two complementary methods of execution: I/O streaming computes a batch filter query in a single sequential pass using incremental evaluation of decomposable kernels, summed volumes generates  ...  Acknowledgment The authors would like to thank the Turbulence Database Group at Johns Hopkins University for their insightful comments and suggestions, as well as providing us with potential usage patterns  ... 
doi:10.1109/sc.2012.41 dblp:conf/sc/KanovBEMS12 fatcat:hxralquvdrccjlpt74q6bodlpy

I/O streaming evaluation of batch queries for data-intensive computational turbulence

Kalin Kanov, Eric Perlman, Randal Burns, Yanif Ahmad, Alexander Szalay
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
for our scientists' data-intensive workloads.  ...  We describe a method for evaluating computational turbulence queries, including Lagrange Polynomial interpolation, based on partial sums that allows the underlying data to be accessed in any order and  ...  We would also like to acknowledge support from the Institute for Data Intensive Engineering and Science at Johns Hopkins University, of which Randal Burns and Alexander Szalay are members.  ... 
doi:10.1145/2063384.2063423 dblp:conf/sc/KanovPBAS11 fatcat:lo7pqmg6vzbjfg4tqvlr5pa7vq

Report from the 2nd Workshop on Extremely Large Databases

Jacek Becla, Kian-Tat Lim
2008 Data Science Journal  
Analysts are struggling to use complex techniques such as time series analysis and classification algorithms because their familiar, powerful tools are not scalable and cannot effectively use scalable  ...  COMPLEX ANALYTICS - PROCESSING ARCHITECTURE: For the largest-scale datasets, there is no debate that computation must be moved close to where the data resides, rather than moving the data to the computation  ...  SciDB will run on incrementally scalable clusters or clouds of commodity hardware. Optionally, it will operate on "in situ" data without a formal database loading process.  ... 
doi:10.2481/dsj.7.196 fatcat:oefd2umisfe6lhgjxeri4wsarm

LifeRaft: Data-Driven, Batch Processing for the Exploration of Scientific Databases [article]

Xiaodan Wang, Tanu Malik
2009 arXiv   pre-print
To maximize throughput for data-intensive queries, we put forth LifeRaft: a query processing system that batches queries with overlapping data requirements.  ...  These workloads consist of "needle in a haystack" queries that are long running and data intensive so that query throughput limits performance.  ...  We also wish to thank Ani Thakar, Tamás Budavari and the rest of the Sloan Digital Sky Survey team at Johns Hopkins University for their assistance with Astronomy data and workloads.  ... 
arXiv:0909.1760v1 fatcat:dapcsv7panbsrc5cmus322jcku

DATA MINING AND MACHINE LEARNING IN ASTRONOMY

NICHOLAS M. BALL, ROBERT J. BRUNNER
2010 International Journal of Modern Physics D  
However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results.  ...  Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results.  ...  The authors made extensive use of the storage and computing facilities at the National Center for Supercomputing Applications and thank the technical staff for their assistance in enabling this work.  ... 
doi:10.1142/s0218271810017160 fatcat:qd442usdmfgalbomkkiyvwzsfu

Efficient Evaluation of HAVING Queries on a Probabilistic Database [chapter]

Christopher Ré, Dan Suciu
Database Programming Languages  
The integrated method for the evaluation of threshold queries that we have developed achieves scalability through data-parallel execution of the computations on the nodes of an analysis database cluster  ...  Data-intensive computations that examine entire time-steps of the simulation data are impractical to perform locally by the user, taking days or months to iterate over the entire dataset.  ...  This work is supported in part by the National Science Foundation under Grants CMMI-0941530, ACI-1261715, OCI-1244820 and AST-0939767 and Johns Hopkins University's Institute for Data Intensive Engineering  ... 
doi:10.1007/978-3-540-75987-4_13 dblp:conf/dbpl/ReS07 fatcat:k5uba4wocjfrhettqn3kccewoe