
FlexIO: I/O Middleware for Location-Flexible Scientific Data Analytics

Fang Zheng, Hongbo Zou, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Jai Dayal, Tuan-Anh Nguyen, Jianting Cao, Hasan Abbasi, Scott Klasky, Norbert Podhorszki, Hongfeng Yu
2013 2013 IEEE 27th International Symposium on Parallel and Distributed Processing  
Since different placements have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics.  ...  There are several options to place data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored on persistent storage.  ...  We also thank Ray Grout from National Renewable Energy Laboratory for his help on S3D application. This work was funded by Scientific Data Management Center, U.S.  ... 
doi:10.1109/ipdps.2013.46 dblp:conf/ipps/ZhengZESWDNCAKPY13 fatcat:uogj5f6yvfhbtcmvccrqjybhoe

SciSpark: Applying in-memory distributed computing to weather event detection and tracking

Rahul Palamuttam, Renato Marroquin Mogrovejo, Chris Mattmann, Brian Wilson, Kim Whitehall, Rishi Verma, Lewis McGibbney, Paul Ramirez
2015 2015 IEEE International Conference on Big Data (Big Data)  
In this paper we present SciSpark, a Big Data framework that extends Apache™ Spark for scaling scientific computations. The paper details the initial architecture and design of SciSpark.  ...  We also illustrate the usability and extensibility of SciSpark by implementing aspects of the Grab 'em Tag 'em Graph 'em (GTG) algorithm using SciSpark and its Map Reduce capabilities.  ...  ACKNOWLEDGMENT We acknowledge the AIST for the funding of this research under NASA proposal number 14-AIST-14-0034.  ... 
doi:10.1109/bigdata.2015.7363983 dblp:conf/bigdataconf/PalamuttamMMWWV15 fatcat:jgaxle3fdfccxojxzhqasf7clm

Pangeo NSF Earthcube Proposal

Ryan Abernathey, Kevin Paul, Joe Hamman, Matthew Rocklin, Chiara Lepore, Michael Tippett, Naomi Henderson, Richard Seager, Ryan May, Davide Del Vento
2017 Figshare  
The Project Description from the NSF-funded Earthcube project "Pangeo: An Open Source Big Data Climate Science Platform" (NSF award 1740648)  ...  This seamless transition to remote execution is a key ingredient for moving data analysis into the cloud.  ...  These activities are a centerpiece of our proposed work. NetCDF NetCDF is the most common storage format for large multi-dimensional data in geoscience domains.  ... 
doi:10.6084/m9.figshare.5361094.v1 fatcat:lgj5vrhhnfa45haoj7kfizfgfi

Feasibility Study of Effective Remote I/O Using a Parallel NetCDF Interface in a Long-Latency Network

Yuichi Tsujita
unpublished
Decomposition in multi-dimensional data leads to complex I/O operations in non-contiguous parallel I/O pattern with the help of a derived data type.  ...  Its parallel I/O interface named parallel netCDF (hereafter PnetCDF) provides parallel I/O operations with the help of an MPI interface.  ...  Acknowledgments The author would like to thank the staff at Center for Computational Science and e-Systems (CCSE), Japan Atomic Energy Agency (JAEA), for providing a Stampi library and giving useful information  ... 
fatcat:taf7fboddva2rlkeyquyfst4na

D6.6: Report on petascale software libraries and programming models

Giovanni Erbacci, Carlo Cavazzoni, Filippo Spiga, Iris Christadler
2009 Zenodo  
provided in terms of performance and efficiency and benefit for parallelism.  ...  This deliverable identifies and analyses the programming models and the software libraries required by petascaling applications in the PRACE implementation phase.  ...  For the purpose of our study, we have implemented new versions of parallel I/O, using MPI-I/O, HDF5 and PnetCDF.  ... 
doi:10.5281/zenodo.6546116 fatcat:ojachtibqfeorbsef6z2lpvbie

User-transparent Distributed TensorFlow [article]

Abhinav Vishnu, Joseph Manzano, Charles Siegel, Jeff Daily
2017 arXiv   pre-print
This dramatically reduces the entry barrier for using a distributed TensorFlow implementation.  ...  Deep Learning (DL) algorithms have become the de facto choice for data analysis.  ...  It uses a dataflow model by specifying operations on tensors (multi-dimensional arrays).  ... 
arXiv:1704.04560v1 fatcat:seekm66p3bbwjemrf64vihkhxy

Towards HPC and Big Data Analytics Convergence: Design and Experimental Evaluation of a HPDA Framework for eScience at Scale

Donatello Elia, Sandro Fiore, Giovanni Aloisio
2021 IEEE Access  
Moreover, the authors would like to acknowledge Antonio Aloisio for his editing and proofreading work on this paper.  ...  ACKNOWLEDGMENT The authors kindly acknowledge PRACE for awarding access to MareNostrum4 at Barcelona Supercomputing Center (BSC), Spain, as well as the support by the BSC team.  ...  Each fragment is composed of a set of multi-dimensional binary arrays following a data store implementation based on a NoSQL approach.  ... 
doi:10.1109/access.2021.3079139 fatcat:zej3qjtcrvbgpnhe7a3ijv4f5e

Enabling high-speed asynchronous data extraction and transfer using DART

Ciprian Docan, Manish Parashar, Scott Klasky
2010 Concurrency and Computation  
The increasing application runtimes and the high cost of high performance computing resources make online data extraction and analysis a key requirement in addition to traditional data I/O and archiving  ...  offload simulation data to local service nodes and remote analysis nodes, with minimal over- *  ...  Fig. 9 plots the cumulative I/O times at the streaming server for (1) extracting the checkpoint from the simulation using the Portals interface, and (2) to transport the data to a remote node using TCP  ... 
doi:10.1002/cpe.1567 fatcat:i42eqg25kbdkxlbghenkovn7vy

Interpretation of Medical Imaging Data with a Mobile Application: A Mobile Digital Imaging Processing Environment

Meng Kuan Lin, Oliver Nicolini, Harald Waxenegger, Graham J. Galloway, Jeremy F. P. Ullmann, Andrew L. Janke
2013 Frontiers in Neurology  
The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation.  ...  Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing.  ...  ACKNOWLEDGMENTS This project is supported by the Australian National Data Service (ANDS).  ... 
doi:10.3389/fneur.2013.00085 pmid:23847587 pmcid:PMC3701154 fatcat:yeiqzyj6cfbylipwlgxq35kola

Two-level Dynamic Workflow Orchestration in the INDIGO DataCloud for Large-scale, Climate Change Data Analytics Experiments

Marcin Płóciennik, Sandro Fiore, Giacinto Donvito, Michał Owsiak, Marco Fargetta, Roberto Barbera, Riccardo Bruno, Emidio Giorgio, Dean N. Williams, Giovanni Aloisio
2016 Procedia Computer Science  
A climate change data analytics experiment use case regarding the precipitation trend analysis on CMIP5 data is described, that makes use of Kepler and big data analytics services.  ...  We are presenting the ongoing work on implementing the whole software chain on the Infrastructure as a Service, PaaS and Software as a Service layers, focusing on the scenarios involving scientific workflows  ...  parallel I/O strategies, data locality, processing chains optimization, etc.  ... 
doi:10.1016/j.procs.2016.05.359 fatcat:5pd4w5up5bhcvgt4bxns3h5jnu

The Norwegian National Ground Segment; Preservation, Distribution and Exploitation of Sentinel Data

Trygve Halsne, Lara Ferrighi, Bard Saadatnejad, Nico Budewitz, Frode Dinessen, Lars-Anders Breivik, Øystein Godøy
2019 Data Science Journal  
for seamless integration across branches.  ...  Due to the strong coupling between space based earth observations, in-situ observation, model data etc, disseminating data in a generic data management system utilizing NetCDF-4/CF and OPeNDAP is convenient  ...  Acknowledgements This research activity is supported by the Norwegian Space Agency from establishing the National Ground Segment ("Nasjonalt Bakkesegment", NBS) for satellite data in Norway.  ... 
doi:10.5334/dsj-2019-061 fatcat:lgf6txartnf7xpvqergrki4p7i

The DRIHM Project: A Flexible Approach to Integrate HPC, Grid and Cloud Resources for Hydro-Meteorological Research

Daniele D'Agostino, Andrea Clematis, Antonella Galizia, Alfonso Quarati, Emanuele Danovaro, Luca Roverelli, Gabriele Zereik, Dieter Kranzlmüller, Michael Schiffers, Nils Gentschen Felde, Christian Straube, Olivier Caumont (+9 others)
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
(models, data, and postprocessing tools) by exploiting HPC, Grid and Cloud facilities.  ...  The distributed research infrastructure for hydrometeorology (DRIHM) project focuses on the development of an e-Science infrastructure to provide end-to-end hydrometeorological research (HMR) services  ...  A complete HM chain. The integration bridges are represented with arrows. They implement data interchange standards to allow coupling of I/O data between models.  ... 
doi:10.1109/sc.2014.49 dblp:conf/sc/DAgostinoCGQDRZKSFSCRGHJDDFDP14 fatcat:xa7wiztihrd65leh745bmr6gzi

Land information system: An interoperable framework for high resolution land surface modeling

S. Kumar, C. Peters-Lidard, Y. Tian, P. Houser, J. Geiger, S. Olden, L. Lighty, J. Eastman, B. Doty, P. Dirmeyer
2006 Environmental Modelling & Software  
The LIS components are designed using object oriented principles, with flexible, adaptable interfaces and modular structures for rapid prototyping and development.  ...  Land Information System (LIS) is a software framework that integrates the use of satellite and ground-based observational data along with advanced land surface models and computing tools to accurately  ...  The I/O tools also provide support for distributed data output and multiple formats.  ... 
doi:10.1016/j.envsoft.2005.07.004 fatcat:zkgsrs42lfclpii6pc6eg22kr4

ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna
2018 2018 IEEE International Conference on Big Data (Big Data)  
In this paper, we introduce a new array caching in hierarchical storage (ARCHIE) to accelerate array data analyses in a seamless fashion.  ...  Software libraries for managing this hierarchy not only need to read data efficiently, but also reduce user-involvement for cross-layer data movement.  ...  We have implemented parallel prefetching to accelerate parallel I/O of analysis applications.  ... 
doi:10.1109/bigdata.2018.8622616 dblp:conf/bigdataconf/DongWTKWB18 fatcat:tqb65jrrhnenjpe7ahc2omgq74

Pangeo ML - Open Source Tools and Pipelines for Scalable Machine Learning Using NASA Earth Observation Data

Joseph Hamman, Ryan Abernathey, David Hoese, James Bednar, Tom Augspurger
2020 figshare.com  
For these reasons, we do not state specific security considerations.  ...  The software involved in this project does not require user accounts to access nor does it collect usage telemetry (or other sensitive data).  ...  ); (3) out-of-core computation on datasets that do not fit into memory (via Dask, see Section 1.3.2); (4) a wide range of serialization and input/output (I/O) options such as NetCDF, OPeNDAP, GRIB, HDF  ... 
doi:10.6084/m9.figshare.13250252.v1 fatcat:pn4af7ye4fcwtm5svvit7jwryy