
Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers [chapter]

Henrik Löf, Markus Nordén, Sverker Holmgren
2004 Lecture Notes in Computer Science  
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data.  ...  We call this special form of data locality geographical locality, as one aspect of the non-uniformity is the physical distance between the cc-NUMA nodes.  ...  To evaluate different methods for improving geographical locality, we study the performance of four solvers for large-scale partial differential equation (PDE) problems.  ... 
doi:10.1007/978-3-540-24687-9_2 fatcat:3bznmbesxvaepgwd2vb3h5gyfa
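
The co-location of threads and data that this entry refers to is commonly achieved on cc-NUMA systems through a first-touch page-placement policy. The following is a hedged, generic OpenMP sketch of that idea, not code from the paper: initialising arrays with the same static loop schedule that the solver later uses places each page on the node of the thread that will work on it.

    /* Generic first-touch sketch, not code from the paper: on cc-NUMA
     * systems with a first-touch page-placement policy, the thread that
     * first writes a page determines the node on which it is allocated. */
    #include <stdlib.h>

    #define N 10000000L

    int main(void) {
        double *u = malloc(N * sizeof *u);
        double *f = malloc(N * sizeof *f);

        /* First touch: each thread initialises the chunk it will later own. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++) { u[i] = 0.0; f[i] = 1.0; }

        /* Compute sweeps reuse the same static schedule, so each thread
         * mostly touches pages resident on its own NUMA node. */
        #pragma omp parallel for schedule(static)
        for (long i = 1; i < N - 1; i++)
            u[i] = 0.25 * (f[i - 1] + 2.0 * f[i] + f[i + 1]);

        free(u);
        free(f);
        return 0;
    }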

Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers [chapter]

Markus Nordén, Henrik Löf, Jarmo Rantakokko, Sverker Holmgren
2008 Lecture Notes in Computer Science  
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality geographical locality.  ...  The main conclusions of the study are: (1) that geographical locality is very important for the performance of the solver, (2) that the performance can be improved significantly using dynamic page migration  ...  A main reason for the complexity of local address space implementations of AMR PDE solvers is that the programmer must explicitly control and modify the partitioning of work and data during execution.  ... 
doi:10.1007/978-3-540-68555-5_31 fatcat:5grtv5jzrzdkfpzbg3fghhitha
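
Dynamic page migration, which this entry identifies as a key source of improvement, can be triggered explicitly on Linux via the move_pages(2) system call from libnuma. The sketch below is a hedged, generic illustration of that mechanism; migrate_to_node is a hypothetical helper, not the authors' implementation.

    /* Hedged sketch of dynamic page migration on Linux using
     * move_pages(2) from <numaif.h>; link with -lnuma. */
    #define _GNU_SOURCE
    #include <numaif.h>   /* move_pages, MPOL_MF_MOVE */
    #include <unistd.h>   /* sysconf */
    #include <stdio.h>
    #include <stdlib.h>

    /* Move the pages backing buf[0 .. bytes) to the given NUMA node. */
    static void migrate_to_node(void *buf, size_t bytes, int node) {
        long page = sysconf(_SC_PAGESIZE);
        size_t npages = (bytes + page - 1) / page;

        void **pages  = malloc(npages * sizeof *pages);
        int   *nodes  = malloc(npages * sizeof *nodes);
        int   *status = malloc(npages * sizeof *status);

        for (size_t i = 0; i < npages; i++) {
            pages[i] = (char *)buf + i * page;
            nodes[i] = node;            /* desired destination node */
        }

        /* pid 0 = calling process; MPOL_MF_MOVE moves only pages that
         * belong exclusively to this process. */
        if (move_pages(0, npages, pages, nodes, status, MPOL_MF_MOVE) != 0)
            perror("move_pages");

        free(pages);
        free(nodes);
        free(status);
    }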

Analytics challenge---Remote runtime steering of integrated terascale simulation and visualization

Tiankai Tu, Ricardo Taborda-Rios, John Urbanic, Hongfeng Yu, Jacobo Bielak, Omar Ghattas, Julio C. Lopez, Kwan-Liu Ma, David R. O'Hallaron, Leonardo Ramirez-Guzman, Nathan Stone
2006 Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06  
The breakthrough is made possible by a new approach that visualizes partial differential equation (PDE) solution data simultaneously while a parallel PDE solver executes.  ...  Because our approach avoids the bottlenecks associated with transferring and storing large volumes of output data, it offers a promising approach to overcoming the challenges of visualization of petascale  ...  Acknowledgments We thank Michael Ogawa for helping with the production of the animation. We would also like to thank our SCEC CME partners Tom Jordan and Phil Maechling for their support and help.  ... 
doi:10.1145/1188455.1188767 dblp:conf/sc/TuYBGLMORSTU06 fatcat:7j3k5xbskraa7epdkj76xgpb24

Using Deep Learning to Extend the Range of Air-Pollution Monitoring and Forecasting [article]

Philipp Haehnel, Jakub Marecek, Julien Monteil, Fearghal O'Donncha
2019 arXiv   pre-print
Although the use of deep-learning techniques has been proposed, actual applications have been restricted by the fact that the training data are obtained using traditional PDE solvers.  ...  Across numerous applications, forecasting relies on numerical solvers for partial differential equations (PDEs).  ...  In these latter studies, a solver for PDEs has been used to obtain hundreds of thousands of outputs corresponding to hundreds of thousands of inputs.  ... 
arXiv:1810.09425v2 fatcat:3t2lktgvxbhtniormfteiofkpi

Exa-Dune—Flexible PDE Solvers, Numerical Methods and Applications [chapter]

Peter Bastian, Mirco Altenbernd, Nils-Arne Dreier, Christian Engwer, Jorrit Fahlke, René Fritze, Markus Geveler, Dominik Göddeke, Oleg Iliev, Olaf Ippisch, Jan Mohring, Steffen Müthing (+4 others)
2020 Lecture Notes in Computational Science and Engineering  
In the EXA-DUNE project we have developed, implemented and optimised numerical algorithms and software for the scalable solution of partial differential equations (PDEs) on future exascale systems exhibiting  ...  Continuous improvement of the underlying hardware-oriented numerical methods has included GPU-based sparse approximate inverses, matrix-free sum-factorisation for high-order discontinuous Galerkin discretisations  ...  Our implementation shows good scalability until we reach a local problem size of just 18 cells, where we still need to improve the asynchronicity of ghost data communication and assembly.  ... 
doi:10.1007/978-3-030-47956-5_9 fatcat:iwfk3gsln5endcqe3uq42fzxwa
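
The matrix-free kernels mentioned in this entry avoid assembling a sparse matrix and instead evaluate the operator's action on a vector on the fly. The following is a deliberately simplified sketch of that idea, a low-order 1-D finite-difference Laplacian in plain C rather than Exa-Dune's sum-factorised high-order DG kernels, which apply the same principle with tensor-product element operations.

    /* Simplified matrix-free sketch, not Exa-Dune code: instead of
     * assembling A and computing y = A*x, evaluate the action of a
     * 1-D Laplacian directly from its stencil. */
    #include <stddef.h>

    void laplacian_apply(const double *x, double *y, size_t n, double h) {
        const double s = 1.0 / (h * h);
        y[0] = y[n - 1] = 0.0;          /* homogeneous Dirichlet ends */
        for (size_t i = 1; i + 1 < n; i++)
            y[i] = s * (-x[i - 1] + 2.0 * x[i] - x[i + 1]);
    }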

Applications for ultrascale computing

2015 Supercomputing Frontiers and Innovations  
This article discusses the state-of-the-art of programming for current and future large-scale computing systems with an emphasis on complex applications.  ...  Studies of real complex physical and engineering problems represented by multiscale and multiphysics computer simulations have an increasing demand for computing power.  ...  Memory requirements include about 20 GB of HDD space (for input and output data), about 16 GB of RAM per node, and about 8 GB of inter co-processor memory.  ... 
doi:10.14529/jsfi150102 fatcat:ffmlcygqyfatpamqvlrgjwjvaq

FUTURES-AMR: Towards an Adaptive Mesh Refinement Framework for Geosimulations

Ashwin Shashidharan, Ranga Raju Vatsavai, Derek B. Van Berkel, Ross K. Meentemeyer, Michael Wagner
2018 International Conference Geographic Information Science  
Geosimulations are scientific simulations using geographic data, routinely used to predict outcomes of urbanization in urban studies.  ...  Adaptive Mesh Refinement (AMR) is a computational technique used to reduce the amount of computation and memory required in scientific simulations.  ...  We design two simulation approaches, namely asynchronous AMR and synchronous AMR, that vary in their implementation of Step 4.  ... 
doi:10.4230/lipics.giscience.2018.16 dblp:conf/giscience/ShashidharanVBM18 fatcat:ijlwsbiffvbe7htbro2usr2vkm

Parallel programming with message passing and directives

S.W. Bova, C.P. Breshears, H. Gabb, B. Kuhn, B. Magro, R. Eigenmann, G. Gaertner, S. Salvini, H. Scott
2001 Computing in science & engineering (Print)  
models in the same program: distributed and shared memory.  ...  Most high-performance systems use the Distributed Memory Parallel (DMP) and Shared Memory Parallel (SMP; also known as Symmetric MultiProcessor) models, and many applications can benefit from support for  ...  We could improve this scheme by allocating in shared memory all data for communication, but we have not yet done this in SPECseis.  ... 
doi:10.1109/5992.947105 fatcat:of7laitsjnhz7exipyqhf7ehtq
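
A minimal, generic example of combining the two models discussed in this article (message passing between nodes, directives within a node) looks like the following. It is an illustrative sketch, not the SPECseis code referred to in the snippet; build with something like mpicc -fopenmp hybrid.c.

    /* Minimal hybrid MPI + OpenMP sketch: one MPI rank per node,
     * several OpenMP threads per rank. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;

        /* Request thread support; FUNNELED means only the master
         * thread of each rank makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            #pragma omp critical
            printf("rank %d/%d, thread %d/%d\n",
                   rank, nranks, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }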

ST-PCNN: Spatio-Temporal Physics-Coupled Neural Networks for Dynamics Forecasting [article]

Yu Huang, James Li, Min Shi, Hanqi Zhuang, Xingquan Zhu, Laurent Chérubin, James VanZwieten, Yufei Tang
2021 arXiv   pre-print
One key characteristic of such systems is that certain physics laws -- represented as ordinary/partial differential equations (ODEs/PDEs) -- largely dominate the whole process, irrespective of time or  ...  Physics-informed learning has recently emerged to learn physics for accurate prediction, but such methods often lack a mechanism to leverage localized spatial and temporal correlation or rely on hard-coded physics  ...  The implementation was based on PyTorch, using NVIDIA GeForce GTX 1080 Ti and Titan Xp GPUs with 32 GB of memory.  ... 
arXiv:2108.05940v1 fatcat:dnpncnjjqbgh7kywy6myrhpeti

D8.3: Re-integration into Community Codes

Claudio Gheller
2013 Zenodo  
The main focus of WP8 was the re-design and refactoring of a number of selected codes for scientific numerical applications, in order to effectively run on coming generations of supercomputing architectures  ...  , maximising the impact of WP8.  ...  Therefore, the OpenMP implementation has been developed to preserve data locality even inside the shared memory, and to effectively handle concurrent data accesses.  ... 
doi:10.5281/zenodo.6572421 fatcat:z6yjehkxrjdvnptuodtehjb6v4

D8.4.1: Plan for the further Refactoring of Selected Community Codes

Claudio Gheller
2013 Zenodo  
This document presents the work plan for the extension of Work Package 8 'Community Code Scaling', which focuses on the re-design and refactoring of a number of codes for scientific numerical applications  ...  A subset of codes originally in WP8 that promise further improvements will take part in the extension. Thirteen codes were selected according to the proposed objectives and the available resources.  ...  The main achievements are: a full shared memory OpenMP implementation of the hydro kernel, with good scalability up to 32 cores per CPU; and a full shared memory OpenMP implementation of the gravity solver  ... 
doi:10.5281/zenodo.6572431 fatcat:5g6nafkizremdpnyxph25suih4

D9.3: Emerging Opportunities for Industrial Users of HPC

Claudio Arlandini
2013 Zenodo  
A performance improvement of at least 30% was achieved with respect to the original MPI code.  ...  ViscoSolve - the solver was parallelized with a hybrid MPI/OpenMP implementation.  ... 
doi:10.5281/zenodo.6572428 fatcat:3zmf2nn3vffhfgmvlmcpeutaxu

Recent Advances in Graph Partitioning [article]

Aydin Buluc, Henning Meyerhenke, Ilya Safro, Peter Sanders, Christian Schulz
2015 arXiv   pre-print
We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.  ...  Acknowledgements We express our gratitude to Bruce Hendrickson, Dominique LaSalle, and George Karypis for many valuable comments on a preliminary draft of the manuscript.  ...  When restricting local search to improving moves, parallelization is possible, though [KK99b,LK13,MSS15,ASS15]. In a shared memory context, one can also use speculative parallelism [SNBP11] .  ... 
arXiv:1311.3144v3 fatcat:zmvhlkh7ynbzvm353fv22f2gnq
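
The snippet's point that local search restricted to improving moves parallelizes more readily than classic FM refinement can be illustrated with a single greedy refinement pass over a 2-way partition. This is a hedged, simplified sketch, not any of the surveyed implementations; the CSR layout and the refine_pass helper are assumptions made for the example.

    /* One greedy pass of improving-moves refinement for 2-way partitioning.
     * Graph in CSR form (xadj, adjncy); part[v] is 0 or 1. */
    #include <stddef.h>

    void refine_pass(size_t n, const size_t *xadj, const size_t *adjncy,
                     int *part, size_t max_part_size) {
        size_t size[2] = {0, 0};
        for (size_t v = 0; v < n; v++) size[part[v]]++;

        for (size_t v = 0; v < n; v++) {
            int from = part[v], to = 1 - from;
            long gain = 0;                 /* cut edges saved by moving v */
            for (size_t e = xadj[v]; e < xadj[v + 1]; e++)
                gain += (part[adjncy[e]] == to) ? 1 : -1;

            /* Apply only strictly improving moves that respect balance;
             * no tentative worsening moves are kept, unlike FM. */
            if (gain > 0 && size[to] + 1 <= max_part_size) {
                part[v] = to;
                size[from]--;
                size[to]++;
            }
        }
    }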

Recent Advances in Graph Partitioning [chapter]

Aydın Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, Christian Schulz
2016 Lecture Notes in Computer Science  
We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.  ...  Acknowledgements We express our gratitude to Bruce Hendrickson, Dominique LaSalle, and George Karypis for many valuable comments on a preliminary draft of the manuscript.  ...  When restricting local search to improving moves, parallelization is possible, though [KK99b, LK13] . In a shared memory context, one can also use speculative parallelism [SNBP11] .  ... 
doi:10.1007/978-3-319-49487-6_4 fatcat:4zamxcmgvfbaxndjgxv6jog6km

Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs [article]

Liu Yang and Sean Treichler and Thorsten Kurth and Keno Fischer and David Barajas-Solano and Josh Romero and Valentin Churavy and Alexandre Tartakovsky and Michael Houston and Prabhat and George Karniadakis
2019 arXiv   pre-print
We address this challenge for the problem of modeling subsurface flow at the Hanford Site by combining stochastic computational models with observational data using physics-informed GAN models.  ...  The geographic extent, spatial heterogeneity, and multiple correlation length scales of the Hanford Site require training a computationally intensive GAN model to thousands of dimensions.  ...  Again, even with a local batch size of 1, our medium network (designed for a stochastic dimension of 1000) would require over 19 GB of memory (once all intermediate results for backpropagation are accounted  ... 
arXiv:1910.13444v1 fatcat:lbse6emsm5dizpi4jdp7eokwvq
Showing results 1–15 of 237.