Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers
[chapter]
2004
Lecture Notes in Computer Science
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. ...
We call this special form of data locality geographical locality, as one aspect of the non-uniformity is the physical distance between the cc-NUMA nodes. ...
Applications: To evaluate different methods for improving geographical locality we study the performance of four solvers for large-scale partial differential equation (PDE) problems. ...
doi:10.1007/978-3-540-24687-9_2
fatcat:3bznmbesxvaepgwd2vb3h5gyfa
Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers
[chapter]
2008
Lecture Notes in Computer Science
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality geographical locality. ...
The main conclusions of the study are: (1) that geographical locality is very important for the performance of the solver, (2) that the performance can be improved significantly using dynamic page migration ...
A main reason for the complexity of local address space implementations of AMR PDE solvers is that the programmer must explicitly control and modify the partitioning of work and data during execution. ...
doi:10.1007/978-3-540-68555-5_31
fatcat:5grtv5jzrzdkfpzbg3fghhitha
Analytics challenge---Remote runtime steering of integrated terascale simulation and visualization
2006
Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06
The breakthrough is made possible by a new approach that visualizes partial differential equation (PDE) solution data simultaneously while a parallel PDE solver executes. ...
Because our approach avoids the bottlenecks associated with transferring and storing large volumes of output data, it offers a promising approach to overcoming the challenges of visualization of petascale ...
Acknowledgments We thank Michael Ogawa for helping with the production of the animation. We would also like to thank our SCEC CME partners Tom Jordan and Phil Maechling for their support and help. ...
doi:10.1145/1188455.1188767
dblp:conf/sc/TuYBGLMORSTU06
fatcat:7j3k5xbskraa7epdkj76xgpb24
Using Deep Learning to Extend the Range of Air-Pollution Monitoring and Forecasting
[article]
2019
arXiv
pre-print
Although the use of deep-learning techniques has been proposed, actual applications have been restricted by the fact that the training data are obtained using traditional PDE solvers. ...
Across numerous applications, forecasting relies on numerical solvers for partial differential equations (PDEs). ...
In these latter studies, a solver for PDEs has been used to obtain hundreds of thousands of outputs corresponding to hundreds of thousands of inputs. ...
arXiv:1810.09425v2
fatcat:3t2lktgvxbhtniormfteiofkpi
Exa-Dune—Flexible PDE Solvers, Numerical Methods and Applications
[chapter]
2020
Lecture Notes in Computational Science and Engineering
In the EXA-DUNE project we have developed, implemented and optimised numerical algorithms and software for the scalable solution of partial differential equations (PDEs) on future exascale systems exhibiting ...
Continuous improvement of the underlying hardware-oriented numerical methods has included GPU-based sparse approximate inverses, matrix-free sum-factorisation for high-order discontinuous Galerkin discretisations ...
Our implementation shows good scalability until we reach a local problem size of just 18 cells, where we still need to improve the asynchronicity of ghost data communication and assembly. ...
doi:10.1007/978-3-030-47956-5_9
fatcat:iwfk3gsln5endcqe3uq42fzxwa
Applications for ultrascale computing
2015
Supercomputing Frontiers and Innovations
This article discusses the state-of-the-art of programming for current and future large-scale computing systems with an emphasis on complex applications. ...
Studies of real complex physical and engineering problems represented by multiscale and multiphysics computer simulations have an increasing demand for computing power. ...
Memory requirements include about 20 GB of HDD (for input and output data), and about 16 GB of RAM per node and about 8 GB of inter co-processor memory. ...
doi:10.14529/jsfi150102
fatcat:ffmlcygqyfatpamqvlrgjwjvaq
FUTURES-AMR: Towards an Adaptive Mesh Refinement Framework for Geosimulations
2018
International Conference Geographic Information Science
Geosimulations are scientific simulations using geographic data, routinely used to predict outcomes of urbanization in urban studies. ...
Adaptive Mesh Refinement (AMR) is a computational technique used to reduce the amount of computation and memory required in scientific simulations. ...
We design two simulation approaches, namely asynchronous AMR and synchronous AMR, that vary in their implementation of Step 4. ...
doi:10.4230/lipics.giscience.2018.16
dblp:conf/giscience/ShashidharanVBM18
fatcat:ijlwsbiffvbe7htbro2usr2vkm
Parallel programming with message passing and directives
2001
Computing in science & engineering (Print)
models in the same program: distributed and shared memory. ...
Most high-performance systems use the Distributed Memory Parallel (DMP) and Shared Memory Parallel (SMP; also known as Symmetric MultiProcessor) models, and many applications can benefit from support for ...
We could improve this scheme by allocating in shared memory all data for communication, but we have not yet done this in SPECseis. ...
doi:10.1109/5992.947105
fatcat:of7laitsjnhz7exipyqhf7ehtq
ST-PCNN: Spatio-Temporal Physics-Coupled Neural Networks for Dynamics Forecasting
[article]
2021
arXiv
pre-print
One key characteristic of such systems is that certain physics laws -- represented as ordinary/partial differential equations (ODEs/PDEs) -- largely dominate the whole process, irrespective of time or ...
Physics-informed learning has recently emerged to learn physics for accurate prediction, but they often lack a mechanism to leverage localized spatial and temporal correlation or rely on hard-coded physics ...
The implementation was based on PyTorch, running on NVIDIA GeForce GTX 1080Ti and Titan Xp GPUs with 32GB memory. ...
arXiv:2108.05940v1
fatcat:dnpncnjjqbgh7kywy6myrhpeti
D8.3: Re-integration into Community Codes
2013
Zenodo
The main focus of WP8 was the re-design and refactoring of a number of selected codes for scientific numerical applications, in order to effectively run on coming generations of supercomputing architectures ...
, maximising the impact of WP8. ...
Therefore, the OpenMP implementation has been developed to preserve data locality even inside the shared memory, and to effectively handle concurrent data accesses. ...
doi:10.5281/zenodo.6572421
fatcat:z6yjehkxrjdvnptuodtehjb6v4
D8.4.1: Plan for the further Refactoring of Selected Community Codes
2013
Zenodo
This document presents the work plan for the extension of Work Package 8 'Community Code Scaling', which focuses on the re-design and refactoring of a number of codes for scientific numerical applications ...
A subset of codes originally in WP8 that promise further improvements will take part in the extension. Thirteen codes were selected according to the proposed objectives and the available resources. ...
The main achievements are: Full shared memory OpenMP implementation of the hydro kernel, with good scalability up to 32 cores per CPU; Full shared memory OpenMP implementation of the gravity solver ...
doi:10.5281/zenodo.6572431
fatcat:5g6nafkizremdpnyxph25suih4
D9.3: Emerging Opportunities for Industrial Users of HPC
2013
Zenodo
A performance improvement of at least 30% was achieved with respect to the original MPI code. ...
ViscoSolve - the solver was parallelized with an MPI/OpenMP hybridisation implementation. ...
D9.3 Emerging Opportunities for Industrial Users of HPC, PRACE-2IP - RI-283493, 22.8.2013 ...
doi:10.5281/zenodo.6572428
fatcat:3zmf2nn3vffhfgmvlmcpeutaxu
Recent Advances in Graph Partitioning
[article]
2015
arXiv
pre-print
We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions. ...
Acknowledgements We express our gratitude to Bruce Hendrickson, Dominique LaSalle, and George Karypis for many valuable comments on a preliminary draft of the manuscript. ...
When restricting local search to improving moves, parallelization is possible, though [KK99b, LK13, MSS15, ASS15]. In a shared memory context, one can also use speculative parallelism [SNBP11]. ...
arXiv:1311.3144v3
fatcat:zmvhlkh7ynbzvm353fv22f2gnq
Recent Advances in Graph Partitioning
[chapter]
2016
Lecture Notes in Computer Science
We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions. ...
Acknowledgements We express our gratitude to Bruce Hendrickson, Dominique LaSalle, and George Karypis for many valuable comments on a preliminary draft of the manuscript. ...
When restricting local search to improving moves, parallelization is possible, though [KK99b, LK13]. In a shared memory context, one can also use speculative parallelism [SNBP11]. ...
doi:10.1007/978-3-319-49487-6_4
fatcat:4zamxcmgvfbaxndjgxv6jog6km
Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs
[article]
2019
arXiv
pre-print
We address this challenge for the problem of modeling subsurface flow at the Hanford Site by combining stochastic computational models with observational data using physics-informed GAN models. ...
The geographic extent, spatial heterogeneity, and multiple correlation length scales of the Hanford Site require training a computationally intensive GAN model to thousands of dimensions. ...
Again, even with a local batch size of 1, our medium network (designed for a stochastic dimension of 1000) would require over 19 GB of memory (once all intermediate results for backpropagation are accounted ...
arXiv:1910.13444v1
fatcat:lbse6emsm5dizpi4jdp7eokwvq