Acceleration of an analytical approach to collateralized debt obligation pricing
2010
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '10
A novel convolution approach based on FIFOs for storage is implemented for the recursive convolution. It is also used to address one of the main drawbacks of the analytical approach. ...
demand to perform real-time risk analysis. ...
MONTE CARLO EXECUTION TIME
Bibliography ...
doi:10.1145/1723112.1723130
dblp:conf/fpga/GuptaC10
fatcat:cp3lyzdxfvho7lzstxrjhjh76y
Accelerating Geospatial Applications on Hybrid Architectures
2013
2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing
This set of benchmarks includes an embarrassingly parallel application, a loosely communicating application, and an intensely communicating application. ...
Accelerators have become critical in the process to develop supercomputers with exascale computing capability. ...
ACKNOWLEDGMENTS This research used resources of the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the National Science Foundation under Contract OCI-0910735 ...
doi:10.1109/hpcc.and.euc.2013.217
dblp:conf/hpcc/LaiHSY13
fatcat:4347tjinezctbjmcgzak5cgrdy
Exascale Machines Require New Programming Paradigms and Runtimes
2015
Supercomputing Frontiers and Innovations
Therefore, new programming models and languages are required towards this direction. The whole point of view on application will have to change. ...
Extreme scale parallel computing systems will have tens of thousands of optionally accelerator-equipped nodes with hundreds of cores each, as well as deep memory hierarchies and complex interconnect topologies ...
In MPI it is based on the concept of a communication window to which the MPI processes in a communicator statically attach contiguous segments of their local memory for exposure to other processes; the ...
doi:10.14529/jsfi150201
fatcat:ozj4czefxrd37j7djcxuukyuee
Compiler-Optimized Simulation of Large-Scale Applications on High Performance Architectures
2002
Journal of Parallel and Distributed Computing
For programs with regular computation and communication patterns, this information allows us to avoid executing or simulating large portions of the computational code during the simulation. ...
Furthermore, it requires factors of 5 to 2000 less memory and up to a factor of 10 less time to execute than the original simulator. ...
ACKNOWLEDGMENT This work was supported by DARPA/ITO under Contract N66001-97-C-8533, "End-to-End Performance Modeling of Large Heterogeneous Adaptive Parallel/Distributed Computer/Communication Systems ...
doi:10.1006/jpdc.2001.1800
fatcat:ak4jasypyvflzn2ayae6rmj5s4
A case study in top-down performance estimation for a large-scale parallel application
2006
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06
Low-level analysis is complemented with scalability estimates based on modeling MPI, OpenMP and I/O activity in the code. ...
For GTC, we identify the important phases of the iteration and perform low-level analysis that includes instruction tracing and component simulations of processor and memory systems. ...
Acknowledgments The authors would like to thank the members of the Sun HPCS Performance team including Russ Brown, Lodewijk Bonebakker, Larry Brisson, John Busch, Chris Feucht, John Fredricksen, Ilya Gluhovsky ...
doi:10.1145/1122971.1122985
dblp:conf/ppopp/SharapovKDCR06
fatcat:2vuaiivpendazfoo5tymauu2ay
Scalable Overlapping Community Detection
2016
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
To the best of our knowledge, this is the first time that the problem of deducing overlapping communities has been learned for problems of such a large scale. ...
However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. ...
We also thank SURFSara for granting us access to their HPC Cloud machine with 1TB main memory. ...
doi:10.1109/ipdpsw.2016.165
dblp:conf/ipps/El-HelwHLAWB16
fatcat:3qmxo3u4ufawbnckpjdisnwxpy
Programming big data analysis: principles and solutions
2022
Journal of Big Data
Differently, this work analyzes and reviews parallel and distributed paradigms, languages and systems used today to analyze and learn from Big Data on scalable computers. ...
New models, languages, systems and algorithms continue to be developed to effectively collect, store, analyze and learn from Big Data. ...
Acknowledgements This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. ...
doi:10.1186/s40537-021-00555-2
fatcat:enecmxiulnhrlhdly4ihsnoc64
Single Node On-Line Simulation of MPI Applications with SMPI
2011
2011 IEEE International Parallel & Distributed Processing Symposium
Finally, SMPI simulations of large-scale applications on large-scale platforms can be executed on a single node thanks to techniques to reduce the simulation's compute time and memory footprint. ...
It is also a way to teach the principles of parallel programming and high-performance computing to students without access to a parallel computer. ...
PEVPM relies on extensive benchmarks of the target platform that provide probability distributions of communication times, which can in turn be used to model network contention phenomena. ...
doi:10.1109/ipdps.2011.69
dblp:conf/ipps/ClaussSGSCQ11
fatcat:tkbsxttokzhehgkdml3i5zcgp4
Prediction of Time-to-Solution in Material Science Simulations Using Deep Learning
2019
Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC '19
Predicting the time to solution for massively parallel scientific codes is a complex task. ...
A reliable prediction of execution time is however of great importance to the user who wants to plan on large scale simulations or virtual screening procedures characteristic of high throughput computing ...
European Centre of Excellence in materials modelling, simulations, and design" (Grant No. 824143), by the European H2020 FET project OPRECOMP (g.a. 732631) and by the CINECA research grant on Energy-Efficient ...
doi:10.1145/3324989.3325720
dblp:conf/pasc/PittinoBBABC19
fatcat:n4x2owebmrgs7coudflrzkji4e
POEMS: end-to-end performance design of large parallel adaptive computational systems
2000
IEEE Transactions on Software Engineering
Abstract: The POEMS project is creating an environment for end-to-end performance modeling of complex parallel and distributed systems, spanning the domains of application software, runtime and operating ...
Sophisticated parallelizing compiler techniques allow this representation to be generated automatically for a given parallel program. ...
Thanks also to Lawrence Livermore Laboratory for providing extensive computer time on the IBM SP/2. ...
doi:10.1109/32.881716
fatcat:w47k2yff2jde3jrnvcf3rrvidu
MPI hardware framework for many-core based embedded systems
2021
International Journal of Sensor Networks (IJSNet)
The proposal for an efficient MPI Hardware and MPI Software models, along with the presentation and evaluation of its queuing model, aims at giving the system design a framework to assist. ...
Comparative results are presented between MPI in hardware and software such as silicon consumption, processing time and transfer rate of the system related to the size of buffers. ...
Most of QT studies regarding NoCs have their analytical models based on Poisson arrival time distribution (λ) or memory-less (exponential distribution) package service time (τ) distribution. ...
doi:10.1504/ijsnet.2021.112888
fatcat:2izsqwrniffwnep2oc2gr5mbwe
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
[article]
2019
arXiv
pre-print
Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system. ...
Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications. ...
Distributed-memory parallel applications require some communication model in addition, such as the Hockney model [5], LogP [2], or one of their variants and extensions. ...
arXiv:1905.10603v3
fatcat:p4jbdraivnakhio6qrn4fzldoi
Design and performance characterization of electronic structure calculations on massively parallel supercomputers: a case study of GPAW on the Blue Gene/P architecture
2013
Concurrency and Computation
Figure 3. (a) Wall-clock time breakdown; (b) normal probability plot at 1024 MPI tasks. ...
GPAW is written in the Python and C programming languages [15] and uses the MPI [16, 17] programming model for parallel execution. ...
ACKNOWLEDGMENTS We thank Marcin Dułak from the Center for Atomic-scale Materials Design (CAMd) for the initial porting of GPAW to the Blue Gene/P at the Argonne Leadership Computing Facility (ALCF). ...
doi:10.1002/cpe.3199
fatcat:utv6tnxjjvexhnyn6oq67an4ge
D8.1.4: Plan for Community Code Refactoring
2012
Zenodo
At the time of the submission of the current document, this community has identified the relevant applications and specified the main targets for code refactoring. ...
For each code selected in the domains of Astrophysics, Material Science, Climate and Particle Physics, we provide a short summary of the algorithms to be the subject of refactoring. ...
It is written in Fortran90, C and python and relies on MPI for parallel simulations. ...
doi:10.5281/zenodo.6572340
fatcat:mobdwam6kzgzvf2th46k7b66k4
Compiler-supported simulation of highly scalable parallel applications
1999
Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99
We use a compiler-synthesized static task graph model to identify the control-flow and the subset of the computations that determine the parallelism, communication and synchronization of the code, and to ...
Furthermore, it requires factors of 5 to 2000 less memory and up to a factor of 10 less time to execute than the original simulator. ...
This work was supported by DARPA/ITO under Contract N66001-97-C-8533, "End-to-End Performance Modeling of Large Heterogeneous Adaptive Parallel/Distributed Computer/Communication Systems," (http://www.cs.utexas.edu ...
doi:10.1145/331532.331533
dblp:conf/sc/AdveBDPS99
fatcat:obdwb2s3uzdovmmxdmgnpjdkgm