
Acceleration of an analytical approach to collateralized debt obligation pricing

Dharmendra P. Gupta, Paul Chow
2010 Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '10  
A novel convolution approach based on FIFOs for storage is implemented for the recursive convolution. It is also used to address one of the main drawbacks of the analytical approach.  ...  demand to perform real-time risk analysis.  ... 
doi:10.1145/1723112.1723130 dblp:conf/fpga/GuptaC10 fatcat:cp3lyzdxfvho7lzstxrjhjh76y

Accelerating Geospatial Applications on Hybrid Architectures

Chenggang Lai, Miaoqing Huang, Xuan Shi, Haihang You
2013 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing  
This set of benchmarks includes an embarrassingly parallel application, a loosely communicating application, and an intensely communicating application.  ...  Accelerators have become critical in the process of developing supercomputers with exascale computing capability.  ...  ACKNOWLEDGMENTS This research used resources of the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the National Science Foundation under Contract OCI-0910735  ... 
doi:10.1109/hpcc.and.euc.2013.217 dblp:conf/hpcc/LaiHSY13 fatcat:4347tjinezctbjmcgzak5cgrdy

Exascale Machines Require New Programming Paradigms and Runtimes

2015 Supercomputing Frontiers and Innovations  
Therefore, new programming models and languages are required towards this direction. The whole point of view on application will have to change.  ...  Extreme scale parallel computing systems will have tens of thousands of optionally accelerator-equipped nodes with hundreds of cores each, as well as deep memory hierarchies and complex interconnect topologies  ...  In MPI it is based on the concept of a communication window to which the MPI processes in a communicator statically attach contiguous segments of their local memory for exposure to other processes; the  ... 
doi:10.14529/jsfi150201 fatcat:ozj4czefxrd37j7djcxuukyuee

Compiler-Optimized Simulation of Large-Scale Applications on High Performance Architectures

Vikram S Adve, Rajive Bagrodia, Ewa Deelman, Rizos Sakellariou
2002 Journal of Parallel and Distributed Computing  
For programs with regular computation and communication patterns, this information allows us to avoid executing or simulating large portions of the computational code during the simulation.  ...  Furthermore, it requires factors of 5 to 2000 less memory and up to a factor of 10 less time to execute than the original simulator.  ...  ACKNOWLEDGMENT This work was supported by DARPA/ITO under Contract N66001-97-C-8533, "End-to-End Performance Modeling of Large Heterogeneous Adaptive Parallel/Distributed Computer/Communication Systems  ... 
doi:10.1006/jpdc.2001.1800 fatcat:ak4jasypyvflzn2ayae6rmj5s4

A case study in top-down performance estimation for a large-scale parallel application

Ilya Sharapov, Robert Kroeger, Guy Delamarter, Razvan Cheveresan, Matthew Ramsay
2006 Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06  
Low-level analysis is complemented with scalability estimates based on modeling MPI, OpenMP and I/O activity in the code.  ...  For GTC, we identify the important phases of the iteration and perform low-level analysis that includes instruction tracing and component simulations of processor and memory systems.  ...  Acknowledgments The authors would like to thank the members of the Sun HPCS Performance team including Russ Brown, Lodewijk Bonebakker, Larry Brisson, John Busch, Chris Feucht, John Fredricksen, Ilya Gluhovsky  ... 
doi:10.1145/1122971.1122985 dblp:conf/ppopp/SharapovKDCR06 fatcat:2vuaiivpendazfoo5tymauu2ay

Scalable Overlapping Community Detection

Ismail El-Helw, Rutger Hofman, Wenzhe Li, Sungjin Ahn, Max Welling, Henri Bal
2016 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
To the best of our knowledge, this is the first time that the problem of deducing overlapping communities has been learned for problems of such a large scale.  ...  However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs.  ...  We also thank SURFSara for granting us access to their HPC Cloud machine with 1TB main memory.  ... 
doi:10.1109/ipdpsw.2016.165 dblp:conf/ipps/El-HelwHLAWB16 fatcat:3qmxo3u4ufawbnckpjdisnwxpy

Programming big data analysis: principles and solutions

Loris Belcastro, Riccardo Cantini, Fabrizio Marozzo, Alessio Orsino, Domenico Talia, Paolo Trunfio
2022 Journal of Big Data  
Differently, this work analyzes and reviews parallel and distributed paradigms, languages and systems used today to analyze and learn from Big Data on scalable computers.  ...  New models, languages, systems and algorithms continue to be developed to effectively collect, store, analyze and learn from Big Data.  ...  Acknowledgements This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558.  ... 
doi:10.1186/s40537-021-00555-2 fatcat:enecmxiulnhrlhdly4ihsnoc64

Single Node On-Line Simulation of MPI Applications with SMPI

Pierre-Nicolas Clauss, Mark Stillwell, Stephane Genaud, Frederic Suter, Henri Casanova, Martin Quinson
2011 2011 IEEE International Parallel & Distributed Processing Symposium  
Finally, SMPI simulations of large-scale applications on large-scale platforms can be executed on a single node thanks to techniques to reduce the simulation's compute time and memory footprint.  ...  It is also a way to teach the principles of parallel programming and high-performance computing to students without access to a parallel computer.  ...  PEVPM relies on extensive benchmarks of the target platform that provide probability distributions of communication times, which can in turn be used to model network contention phenomena.  ... 
doi:10.1109/ipdps.2011.69 dblp:conf/ipps/ClaussSGSCQ11 fatcat:tkbsxttokzhehgkdml3i5zcgp4

Prediction of Time-to-Solution in Material Science Simulations Using Deep Learning

Federico Pittino, Pietro Bonfà, Andrea Bartolini, Fabio Affinito, Luca Benini, Carlo Cavazzoni
2019 Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC '19  
Predicting the time to solution for massively parallel scientific codes is a complex task.  ...  A reliable prediction of execution time is however of great importance to the user who wants to plan on large scale simulations or virtual screening procedures characteristic of high throughput computing  ...  European Centre of Excellence in materials modelling, simulations, and design" (Grant No. 824143), by the European H2020 FET project OPRECOMP (g.a. 732631) and by the CINECA research grant on Energy-Efficient  ... 
doi:10.1145/3324989.3325720 dblp:conf/pasc/PittinoBBABC19 fatcat:n4x2owebmrgs7coudflrzkji4e

POEMS: end-to-end performance design of large parallel adaptive computational systems

M.K. Vernon, P.J. Teller, D.J. Sundaram-Stukel, R. Sakellariou, J.R. Rice, E.N. Houstis, A. Dube, E. Deelman, J.C. Browne, R. Bagrodia, V.S. Adve
2000 IEEE Transactions on Software Engineering  
Abstract: The POEMS project is creating an environment for end-to-end performance modeling of complex parallel and distributed systems, spanning the domains of application software, runtime and operating  ...  Sophisticated parallelizing compiler techniques allow this representation to be generated automatically for a given parallel program.  ...  Thanks also to Lawrence Livermore Laboratory for providing extensive computer time on the IBM SP/2.  ... 
doi:10.1109/32.881716 fatcat:w47k2yff2jde3jrnvcf3rrvidu

MPI hardware framework for many-core based embedded systems

Rodrigo Vinicius Mendonça Pereira, Laio Oriel Seman, Marcelo Daniel Berejuck, Douglas Rossi De Melo, Analucia Schiaffino Morales, Eduardo Augusto Bezerra
2021 International Journal of Sensor Networks (IJSNet)  
The proposal for an efficient MPI Hardware and MPI Software models, along with the presentation and evaluation of its queuing model, aims at giving the system design a framework to assist.  ...  Comparative results are presented between MPI in hardware and software such as silicon consumption, processing time and transfer rate of the system related to the size of buffers.  ...  Most of QT studies regarding NoCs have their analytical models based on Poisson arrival time distribution (λ) or memory-less (exponential distribution) package service time (τ) distribution.  ... 
doi:10.1504/ijsnet.2021.112888 fatcat:2izsqwrniffwnep2oc2gr5mbwe

Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study [article]

Ayesha Afzal, Georg Hager, Gerhard Wellein
2019 arXiv   pre-print
Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system.  ...  Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications.  ...  Distributed-memory parallel applications require some communication model in addition, such as the Hockney model [5], LogP [2], or one of their variants and extensions.  ... 
arXiv:1905.10603v3 fatcat:p4jbdraivnakhio6qrn4fzldoi

Design and performance characterization of electronic structure calculations on massively parallel supercomputers: a case study of GPAW on the Blue Gene/P architecture

N.A. Romero, C. Glinsvad, A.H. Larsen, J. Enkovaara, S. Shende, V.A. Morozov, J.J. Mortensen
2013 Concurrency and Computation  
Figure 3: (a) Wall-clock time breakdown; (b) normal probability plot at 1024 MPI tasks.  ...  GPAW is written in the Python and C programming languages [15] and uses the MPI [16, 17] programming model for parallel execution.  ...  ACKNOWLEDGMENTS We thank Marcin Dułak from the Center for Atomic-scale Materials Design (CAMd) for the initial porting of GPAW to the Blue Gene/P at the Argonne Leadership Computing Facility (ALCF).  ... 
doi:10.1002/cpe.3199 fatcat:utv6tnxjjvexhnyn6oq67an4ge

D8.1.4: Plan for Community Code Refactoring

Claudio Gheller, Will Sawyer
2012 Zenodo  
At the time of the submission of the current document, this community has identified the relevant applications and specified the main targets for code refactoring.  ...  For each code selected in the domains of Astrophysics, Material Science, Climate and Particle Physics, we provide a short summary of the algorithms to be the subject of refactoring.  ...  It is written in Fortran 90, C and Python and relies on MPI for parallel simulations.  ... 
doi:10.5281/zenodo.6572340 fatcat:mobdwam6kzgzvf2th46k7b66k4

Compiler-supported simulation of highly scalable parallel applications

Vikram S. Adve, Rajive Bagrodia, Ewa Deelman, Thomas Phan, Rizos Sakellariou
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
We use a compiler-synthesized static task graph model to identify the control-flow and the subset of the computations that determine the parallelism, communication and synchronization of the code, and to  ...  Furthermore, it requires factors of 5 to 2000 less memory and up to a factor of 10 less time to execute than the original simulator.  ...  This work was supported by DARPA/ITO under Contract N66001-97-C-8533, "End-to-End Performance Modeling of Large Heterogeneous Adaptive Parallel/Distributed Computer/Communication Systems," (  ... 
doi:10.1145/331532.331533 dblp:conf/sc/AdveBDPS99 fatcat:obdwb2s3uzdovmmxdmgnpjdkgm