361 Hits in 4.2 sec

Hints to improve automatic load balancing with LeWI for hybrid applications

Marta Garcia, Jesus Labarta, Julita Corbalan
2014 Journal of Parallel and Distributed Computing  
The DLB (Dynamic Load Balancing) library and LeWI (LEnd When Idle) algorithm provide a runtime solution to deal with the load imbalance of parallel applications independently of the source of imbalance  ...  This work is a deep analysis of the sources of efficiency loss correlated with application characteristics, parallelization schemes and programming models.  ...  LeWI (Lend When Idle) [1] is a load balancing algorithm that provides a runtime solution for load balancing problems of hybrid applications independently of the source of imbalance.  ... 
doi:10.1016/j.jpdc.2014.05.004 fatcat:noqletmngbfrliwjlscysxg5um

Compiler and runtime support for efficient software transactional memory

Ali-Reza Adl-Tabatabai, Brian T. Lewis, Vijay Menon, Brian R. Murphy, Bratin Saha, Tatiana Shpeisman
2006 SIGPLAN notices  
We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment.  ...  This paper presents compiler and runtime optimizations for transactional memory language constructs.  ...  Acknowledgments We'd like to thank Dan Grossman and the anonymous reviewers for their feedback on this paper.  ... 
doi:10.1145/1133255.1133985 fatcat:74y2op54xrfvjgvnvuxn4ozk24

Compiler and runtime support for efficient software transactional memory

Ali-Reza Adl-Tabatabai, Brian T. Lewis, Vijay Menon, Brian R. Murphy, Bratin Saha, Tatiana Shpeisman
2006 Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06  
We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment.  ...  This paper presents compiler and runtime optimizations for transactional memory language constructs.  ...  Acknowledgments We'd like to thank Dan Grossman and the anonymous reviewers for their feedback on this paper.  ... 
doi:10.1145/1133981.1133985 dblp:conf/pldi/Adl-TabatabaiLMMSS06 fatcat:p3khq7enrneatok732mqktw3ti

Design and evaluation of a compiler for embedded stream programs

Ryan R. Newton, Lewis D. Girod, Michael B. Craig, Samuel R. Madden, John Gregory Morrisett
2008 SIGPLAN notices  
We have used our language to build and deploy a sensor network for the acoustic localization of wild animals, in particular, the Yellow-Bellied marmot.  ...  Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace.  ...  Stream Graph Optimizations There are a breadth of well-understood transformations to static and dynamic dataflow graphs that adjust the parallelism within a graph: balancing load, exposing additional parallelism  ... 
doi:10.1145/1379023.1375675 fatcat:buh4kwpnkvccblfgrjy23z2jjm

GPU-accelerated simulations of isolated black holes

Adam G M Lewis, Harald P Pfeiffer
2018 Classical and quantum gravity  
Since this code must be maintained in parallel with SpEC itself, a primary design consideration is to perform as few explicit code changes as possible.  ...  We therefore rely on a hierarchy of automated porting strategies.  ...  Acknowledgments We thank Nils Deppe and Mark Scheel for helpful discussions. Calculations were performed with the SpEC-code [32].  ... 
doi:10.1088/1361-6382/aab256 fatcat:a47tr4i7hjf2tfszdwxpyippqy

Design and evaluation of a compiler for embedded stream programs

Ryan R. Newton, Lewis D. Girod, Michael B. Craig, Samuel R. Madden, John Gregory Morrisett
2008 Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems - LCTES '08  
We have used our language to build and deploy a sensor network for the acoustic localization of wild animals, in particular, the Yellow-Bellied marmot.  ...  Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace.  ...  Stream Graph Optimizations There are a breadth of well-understood transformations to static and dynamic dataflow graphs that adjust the parallelism within a graph: balancing load, exposing additional parallelism  ... 
doi:10.1145/1375657.1375675 dblp:conf/lctrts/NewtonGCMM08 fatcat:vzuxt43dzbe37cgouwqimiyppe

ZPL: a machine independent programming language for parallel computers

B.L. Chamberlain, Sung-Eun Choi, C. Lewis, C. Lin, L. Snyder, W.D. Weathersby
2000 IEEE Transactions on Software Engineering  
The goal of producing architecture-independent parallel programs is complicated by the competing need for high performance.  ...  This paper describes ZPL and provides a comprehensive evaluation of the language with respect to its goals of performance, portability, and programming convenience.  ...  We thank the anonymous referees for their helpful comments.  ... 
doi:10.1109/32.842947 fatcat:phdanl2m6rgbzijh2nb2kq24sm

Robust sampling for weak lensing and clustering analyses with the Dark Energy Survey [article]

P. Lemos, N. Weaverdyck, R. P. Rollins, J. Muir, A. Ferté, A. R. Liddle, A. Campos, D. Huterer, M. Raveri, J. Zuntz, E. Di Valentino, X. Fang (+59 others)
2022 arXiv   pre-print
We determine that provides a good balance of speed and robustness, and recommend different settings for testing purposes and final chains for analyses with DES Y3 data.  ...  We find that the ellipsoidal nested sampling algorithm reports inconsistent estimates of the Bayesian evidence and somewhat narrower parameter credible intervals than the sliced nested sampling implemented  ...  Frieman), which is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation.  ... 
arXiv:2202.08233v1 fatcat:fbg3ixw27vbihmx3dlydfiuro4

A compiler optimization algorithm for shared-memory multiprocessors

K.S. McKinley
1998 IEEE Transactions on Parallel and Distributed Systems  
This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors.  ...  The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations.  ...  Conclusions This paper presents a new parallelization algorithm that balances parallelism and data locality.  ... 
doi:10.1109/71.706049 fatcat:3m5odkybzvgm3putgvlki3aznu

Cobaya: Code for Bayesian Analysis of hierarchical physical models [article]

Jesus Torrado, Antony Lewis
2021 arXiv   pre-print
novel algorithm.  ...  It can exploit hybrid OpenMP/MPI parallelization, and has sub-millisecond overhead per posterior evaluation.  ...  Applicability to PolyChord PolyChord [3, 4] is a nested sampling [38] algorithm that utilizes slice sampling [39] for sampling within isolikelihood contours.  ... 
arXiv:2005.05290v2 fatcat:v6bscunjb5gvrhvdkl4zsi7v4y

Where is software headed? A virtual roundtable

T. Lewis, D. Power, B. Meyer, J. Grimes, M. Potel, R. Vetter, P. Laplante, W. Pree, G. Pomberger, M.D. Hill, J.R. Larus, D.A. Wood (+1 others)
1995 Computer  
Languages such as HPF and runtime libraries such as the University of Maryland's CHAOS library for irregular applications implement a shared address space using compilers or runtime code.  ...  -Ted Lewis, Naval Postgraduate School GETTING SERIOUS. If parallel processing is to grow, it has to adapt to popular applications.  ... 
doi:10.1109/2.402054 fatcat:tfbfxmsew5ajppnpwof52gjyaa

Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors

Kathryn S. McKinley
1994 Proceedings of the 8th international conference on Supercomputing - ICS '94  
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach.  ...  ., the compiler was required to use its analysis and algorithms to parallelize the program and could not rely on user assertions that, for example, a loop is parallel.  ...  Acknowledgements I especially want to thank Ken Kennedy, who provided impetus and guidance for much of this research.  ... 
doi:10.1145/181181.181265 dblp:conf/ics/McKinley94 fatcat:4eulalgo3rc5naym2lhb2wjl6i

Multi-dimensional intra-tile parallelization for memory-starved stencil computations [article]

Tareq Malas, Georg Hager, Hatem Ltaief, David Keyes
2015 arXiv   pre-print
We propose a flexible multi-dimensional intra-tile parallelization method for stencil algorithms on multicore CPUs with a shared outer-level cache.  ...  Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades.  ...  ACKNOWLEDGMENTS For computer time, this research used the resources of the Extreme Computing Research Center (ECRC) at KAUST. The authors thank the ECRC for supporting T. Malas.  ... 
arXiv:1510.04995v1 fatcat:twbfi3zicbe7bdu3hgn7d37h7q

Reconstructing Hardware Transactional Memory for Workload Optimized Systems [chapter]

Kunal Korgaonkar, Prabhat Jain, Deepak Tomar, Kashyap Garimella, Veezhinathan Kamakoti
2011 Lecture Notes in Computer Science  
This biennial event provides a forum for representing this community's research efforts and exchanging viewpoints.  ...  As an event that has taken place for 16 years, APPT aims at providing a high-quality program for all attendees. We accepted 13 papers out of 40 submissions, presenting an acceptance rate of 32.5%.  ...  The block parallel scheme with balance allocation algorithm achieves a speedup by a factor of 3.94x.  ... 
doi:10.1007/978-3-642-24151-2_1 fatcat:32cx745cn5cfdm5sbeah6eyiey

The OpenMOC method of characteristics neutral particle transport code

William Boyd, Samuel Shaner, Lulu Li, Benoit Forget, Kord Smith
2014 Annals of Nuclear Energy  
The OpenMOC code is being developed at the Massachusetts Institute of Technology to investigate algorithmic acceleration techniques and parallel algorithms for MOC.  ...  The method of characteristics (MOC) is a numerical integration technique for partial differential equations, and has seen widespread use for reactor physics lattice calculations.  ...  Acknowledgments The software design principles employed for OpenMOC are in large part inspired by the legacy left behind by Paul Romano on the MIT Computational Reactor Physics Group. The  ... 
doi:10.1016/j.anucene.2013.12.012 fatcat:fr6tuyl2hbf7nfv4bd37w72gpm