Compiler Optimization for Extreme-Scale Scripting

Timothy G. Armstrong, Justin M. Wozniak, Michael Wilde, Ian T. Foster
2014 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
The data-driven task parallelism execution model can support parallel programming models that are well suited for large-scale distributed-memory parallel computing, for example, simulations and analysis  ...  We describe a novel compiler intermediate representation and optimizations for this execution model, including adaptions of standard techniques alongside novel techniques.  ...  ACKNOWLEDGMENTS This research is supported in part by the U.S.  ... 
doi:10.1109/ccgrid.2014.115 dblp:conf/ccgrid/ArmstrongWWF14 fatcat:74numjfokbbj5kejpairnteyme

Common runtime support for high-performance parallel languages

1993 Proceedings of the 1993 ACM/IEEE conference on Supercomputing - Supercomputing '93  
We would design a common compiler data movement interface specification that will provide a set of communication standards that compilers can link into the runtime system for applications.  ...  Performance and Debugging Infrastructure for Compiler Runtime Systems Data parallelism and Task parallelism are two important kinds of exploitable parallelism available in most applications.  ... 
doi:10.1145/169627.169826 dblp:conf/sc/FoxRSMBCCCCEFFGHKKLLLOPPSSWY93 fatcat:uashkzwp65goxdjpj6jim4pjqy

Leo: A Profile-Driven Dynamic Optimization Framework for GPU Applications

Naila Farooqui, Christopher J. Rossbach, Yuan Yu, Karsten Schwan
2014 USENIX Symposium on Operating Systems Design and Implementation  
While GPUs enable order-of-magnitude performance increases in many data-parallel application domains, writing efficient codes that can actually manifest those increases is a non-trivial endeavor, typically  ...  with architecture- and runtime-specific parameters.  ...  All generated CUDA kernels are stored in the compilation cache to avoid runtime JIT compilation overheads for subsequent data chunks and application runs.  ... 
dblp:conf/osdi/FarooquiRYS14 fatcat:j6yjstgwffdqnbv23vi6koxgda

Compiler Techniques for Massively Scalable Implicit Task Parallelism

Timothy G. Armstrong, Justin M. Wozniak, Michael Wilde, Ian T. Foster
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
We present a comprehensive set of compiler techniques for data-driven task parallelism, including novel compiler optimizations and intermediate representations.  ...  Producing code that executes efficiently at this scale requires sophisticated compiler transformations: poorly optimized code inhibits scaling with excessive synchronization and communication.  ...  Computing resources were provided in part by NIH through the Computation Institute and the Biological Sciences Division of the University of Chicago and Argonne National Laboratory, under grant S10 RR029030  ... 
doi:10.1109/sc.2014.30 dblp:conf/sc/ArmstrongWWF14 fatcat:fn5th2mbljhavelopcxrrhljte

Optimizing Java-Specific Overheads: Java at the Speed of C? [chapter]

Ronald S. Veldema, Thilo Kielmann, Henri E. Bal
2001 Lecture Notes in Computer Science  
In this paper, we discuss four Java-specific code optimizations and their impact on application performance.  ...  We assess the execution time of three application kernels, comparing Manta with the IBM JIT 1.3.0, and with C-versions of the codes, compiled with GCC.  ...  We thank Rutger Hofman, Ceriel Jacobs, Jason Maassen, and Rob van Nieuwpoort for their contributions to the Manta compiler and runtime system.  ... 
doi:10.1007/3-540-48228-8_78 fatcat:a5zqdq4qprbarjuxjwiio3wfqq

JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization [article]

Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña
2021 arXiv pre-print
The rapid development in computing technology has paved the way for directive-based programming models towards a principal role in maintaining software portability of performance-critical applications.  ...  This paper introduces JACC, an OpenACC runtime framework which enables the dynamic extension of OpenACC programs by serving as a transparent layer between the program and the compiler.  ...  We would like to acknowledge the NVIDIA AI Technology Center (NVAITC) Europe for their valuable help.  ... 
arXiv:2110.14340v1 fatcat:acfa6g7xm5dyfajen7fqkn4yri

Application Performance in the Frontera Acceptance Process

Richard Todd Evans
2020 Zenodo  
In 2017 the NSF called for proposals for a High Performance Computing System that would replace its largest system at the time, Blue Waters, located at the National Center for Supercomputing Applications  ...  In this paper, we present a candid accounting of the development of the application acceptance criteria used in the Texas Advanced Computing Center's proposal to this solicitation.  ...  ACKNOWLEDGMENTS This work is supported by the National Science Foundation through the ACI-1134872 Stampede, OAC-1540931 Stampede2, ACI-1953575 XSEDE, and OAC-1854828 Frontera awards.  ... 
doi:10.5281/zenodo.4313849 fatcat:3zdmy3udd5ab5d6w6bz72iubwm

PaRSEC: Exploiting Heterogeneity to Enhance Scalability

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Herault, Jack J. Dongarra
2013 Computing in science & engineering (Print)  
Beginning in the 1970s, vector computing was indisputably the technology for those seeking the highest possible performance; in the 1980s, the introduction of multiprocessor vector systems added a new  ...  New high-performance computing system designs with steeply escalating processor and core counts, burgeoning heterogeneity and accelerators, and increasingly unpredictable memory access times call for dramatically  ...  Because our system's users are parallel application developers, the primary tools we created for interfacing with them are analysis and compilation tools.  ... 
doi:10.1109/mcse.2013.98 fatcat:qyghrkdwyjhs5porciopjjipi4

Hierarchical multithreading: programming model and system software

G.R. Gao, T. Sterling, R. Stevens, M. Hereld, Weirong Zhu
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
We will work on a dynamic compilation and runtime model to achieve efficient LITL-X program execution. Several adaptive optimizations will be studied.  ...  Finally, we plan to implement our method in an experimental testbed for a HEC architecture and perform a qualitative and quantitative evaluation on selected applications.  ...  We would also like to acknowledge other members at the CAPSL group, who provide a stimulus environment for scientific discussions and collaborations, in particular Ziang Hu, Juan del Cuvillo, and Ge Gan  ... 
doi:10.1109/ipdps.2006.1639574 dblp:conf/ipps/GaoSSHZ06 fatcat:lkcf3jbp7jcr3mikequlczalhu

A hybrid approach of OpenMP for clusters

Okwan Kwon, Fahed Jubair, Rudolf Eigenmann, Samuel Midkiff
2012 SIGPLAN notices  
Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs.  ...  repetitive applications in the NAS Parallel Benchmarks.  ...  Acknowledgments This work was supported, in part, by the National Science Foundation under grants No. 0720471-CNS, 0707931-CNS, 0833115-CCF, and 0916817-CCF.  ... 
doi:10.1145/2370036.2145827 fatcat:6rst36xk7zgznmervp7kty2h44

A hybrid approach of OpenMP for clusters

Okwan Kwon, Fahed Jubair, Rudolf Eigenmann, Samuel Midkiff
2012 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12  
Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs.  ...  repetitive applications in the NAS Parallel Benchmarks.  ...  Acknowledgments This work was supported, in part, by the National Science Foundation under grants No. 0720471-CNS, 0707931-CNS, 0833115-CCF, and 0916817-CCF.  ... 
doi:10.1145/2145816.2145827 dblp:conf/ppopp/KwonJEM12 fatcat:ryrox34v3bfqzf5y542vmwbdh4

Communication-Aware Parallelization Strategies for High Performance Applications

Imran Ashraf, Koen Bertels, Nader Khammassi, Jean-Christophe Le Lann
2015 2015 IEEE Computer Society Annual Symposium on VLSI  
With the advent of multicore processor architectures and the existence of a huge legacy code base, the need for efficient and scalable parallelizing compilers is growing.  ...  and exploiting it in a different way.  ...  The authors would like to thank Valery Kritchallo for useful discussions.  ... 
doi:10.1109/isvlsi.2015.89 dblp:conf/isvlsi/AshrafBKL15 fatcat:c4pb57i3kjflfgyt3guelloodq

Embedding GPU Computations in Hadoop

Jie Zhu, Hai Jiang, Juanjuan Li, Erikson Hardesty, Kuan-Ching Li, Zhongwen Li
2014 International Journal of Networked and Distributed Computing (IJNDC)  
While Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones.  ...  MapReduce in Hadoop eases the programming task by hiding communication and scheduling details. Hadoop Distributed File System will help achieve data-level fault resilience.  ...  Hadoop and MapReduce Hadoop meets the challenges of Big Data by simplifying the programming process for data-intensive applications.  ... 
doi:10.2991/ijndc.2014.2.4.2 fatcat:kjdqk4iwgrdpdjccrooiz2vi74

Streamware

Jayanth Gummaraju, Joel Coburn, Yoshio Turner, Mendel Rosenblum
2008 Proceedings of the 13th international conference on Architectural support for programming languages and operating systems - ASPLOS XIII  
It has been shown that not only traditional media/image applications but also more general-purpose data-intensive applications can be expressed in the stream programming style.  ...  We leverage existing compilation framework for stream processors and design a runtime environment which takes as input the output of these stream compilers in the form of machine-independent stream virtual  ...  Acknowledgements We would like to thank Timothy Barth of NASA Ames and Eric Darve of the Mechanical Engineering Department at Stanford University for their valuable help in providing applications and working  ... 
doi:10.1145/1346281.1346319 dblp:conf/asplos/GummarajuCTR08 fatcat:t5tvppkqgba67kvt44c22xtsg4

Streamware

Jayanth Gummaraju, Joel Coburn, Yoshio Turner, Mendel Rosenblum
2008 SIGPLAN notices  
It has been shown that not only traditional media/image applications but also more general-purpose data-intensive applications can be expressed in the stream programming style.  ...  We leverage existing compilation framework for stream processors and design a runtime environment which takes as input the output of these stream compilers in the form of machine-independent stream virtual  ...  Acknowledgements We would like to thank Timothy Barth of NASA Ames and Eric Darve of the Mechanical Engineering Department at Stanford University for their valuable help in providing applications and working  ... 
doi:10.1145/1353536.1346319 fatcat:fxh6ulq4mzhbjb4ivun5tpf2oi