Engineering parallel applications with tunable architectures

Christoph A. Schaefer, Victor Pankratius, Walter F. Tichy
2010 Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - ICSE '10  
Design, Implementation and Optimization Walter F. Tichy, Christoph A.  ...  Fuse groups of consecutive data-parallel stages Walter F. Tichy, Christoph A.  ...  Search-based auto-tuning system for library optimization; Comprehensive analysis of search algorithms; Not applicable for parallel programs  ... 
doi:10.1145/1806799.1806859 dblp:conf/icse/SchaeferPT10 fatcat:tyrdu562l5g4fgpodt63j45i6q

Atune-IL: An Instrumentation Language for Auto-tuning Parallel Applications [chapter]

Christoph A. Schaefer, Victor Pankratius, Walter F. Tichy
2009 Lecture Notes in Computer Science  
This paper concentrates on Atune-IL, an instrumentation language for specifying a wide range of tunable parameters for a generic auto-tuner.  ...  We extend auto-tuning to general-purpose parallel applications on multicores.  ...  As shown in Figure 10, the data parallel section of module A 1 is nested in the master/worker section of stage 2, while the data parallel sections of A 5 and A 6 are nested in the Master/Worker section  ... 
doi:10.1007/978-3-642-03869-3_5 fatcat:ezlzz2usn5bk7njjwafrsmgusi

A programming system for future proofing performance critical libraries

Li-Wen Chang, Izzat El Hajj, Hee-Seok Kim, Juan Gómez-Luna, Abdul Dakkak, Wen-mei Hwu
2016 Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16  
We present Tangram, a programming system for writing performance-portable programs.  ...  The language enables programmers to write computation and composition codelets, supported by tuning knobs and primitives for expressing data parallelism and work decomposition.  ...  Nested parallelism with algorithmic choice and recursive composition rules [3, 6] adapts programs to devices with different hierarchies.  ... 
doi:10.1145/2851141.2851178 dblp:conf/ppopp/ChangHKGDH16 fatcat:btc6btzamjco5fd6wy4lalrqti

A programming system for future proofing performance critical libraries

Li-Wen Chang, Izzat El Hajj, Hee-Seok Kim, Juan Gómez-Luna, Abdul Dakkak, Wen-mei Hwu
2016 SIGPLAN notices  
We present Tangram, a programming system for writing performance-portable programs.  ...  The language enables programmers to write computation and composition codelets, supported by tuning knobs and primitives for expressing data parallelism and work decomposition.  ...  Nested parallelism with algorithmic choice and recursive composition rules [3, 6] adapts programs to devices with different hierarchies.  ... 
doi:10.1145/3016078.2851178 fatcat:pqpceoro6rc55cd63q4f5ghcnq

A tuning framework for software-managed memory hierarchies

Manman Ren, Ji Young Park, Mike Houston, Alex Aiken, William J. Dally
2008 Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08  
A large program on a multi-level machine can easily expose tens or hundreds of inter-dependent parameters which require tuning, and manually searching the resultant large, non-linear space of program parameters  ...  Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of programs to the machine's particular  ...  The profiling system collects the same data for the loop nest at M0. In our experience, the profiling system gives the user valuable feedback.  ... 
doi:10.1145/1454115.1454155 dblp:conf/IEEEpact/RenPHAD08 fatcat:jjqa2v3gpvg4bmwqzapzlqu3hm

Auto-tuning full applications: A case study

Ananta Tiwari, Jeffrey K Hollingsworth, Chun Chen, Mary Hall, Chunhua Liao, Daniel J Quinlan, Jacqueline Chame
2011 The international journal of high performance computing applications  
We show that our system pinpoints a code variant that performs 2.37 times faster than the original loop nest.  ...  The values for these parameters are selected using a search-based auto-tuner, which performs a parallel heuristic search for the best-performing optimized variants of the outlined loop nests.  ...  It maintains a small data footprint for the sub-loop nests for cache optimization or to partition the computation across parallel threads.  ... 
doi:10.1177/1094342011414744 fatcat:5fkceunxxzdtnd4c26ohtnu264

Voodoo - a vector algebra for portable database performance on modern hardware

Holger Pirk, Oscar Moll, Matei Zaharia, Sam Madden
2016 Proceedings of the VLDB Endowment  
Such database performance engineering is hard: a plethora of data and hardware-dependent optimization techniques form a design space that is difficult to navigate for a skilled engineer, even more so for  ...  Central to our approach is a novel idea we termed control vectors, which allows a code generating frontend to expose parallelism to the Voodoo compiler in an abstract manner, enabling portable performance  ...  EVALUATION To evaluate our approach, we study the system with respect to our two design goals: portability and tunability.  ... 
doi:10.14778/3007328.3007336 fatcat:cpys5gp4lvgbrpblskhmob6auu

From sequential programming to flexible parallel execution

Arun Raman, Jae W. Lee, David I. August
2012 Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems - CASES '12  
This paper proposes to attract embedded systems programmers to a vertically integrated approach, comprising extensions to the sequential programming model, a parallelizing compiler, and an optimizing run-time  ...  To meet these challenges, clarion calls have been issued for programmers to start writing software in new parallel programming models.  ...  Lee was supported by the Korean IT R&D program of MKE/KEIT KI001810041244.  ... 
doi:10.1145/2380403.2380417 dblp:conf/cases/RamanLA12 fatcat:brrvp7hjqfebxhrdqijnvmlwp4

Sequoia: Programming the Memory Hierarchy

Kayvon Fatahalian, Timothy Knight, Mike Houston, Mattan Erez, Daniel Horn, Larkhoon Leem, Ji Park, Manman Ren, Alex Aiken, William Dally, Pat Hanrahan
2006 ACM/IEEE SC 2006 Conference (SC'06)  
We present Sequoia, a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines featuring different memory hierarchy  ...  We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance  ...  In this algorithm, which features nested parallelism and a high degree of hierarchical data locality, parallel evaluation of submatrix multiplications is performed to compute the product of two large matrices  ... 
doi:10.1109/sc.2006.55 fatcat:tyvxi4sbcrbptjgry2o2p7ug2u

Memory---Sequoia

Kayvon Fatahalian, William J. Dally, Pat Hanrahan, Daniel Reiter Horn, Timothy J. Knight, Larkhoon Leem, Mike Houston, Ji Young Park, Mattan Erez, Manman Ren, Alex Aiken
2006 Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06  
We present Sequoia, a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines featuring different memory hierarchy  ...  We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance  ...  In this algorithm, which features nested parallelism and a high degree of hierarchical data locality, parallel evaluation of submatrix multiplications is performed to compute the product of two large matrices  ... 
doi:10.1145/1188455.1188543 dblp:conf/sc/FatahalianHKLHPERADH06 fatcat:vvqqskhcqbbopiv2q4ha5yvda4

Exposing Tunable Parameters in Multi-threaded Numerical Code [chapter]

Apan Qasem, Jichi Guo, Faizur Rahman, Qing Yi
2010 Lecture Notes in Computer Science  
This paper presents a systematic and extensive exploration of the combined search space of transformation parameters that affect both parallelism and data locality in multi-threaded numerical applications  ...  In particular, the presence of shared-caches on multicore architectures makes it necessary to consider, in concert, issues related to both parallelism and data locality.  ...  We presented a method for identifying and exposing tunable parameters to a search tool.  ... 
doi:10.1007/978-3-642-15672-4_6 fatcat:b5pgdieb4va7nlw6vrpktpikvm

Application-Level Energy Awareness for OpenMP [chapter]

Ferdinando Alessi, Peter Thoman, Giorgis Georgakoudis, Thomas Fahringer, Dimitrios S. Nikolopoulos
2015 Lecture Notes in Computer Science  
OpenMP is the de-facto standard for programming parallel shared memory systems, but does not yet provide any support for power control.  ...  We introduce OpenMPE, an extension to OpenMP designed for power management.  ...  Rather than designing a completely new interface, we opt for extending OpenMP API [16], the de-facto standard for parallel shared-memory computing.  ... 
doi:10.1007/978-3-319-24595-9_16 fatcat:g33o723vhnhyfe5idiocbo3mty

Online Adaptive Code Generation and Tuning

Ananta Tiwari, Jeffrey K. Hollingsworth
2011 2011 IEEE International Parallel & Distributed Processing Symposium  
In this paper, we present a runtime compilation and tuning framework for parallel programs.  ...  We evaluate our system on two parallel applications and show that our system can improve runtime execution by up to 46% compared to the original version of the program.  ...  . 3) Refinements of our parallel search algorithm to include a penalization technique. 4) A system design to support runtime code generation and code replacement for large scale parallel applications.  ... 
doi:10.1109/ipdps.2011.86 dblp:conf/ipps/TiwariH11 fatcat:eynpnq75mjebroc4yo7kpkwvhy

Policy-based tuning for performance portability and library co-optimization

Duane Merrill, Michael Garland, Andrew Grimshaw
2012 2012 Innovative Parallel Computing (InPar)  
In this paper, we present a policy-based design idiom for constructing reusable, tunable software components that can be co-optimized with the enclosing kernel for the specific problem and processor at  ...  Although modular programming is a fundamental software development practice, software reuse within contemporary GPU kernels is uncommon.  ...  The first pairs a parallelizing compiler with an autotuning framework for mapping sequential loop nests onto parallel hardware.  ... 
doi:10.1109/inpar.2012.6339597 fatcat:pvmge5vmbfaghcmnuc6bgesrzq

What Makes a Good Physical plan?

Holger Pirk, Oscar Moll, Sam Madden
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
In addition, the tools that are commonly used by performance engineers, such as compiler intrinsics, static analyzers or hardware performance counters are neither integrated with data management systems  ...  To address this problem, we developed a system called Candomblé that lets database performance engineers interactively examine, optimize and evaluate query plans using a touch-based interface.  ...  The penalty for branching selections is hardware as well as data dependent grated with the data management system but inherited from the programming language and the computer architecture.  ... 
doi:10.1145/2882903.2899410 dblp:conf/sigmod/PirkMM16 fatcat:nuvawyq6wzh7pg4vntupn747tu