222,496 Hits in 2.9 sec

Enabling Primitives for Compiling Parallel Languages [chapter]

Seth Copen Goldstein, Klaus Erik Schauser, David Culler
1996 Languages, Compilers and Run-Time Systems for Scalable Computers  
Excess parallelism degrades into sequential calls with the attendant efficient stack management and direct transfer of control and data, unless a call truly needs to execute in parallel, in which case  ...  This paper presents three novel language implementation primitives, lazy threads, stacklets, and synchronizers, and shows how they combine to provide a parallel call at nearly the efficiency of a sequential  ...  Acknowledgements We are grateful to the anonymous referees and participants at the workshop for their valuable comments. We would also like to thank Manuel Faehndrich, Urs Hölzle,  ...
doi:10.1007/978-1-4615-2315-4_12 fatcat:yw2j6h4zo5fvfh7hxtiwqwszjm

Compiler techniques for the distribution of data and computation

A. Navarro, E. Zapata, D. Padua
2003 IEEE Transactions on Parallel and Distributed Systems  
This paper presents a new method that can be applied by a parallelizing compiler to find, without user intervention, the iteration and data decompositions that minimize communication and load imbalance  ...  the parallel code with the automatically selected iteration and data distributions.  ...  ACKNOWLEDGMENTS The authors would like to thank the anonymous referees for their helpful and insightful suggestions.  ... 
doi:10.1109/tpds.2003.1206503 fatcat:yrlxyxmxxzhbvntruqzj7jzqyu

Using knowledge-based systems for research on parallelizing compilers

Chao-Tung Yang, Shian-Shyong Tseng, Yun-Woei Fann, Ting-Ku Tsai, Ming-Huei Hsieh, Cheng-Tien Wu
2001 Concurrency and Computation  
This article describes the design and implementation of an efficient parallelizing compiler to parallelize loops and achieve high speedup rates on multiprocessor systems.  ...  One of the ultimate goals is to construct a high-performance and portable FORTRAN parallelizing compiler on shared-memory  ...  NSC86-2213-E009-081 and NSC87-2213-E009-023.  ... 
doi:10.1002/cpe.563 fatcat:z5wil2dh6bexdgdvffrvokqqvq

Integrating Data and Task Parallelism in Scientific Programs [chapter]

Ewa Deelman, Wesley K. Kaplow, Boleslaw K. Szymanski, Peter Tannenbaum, Louis Ziantz
1996 Languages, Compilers and Run-Time Systems for Scalable Computers  
Functional languages attract the attention of developers of parallelizing compilers because of the implicit parallelism of functional programs and the simplified data dependence analysis of functional  ...  In this paper we explore the connection between the memory optimization and communication optimization of parallel codes generated from functional languages.  ...  We are developing a system that automatically transforms serial FORTRAN into parallel C, performing memory optimization and introducing data and task parallelism (see Figure 1 ).  ... 
doi:10.1007/978-1-4615-2315-4_13 fatcat:tba5x22atrbgng3fmt3bm3zpqi

A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations

M. Kandemir, A. Choudhary, J. Ramanujam, M. Kandaswamy
1997 Proceedings of the fifth workshop on I/O in parallel and distributed systems - IOPADS '97  
This approach considers array references one by one and attempts to optimize each reference for parallelism and locality.  ...  This paper presents compiler algorithms to optimize out-of-core programs. These algorithms consider loop and data layout transformations in a unified framework.  ...  A computation which operates on disk-resident data sets is called out-of-core, and an optimizing compiler for out-of-core computations is called an out-of-core compiler.  ...
doi:10.1145/266220.266228 dblp:conf/iopads/KandemirCRK97 fatcat:udbihlxh5vcwdatqpecxlo53hm

Combining Data Reuse With Data-Level Parallelization for FPGA-Targeted Hardware Compilation: A Geometric Programming Framework

Qiang Liu, G.A. Constantinides, K. Masselos, P. Cheung
2009 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
FPGA-targeted hardware compilation.  ...  However, the exploitation of both data reuse and parallelization is limited by the memory resources available on-chip.  ...  As a result, this exploration problem is automated and system designs with optimal performance are determined at compile time. III.  ...
doi:10.1109/tcad.2009.2013541 fatcat:quimqfqulvfafixz5f43ph73se

Compiler Management of Communication and Parallelism for Quantum Computation

Jeff Heckey, Shruti Patil, Ali JavadiAbhari, Adam Holmes, Daniel Kudrow, Kenneth R. Brown, Diana Franklin, Frederic T. Chong, Margaret Martonosi
2015 Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '15  
Our work is the most comprehensive software-to-quantum toolflow published to date, with efficient and practical scheduling techniques that reduce communication and increase parallelism for full-scale quantum  ...  Quantum computing (QC) offers huge promise to accelerate a range of computationally intensive benchmarks.  ...  Many researchers have contributed to our benchmarks, including John Black, Lukas Svec, Aram Harrow, Amlan Chakrabarti, Chen-Fu Chiang, Oana Catu and Mohammad Javad Dousti.  ... 
doi:10.1145/2694344.2694357 dblp:conf/asplos/HeckeyPJHKBFCM15 fatcat:h5w7skxobrgfvkatjafy7abfdq

Automated Precision Tuning in Activity Classification Systems

Nicola Fossati, Daniele Cattaneo, Michele Chiari, Stefano Cherubin, Giovanni Agosta
2020 Proceedings of the 11th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures / 9th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms  
CCS CONCEPTS • Hardware → Power estimation and optimization; • Software and its engineering → Compilers; • Applied computing → Consumer health.  ...  However, wearable devices have limited computational power and battery life.  ...  We target very small computing systems, such as microcontrollers. Such systems are employed in real-world scenarios for activity detection in wearable devices and internet-of-things applications.  ...
doi:10.1145/3381427.3381432 dblp:conf/hipeac/FossatiCCCA20 fatcat:sncrfcxg4vdolb6ymxflacpnvy

Extending a Run-time Resource Management framework to support OpenCL and Heterogeneous Systems

Giuseppe Massari, Chiara Caffarri, Patrick Bellasi, William Fornaciari
2014 Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms - PARMA-DITAM '14  
From Mobile to High-Performance Computing (HPC) systems, performance and energy efficiency are becoming ever more challenging requirements.  ...  Roughly speaking, this concept is usually defined in computing systems as the ratio between performance achieved and power consumed for that purpose.  ...  ., pre-defined generic components that capture, organize and mask to the user all the details involved in the parallel computation structure, that are not relevant to the user code.  ...
doi:10.1145/2556863.2556868 dblp:conf/hipeac/MassariCBF14 fatcat:c2we6zvxvfbtrmlutz3scjo36y

Runtime Address Space Computation for SDSM Systems [chapter]

Jairo Balart, Marc Gonzàlez, Xavier Martorell, Eduard Ayguadé, Jesús Labarta
Languages and Compilers for Parallel Computing  
Parallel loops are the target of the compiler, looking for statements where the set of memory references can be grouped and then served with a single communication action [4].  ...  UPC implementations [2, 3] perform address space monitoring through a deep coordination of the compiler and the runtime system.  ...  Acknowledgements This work has been supported by the Ministry of Education of Spain under contract CICYT-TIN2004-07739-C02-01, and the Barcelona Supercomputing Center.  ...
doi:10.1007/978-3-540-72521-3_24 dblp:conf/lcpc/BalartGMAL06 fatcat:4zvjx4ttjraprjlglguyxfktxy

A Scalable Algorithm for Compiler-Placed Staggered Checkpointing

Alison N. Norman, Calvin Lin
2011 Parallel and Distributed Computing and Systems (unpublished)
for the network and file system.  ...  system contention and thus checkpoint overhead.  ...  In addition, the authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported in  ... 
doi:10.2316/p.2011.757-107 fatcat:hssvgyhvkzddvmccmqabkw7ccu

A Scalable Algorithm for Compiler-Placed Staggered Checkpointing

Alison N. Norman, Calvin Lin
2012 Parallel and Distributed Computing and Systems (unpublished)
for the network and file system.  ...  system contention and thus checkpoint overhead.  ...  In addition, the authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported in  ... 
doi:10.2316/p.2012.757-107 fatcat:wcpffkcw4bhjblaibl4zghsbxe

Computing feedback laws for linear systems with a parallel pieri homotopy

J. Verschelde, Yusong Wang
2004 International Conference on Parallel Processing Workshops (ICPP 2004 Workshops: Compile and Run Time Techniques for Parallel Computing)
Homotopy methods to solve polynomial systems are well suited for parallel computing because the solution paths defined by the homotopy can be tracked independently.  ...  We studied the parallelization of Pieri homotopies to compute all feedback laws to control linear systems. To distribute the workload, we mapped the poset onto a tree.  ...  This material is based upon work supported by the National Science Foundation under Grant No. 0105739 and Grant No. 0134611.  ... 
doi:10.1109/icppw.2004.1328021 dblp:conf/icppw/VerscheldeW04 fatcat:t2eifw6hhrfn5lrfad6s5yb7fy

Investigation of turbulent melt flow in a crystal growth system

Dali Wang, J.E. Flaherty, K.E. Jansen, M. Shepherd
2004 International Conference on Parallel Processing Workshops (ICPP 2004 Workshops: Compile and Run Time Techniques for Parallel Computing)
We further interpolated three-dimensional solutions onto a two-dimensional unstructured mesh and computed the Reynolds-averaged mean flow and its fluctuations.  ...  With powerful parallel computers, this can be advanced further to enhance and control these complex systems.  ...  The intensity of computation can be mollified by adaptive mesh refinement and coarsening with dynamic load balancing of the parallel computation [14].  ...
doi:10.1109/icppw.2004.1328020 dblp:conf/icppw/WangFJS04 fatcat:vskoqenjgjbbtd6ojrxhiywgwu

TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir [article]

Tao B. Schardl, Siddharth Samsi
2019 arXiv   pre-print
TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation.  ...  But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation.  ...  By incorporating Tapir, an ML compiler can perform low-level optimizations on parallel computation that can benefit all hardware platforms and parallel runtime systems.  ... 
arXiv:1908.11338v1 fatcat:fpe4tmadwnbd3adplv4gfjbrpe
Showing results 1 — 15 out of 222,496 results