Language and Compiler Support for Out-of-Core Irregular Applications on Distributed-Memory Multiprocessors
[chapter]
1998
Lecture Notes in Computer Science
However, the efficient parallelization of irregular applications for distributed-memory multiprocessors (DMMPs) is still a challenging problem. ...
A promising approach is to develop language support and a compiler system on top of an advanced runtime system which can automatically transform an appropriate in-core program to operate efficiently on out-of-core data. ...
doi:10.1007/3-540-49530-4_25
fatcat:s2dxsxjb5jac3oyxlrc6qrzqwy
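As a rough illustration of the restructuring the record above describes, the sketch below hand-codes what such a compiler aims to automate: an in-core array kernel is restaged so that only one fixed-size tile of an out-of-core data set is resident in memory at a time. The file name, tile size, and scaling kernel are invented for the example, not taken from the paper.

#include <stdio.h>

#define TILE 4096   /* elements resident in memory at a time (illustrative) */

/* In-core kernel: operates on whatever currently fits in memory. */
static void scale_tile(double *buf, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= factor;
}

int main(void)
{
    /* "data.bin" stands in for a data set too large to hold in memory. */
    FILE *f = fopen("data.bin", "r+b");
    if (!f) { perror("fopen"); return 1; }

    double buf[TILE];
    size_t got;
    long pos = 0;

    /* Out-of-core loop: read a tile, compute on it in core, write it back. */
    while ((got = fread(buf, sizeof(double), TILE, f)) > 0) {
        scale_tile(buf, got, 2.0);
        fseek(f, pos, SEEK_SET);              /* reposition before writing */
        fwrite(buf, sizeof(double), got, f);
        pos += (long)(got * sizeof(double));
        fseek(f, pos, SEEK_SET);              /* reposition before next read */
    }
    fclose(f);
    return 0;
}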
Automatic generation of application-specific accelerators for FPGAs from python loop nests
2012
22nd International Conference on Field Programmable Logic and Applications (FPL)
Design space exploration on the FPGA proceeds by varying the number of PEs in the system. Over four benchmark kernels, our system achieves speedups of 3× to 6× relative to soft-core C performance. ...
Our system applies traditional dependence analysis and reordering transformations to a restricted set of Python loop nests. ...
To support compilation on GPUs, Copperhead makes several restrictions required for compilation. ...
doi:10.1109/fpl.2012.6339372
dblp:conf/fpl/SheffieldAK12
fatcat:hphpwnv4uvdkxlwhptkr6p7ery
Scheduling Dynamic OpenMP Applications over Multicore Architectures
[chapter]
2008
Lecture Notes in Computer Science
We achieve a speedup of 14 on a 16-core machine with no application-level optimization. ...
Parallel languages such as OpenMP, which rely on the combination of a dedicated compiler and a set of code annotations to extract the parallel structure of applications and to generate scheduling hints ...
... data to the underlying runtime system, most OpenMP runtime systems are actually unable to efficiently support highly irregular, massively parallel applications on NUMA machines. ...
doi:10.1007/978-3-540-79561-2_15
fatcat:n5pkgkq7jzhhpostmjt4xt4oje
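The record above concerns how an OpenMP runtime should place irregular, nested parallelism on hierarchical multicore machines. The sketch below is not from the paper; it only shows, in plain C with OpenMP, the kind of program that stresses such a runtime: an outer loop over domains of very different sizes, each spawning a nested parallel region. Domain sizes and the work() body are invented; compile with -fopenmp.

#include <stdio.h>
#include <omp.h>

static double work(int i) { return 0.5 * (double)i; }   /* placeholder kernel */

int main(void)
{
    enum { NDOMAINS = 8 };
    int sizes[NDOMAINS] = { 10, 40000, 3, 25000, 7, 90000, 12, 600 };
    double totals[NDOMAINS] = { 0.0 };

    omp_set_nested(1);   /* allow the inner regions to run in parallel */

    /* Uneven outer work: dynamic scheduling balances the domains. */
    #pragma omp parallel for schedule(dynamic)
    for (int d = 0; d < NDOMAINS; d++) {
        double sum = 0.0;
        /* Nested region: how its threads are grouped and bound to cores
         * (and to NUMA nodes) is the scheduling problem studied above. */
        #pragma omp parallel for reduction(+:sum) num_threads(2)
        for (int i = 0; i < sizes[d]; i++)
            sum += work(i);
        totals[d] = sum;
    }

    printf("%f\n", totals[1]);
    return 0;
}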
Evaluating the Impact of Programming Language Features on the Performance of Parallel Applications on Cluster Architectures
[chapter]
2004
Lecture Notes in Computer Science
We compare a number of programming languages (Pthreads, OpenMP, MPI, UPC, Global Arrays) on both shared and distributed-memory architectures. ...
We evaluate the impact of programming language features on the performance of parallel applications on modern parallel architectures, particularly for the demanding case of sparse integer codes. ...
Our conclusion is that parallel applications requiring fine-grain accesses achieve poor performance on clusters regardless of the programming paradigm or language feature used, because the amount of inherent ...
doi:10.1007/978-3-540-24644-2_13
fatcat:js24djykkfhohk2gmc2m4dmbdu
OpenMP to GPGPU
2009
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '09
regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial on a CPU). ...
This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. ...
Acknowledgments This work was supported, in part, by the National Science Foundation under grants No. 0429535-CCF, CNS-0751153, and 0833115-CCF. ...
doi:10.1145/1504176.1504194
dblp:conf/ppopp/LeeME09
fatcat:7ru27sozu5h5hhlni4w4cdx6hi
OpenMP to GPGPU
2009
SIGPLAN notices
regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial on a CPU). ...
This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. ...
Acknowledgments This work was supported, in part, by the National Science Foundation under grants No. 0429535-CCF, CNS-0751153, and 0833115-CCF. ...
doi:10.1145/1594835.1504194
fatcat:wbpl7ohbzffedndc6s6tafkfny
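The two records above describe the same work: a source-to-source translator from standard OpenMP to CUDA. The fragment below is not the translator's output; it only shows, in plain C with OpenMP, the kind of work-sharing loop such a tool accepts, with comments indicating the usual mapping (one loop iteration per GPU thread, with host/device copies inserted around the generated kernel launch). Array names and sizes are invented.

#include <stdio.h>

#define N 1048576

int main(void)
{
    static float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

    /* A translator of the kind described above would outline the loop body
     * into a GPU kernel, map each iteration i to one GPU thread, and insert
     * copies of a, b and c between host and device around the launch. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("%f\n", c[N - 1]);
    return 0;
}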
A Survey on Hardware and Software Support for Thread Level Parallelism
[article]
2016
arXiv
pre-print
We also review the programming models with respect to their support to shared-memory, distributed-memory and heterogeneity. ...
Today's computers are built upon multiple processing cores and run applications consisting of a large number of threads, making runtime thread management a complex process. ...
TRIPS supports TLP and DLP on a single threaded application using its four, 16-wide, out-of-order cores. ...
arXiv:1603.09274v3
fatcat:75isdvgp5zbhplocook6273sq4
Application Specific Customization and Scalability of Soft Multiprocessors
2009
2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
StreamIt, a compiler for stream-based applications: StreamIt [18], [20] is a high-level, architecture-independent language and compiler targeted at streaming applications. ...
Each processor requires less on-chip memory to store instructions and data for its application segment. We evaluate the impact of application granularity on on-chip memory later in Chapter 5. ...
doi:10.1109/fccm.2009.41
dblp:conf/fccm/UnnikrishnanZT09
fatcat:7cjy7ltl4rcyzlo7e2p4hdecdq
Harnessing Adaptivity Analysis for the Automatic Design of Efficient Embedded and HPC Systems
2013
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum
As a consequence, modern embedded systems exploit the potential of hundreds or thousands of processing units, often heterogeneous and physically distributed, which run in parallel on the many-core ...
Such a scheduling technique, called dynamic AC-scheduling, provides support for the High-Level Synthesis (HLS) of adaptive hardware cores. ...
GMT provides a set of features to address issues of irregular applications running on distributed memory architectures. ...
doi:10.1109/ipdpsw.2013.230
dblp:conf/ipps/LovergineF13
fatcat:vpdgybp2gnbmve6wzgscv6hqoa
Unlike most simulators, JADE uses statistical models that follow distributions extracted from internal structures of the application, providing a more convenient and systematic exploration approach ...
JADE simulation features include detailed electrical and optical interconnections, detailed memory hierarchy infrastructure, and built-in energy analysis allowing studies of a broad spectrum of systems ...
Adjustable configurations of Electrical and Optical Network-on-Chip (NoC), memory hierarchy and coherence protocols are supported. We publicly release JADE, available online at [1] . ...
doi:10.1145/2857058.2857066
dblp:conf/hipeac/MaedaYWW0WLDW16
fatcat:ogxjh6ztovh2zbsycxt76dnctq
Automatic parallelization of irregular applications
2000
Parallel Computing
However, there is still a lack of convenient software support for implementing efficient parallel applications. ...
Both issues are dealt with in depth and in the context of sparse computations (for the first issue) and irregular histogram reductions (for the second issue). ...
Acknowledgements We gratefully thank David Padua, at the Department of Computer Science, University of Illinois at Urbana-Champaign, for providing us the Polaris compiler, and also Yuan Lin, for the kind ...
doi:10.1016/s0167-8191(00)00052-1
fatcat:vdi2bbfgyffu3i4vv62e5zkohm
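The record above treats irregular (histogram) reductions, where the array element updated in each iteration is known only at run time through an indirection array, so iterations may conflict. The sketch below is not the paper's technique; it shows the baseline pattern with atomic updates that such compiler work seeks to improve on, for example through privatized partial histograms. The data are synthetic.

#include <stdio.h>
#include <stdlib.h>

#define NELEMS 1000000
#define NBINS  256

int main(void)
{
    static int key[NELEMS];
    long hist[NBINS] = { 0 };

    for (int i = 0; i < NELEMS; i++)
        key[i] = rand() % NBINS;        /* indirection array */

    /* Irregular reduction: hist is subscripted by key[i], not by i,
     * so two iterations may update the same bin concurrently. */
    #pragma omp parallel for
    for (int i = 0; i < NELEMS; i++) {
        #pragma omp atomic
        hist[key[i]]++;
    }

    printf("bin 0 holds %ld elements\n", hist[0]);
    return 0;
}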
From Plasma to BeeFarm: Design Experience of an FPGA-Based Multicore Prototype
[chapter]
2011
Lecture Notes in Computer Science
Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure and on the actual executions of our experiments ...
running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research. ...
Introduction: This paper reports on our experience of designing and building an eight-core cache-coherent shared-memory multiprocessor system on FPGA, called BeeFarm, to help investigate support for Transactional ...
doi:10.1007/978-3-642-19475-7_37
fatcat:eno4vzv2jrdqpjw6ytoqv56cdm
HOMPI: A Hybrid Programming Framework for Expressing and Deploying Task-Based Parallelism
[chapter]
2011
Lecture Notes in Computer Science
This paper presents hompi, a framework for programming and executing task-based parallel applications on clusters of multiprocessors and multi-cores, while providing interoperability with existing programming systems such as MPI and OpenMP. ...
hompi facilitates expressing irregular and adaptive master-worker and divide-and-conquer applications, avoiding explicit MPI calls. ...
Conclusion This paper presents hompi, a directive-based programming and runtime environment for task-parallel applications on clusters of multiprocessor/multi-core nodes. ...
doi:10.1007/978-3-642-23397-5_3
fatcat:dsi3dgm32jg5rekwihbi52f3sm
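The record above is about expressing irregular master-worker and divide-and-conquer task parallelism. hompi's own directives are not reproduced here; as a stand-in, the sketch below shows the divide-and-conquer tasking pattern in plain OpenMP. The Fibonacci example and the cutoff value are arbitrary choices.

#include <stdio.h>

/* Divide and conquer with tasks: each recursive call may become a task;
 * a cutoff stops task creation once subproblems become too small. */
static long fib(int n)
{
    if (n < 2)
        return n;
    if (n < 20)                          /* cutoff: solve small cases serially */
        return fib(n - 1) + fib(n - 2);

    long x, y;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait                 /* join the two child tasks */
    return x + y;
}

int main(void)
{
    long r = 0;
    #pragma omp parallel
    #pragma omp single                   /* one thread seeds the task tree */
    r = fib(32);
    printf("%ld\n", r);
    return 0;
}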
A lock-free cache-friendly software queue buffer for decoupled software pipelining
2010
2010 International Computer Symposium (ICS2010)
However, its success relies on fast inter-core synchronization and communication. ...
A lock-free, cache-friendly solution needs to take two different aspects of the memory system, memory coherence and memory consistency, into consideration. ...
ACKNOWLEDGMENT The work reported in this paper is partially supported by National Science Council, Taiwan, Republic of China, under grants NSC 96-2628-E-009-014-MY3, NSC 98-2220-E-009-050, and NSC 98-2220 ...
doi:10.1109/compsym.2010.5685364
fatcat:efsa7jj54ne3nlaz2m5o2l2onm
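The record above concerns the inter-core queues that decoupled software pipelining depends on. The code below is not the paper's design; it is a minimal single-producer/single-consumer ring buffer in C11, with head and tail placed on separate cache lines (64 bytes assumed) so that the producer and consumer cores do not falsely share them.

#include <stdatomic.h>
#include <stdbool.h>

#define QSIZE 1024   /* must be a power of two */

struct spsc_queue {
    _Alignas(64) _Atomic unsigned head;   /* written by the consumer */
    _Alignas(64) _Atomic unsigned tail;   /* written by the producer */
    _Alignas(64) long data[QSIZE];
};

/* Producer side: returns false when the queue is full. */
static bool spsc_push(struct spsc_queue *q, long v)
{
    unsigned t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == QSIZE)
        return false;                      /* full */
    q->data[t & (QSIZE - 1)] = v;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the queue is empty. */
static bool spsc_pop(struct spsc_queue *q, long *out)
{
    unsigned h = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (t == h)
        return false;                      /* empty */
    *out = q->data[h & (QSIZE - 1)];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}

int main(void)
{
    static struct spsc_queue q;            /* zero-initialized: empty queue */
    long v = 0;
    spsc_push(&q, 42);
    return (spsc_pop(&q, &v) && v == 42) ? 0 : 1;
}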
PROMPT [1] provides a new approach which relies on the cooperation of two technologies whose main strength lies in simultaneously taking into account the regular and irregular aspects of telecom applications ...
Increasing computation needs and improving processor integration make the mapping of embedded real-time applications more and more expensive. ...
One of them is optimized to handle the SIMD and regular aspects of SP applications and SoCs, whereas the other takes into account the irregular and MIMD aspects required by such applications and SoCs. ...
doi:10.1145/354880.354887
dblp:conf/cases/BarreteauMGLSBK00
fatcat:oujnkk2tercbbnxupqclbtkofi