Maximizing multiprocessor performance with the SUIF compiler
1996
Computer
Using a vector architecture effectively involves parallelizing repeated arithmetic operations on large data streams, for example the innermost loops in array-oriented programs. ...
To use a multiprocessor effectively, the compiler must exploit coarse-grain parallelism, locating large computations that can execute independently in parallel. ...
Acknowledgments This research was supported in part by the Air Force Materiel Command and ARPA contracts F30602-95-C-0098, DABT63-95-C-0118, and DABT63-94-C-0054; a Digital Equipment Corporation grant; ...
doi:10.1109/2.546613
fatcat:6x7urb56urbrho5ycgavfdxwte
Compilers for instruction-level parallelism
1997
Computer
Compilers use global knowledge of the application program, not readily available to a hardware interpreter, as well as a description of the target machine architecture, to guide the machine-specific optimizations ...
Instruction-level parallelism allows a sequence of instructions derived from a sequential program to be parallelized for execution on multiple pipelined functional units. ...
doi:10.1109/2.642817
fatcat:sqa3irdg3zcqzftmok3rpsv65a
Loop optimizations in C and C++ compilers: an overview
2020
Annales Mathematicae et Informaticae
In this paper, we give an overview of the scientific literature on loop optimization technology, and summarize the status of current implementations in the most widely used C and C++ compilers in the industry ...
Therefore we increasingly rely on compilers to do the heavy lifting for us. Many of the optimizations done by compilers are loop optimizations. ...
The publication of this paper is supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002). ...
doi:10.33039/ami.2020.07.003
fatcat:wnuup2mcbffotbcrl3cv6y2mhu
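One classic member of the loop-optimization family surveyed in the entry above is loop interchange, which reorders a loop nest so the innermost loop walks memory contiguously. A minimal C sketch of the idea (illustrative only, not taken from the paper):

    enum { N = 1024 };
    static double a[N][N], b[N][N], c[N][N];

    /* Before: the inner loop strides down a column of a row-major array,
       touching a new cache line on almost every iteration. */
    void add_column_order(void) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = b[i][j] + c[i][j];
    }

    /* After loop interchange: the inner loop traverses contiguous memory,
       the form an optimizing C/C++ compiler tries to recover automatically
       when it can prove both orders compute the same result. */
    void add_row_order(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = b[i][j] + c[i][j];
    }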
MATLAB Parallelization through Scalarization
2011
2011 15th Workshop on Interaction between Compilers and Computer Architectures
We have implemented this strategy in a MATLAB compiler that compiles portions of MATLAB to C++ or CUDA C. ...
Additional array temporaries are obviated in the case of array subscripts. ...
The notion of optimizing a program in steps, as more information becomes available, has been used in a version of ML, called MetaML, where it is called multi-staging [24]. ...
doi:10.1109/interact.2011.18
dblp:conf/IEEEinteract/SheiYRC11
fatcat:hmnywoccd5cjxawmkb6cw6hq7u
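Scalarization, as used in the entry above, lowers a whole-array MATLAB expression to elementwise loops before code generation. A hedged sketch of what the generated loop for a hypothetical expression C = A .* B + s could look like (names and signature are illustrative, not taken from the paper; the paper targets C++ or CUDA C, plain C is used here for brevity):

    /* Hypothetical elementwise MATLAB expression  C = A .* B + s
       lowered ("scalarized") to a single loop; without scalarization,
       A .* B would typically be materialized in a temporary array first. */
    void scalarized_expr(const double *A, const double *B, double s,
                         double *C, int n) {
        for (int i = 0; i < n; i++)
            C[i] = A[i] * B[i] + s;   /* no intermediate temporary needed */
    }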
SUIF
1994
SIGPLAN notices
The toolkit currently includes C and Fortran front ends, a loop-level parallelism and locality optimizer, an optimizing MIPS back end, a set of compiler development tools, and support for instructional ...
SUIF consists of a small, clearly documented kernel and a toolkit of compiler passes built on top of the kernel. ...
out bugs in the compiler, Karen Pieper for her help on the original SUIF system design, Mike Smith for his MIPS code generator, Todd Smith for his work on the translators between C and SUIF, and Michael ...
doi:10.1145/193209.193217
fatcat:yleymrlwuzfhzc2odlec6mv7si
Improving register allocation for subscripted variables
1990
Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation - PLDI '90
His accomplishments made him a pioneer not only of computer architecture but also of compiler optimization. ...
ACKNOWLEDGMENTS The authors would like to express their sincerest gratitude to the late John Cocke, who inspired and funded this work while at IBM. ...
This approach and its descendants have led to substantive, and in some cases dramatic, improvements in the performance of scientific programs on machines with long memory latencies. ...
doi:10.1145/93542.93553
dblp:conf/pldi/CallahanCK90
fatcat:w4wsph4nobddzbwp7grste6znm
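The transformation underlying this line of work is scalar replacement: subscripted references that are reused across nearby iterations are promoted to scalar temporaries so that a conventional register allocator can keep them in registers. A minimal C sketch of the effect (illustrative, not the paper's algorithm):

    /* Original loop: each iteration re-loads a[i-1], the value it stored
       one iteration earlier, because allocators rarely keep subscripted
       variables in registers. */
    void smooth(double *a, int n) {
        for (int i = 1; i < n; i++)
            a[i] = 0.5 * (a[i] + a[i - 1]);
    }

    /* After scalar replacement: the value carried between iterations lives
       in the scalar prev, so each iteration does one load and one store. */
    void smooth_scalar_replaced(double *a, int n) {
        double prev = a[0];
        for (int i = 1; i < n; i++) {
            double cur = 0.5 * (a[i] + prev);
            a[i] = cur;
            prev = cur;
        }
    }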
ispc: A SPMD compiler for high-performance CPU programming
2012
2012 Innovative Parallel Computing (InPar)
Existing CPU parallel programming models focus primarily on multi-core parallelism, neglecting the substantial computational capabilities that are available in CPU SIMD vector units. ...
We have developed a compiler, the Intel® SPMD Program Compiler (ispc), that delivers very high performance on CPUs thanks to effective use of both multiple processor cores and SIMD vector units. ispc ...
Tim suggested the "SPMD on SIMD" terminology and has extensively argued for the advantages of the SPMD model. ...
doi:10.1109/inpar.2012.6339601
fatcat:nfbplb43jvgmjgxxbh27wftcn4
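The "SPMD on SIMD" model mentioned above writes a kernel from the point of view of a single program instance and lets the compiler run a gang of instances across SIMD lanes (and across cores). A rough C rendering of the lane mapping (illustrative only; ispc's own language expresses this with dedicated constructs rather than explicit lane loops):

    #define GANG_SIZE 8   /* e.g. eight 32-bit lanes on an AVX target */

    /* The body is written as if for one program instance; conceptually the
       compiler runs GANG_SIZE instances in lockstep, one per SIMD lane,
       masking off the lanes that fall past the end of the array. */
    void saxpy(float a, const float *x, const float *y, float *r, int n) {
        for (int base = 0; base < n; base += GANG_SIZE) {
            for (int lane = 0; lane < GANG_SIZE && base + lane < n; lane++) {
                int i = base + lane;
                r[i] = a * x[i] + y[i];   /* one vector multiply-add in the generated code */
            }
        }
    }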
Exploiting Implicit Parallelism in Dynamic Array Programming Languages
2014
Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming - ARRAY'14
The J programming language includes the usual idioms of operations on arrays of the same size and shape, where the operations can often be performed in parallel for each individual item of the operands ...
The interpreter itself is responsible for exploiting the parallelism available in the applications. ...
In this section, we describe some of our optimizations that enable us to execute J programs on our interpreter efficiently in terms of execution time performance. ...
doi:10.1145/2627373.2627374
dblp:conf/pldi/ImamSLK14
fatcat:2lvchvw5xjhnvnuiw6imjopomy
Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture
[chapter]
2009
Lecture Notes in Computer Science
the programmer and compiler with an unfamiliar and difficult programming model. ...
In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective ...
Acknowledgments All authors from LBNL were supported by the Office of Advanced Scientific Computing Research in the DOE Office of Science under contract number DE-AC02-05CH11231. ...
doi:10.1007/978-3-642-00454-4_16
fatcat:kta3jonbpfh7ba47yrre6xaguq
FFT Compiler Techniques
[chapter]
2004
Lecture Notes in Computer Science
The floating-point performance of Fftw's scalar version has been more than doubled, resulting in the fastest FFT implementation to date. ...
The paper introduces a special compiler backend for Intel P4's SSE 2, AMD's 3DNow!, and IBM's SIMD operations implemented on the new processors of the BlueGene/L supercomputer. ...
We would like to thank Matteo Frigo and Steven Johnson for many years of prospering cooperation and for making it possible for us to access non-public versions of Fftw. ...
doi:10.1007/978-3-540-24723-4_15
fatcat:iftd7t3c6rgengdgu2q6xjsxe4
Improving register allocation for subscripted variables
1990
SIGPLAN notices
His accomplishments made him a pioneer not only of computer architecture but also of compiler optimization. ...
ACKNOWLEDGMENTS The authors would like to express their sincerest gratitude to the late John Cocke, who inspired and funded this work while at IBM. ...
This approach and its descendants have led to substantive, and in some cases dramatic, improvements in the performance of scientific programs on machines with long memory latencies. ...
doi:10.1145/93548.93553
fatcat:ikzj44grtvb2xp5bzcvxiz2twe
Improving register allocation for subscripted variables
2004
SIGPLAN notices
His accomplishments made him a pioneer not only of computer architecture but also of compiler optimization. ...
ACKNOWLEDGMENTS The authors would like to express their sincerest gratitude to the late John Cocke, who inspired and funded this work while at IBM. ...
This approach and its descendants have led to substantive, and in some cases dramatic, improvements in the performance of scientific programs on machines with long memory latencies. ...
doi:10.1145/989393.989428
fatcat:okbavwnrsjcilm3siuatbpk5s4
Automatic and Interactive Program Parallelization Using the Cetus Source to Source Compiler Infrastructure v2.0
2022
Electronics
Cetus is used for research on compiler optimizations for multi-cores with an emphasis on automatic parallelization. ...
The compiler has gone through several iterations of benchmark studies and implementations of those techniques that could improve the parallel performance of these programs. ...
The effect of the various optimization techniques in Cetus on the performance of prior generations of benchmarks and machines is well documented [2, 3]. ...
doi:10.3390/electronics11050809
fatcat:f3djr5rigne3jp7a63tjh26tsi
Automatic Parallelization of Array-oriented Programs for a Multi-core Machine
2012
International journal of parallel programming
We present the work on automatic parallelization of array-oriented programs for multi-core machines. ...
Source programs written in standard APL are translated by a parallelizing APL-to-C compiler into parallelized C code, i.e. C mixed with OpenMP directives. ...
Alex Katz for his help in editing the manuscript to improve its English. We also thank referees for their help in improving this paper. ...
doi:10.1007/s10766-012-0197-6
fatcat:arjv5k7snngotbcqycehfay65m
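The target described in this entry is ordinary C annotated with OpenMP directives. A hedged sketch of the kind of code such a translator might emit for an elementwise APL expression like R ← A + B (the function and its signature are illustrative, not taken from the paper):

    #include <omp.h>

    /* Elementwise addition emitted as a C loop and marked parallel;
       the OpenMP runtime splits the iteration space across cores. */
    void apl_vec_add(const double *A, const double *B, double *R, long n) {
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; i++)
            R[i] = A[i] + B[i];
    }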
An OpenMP Compiler Benchmark
2003
Scientific Programming
Six out of seven proposed optimization techniques are already implemented in different compilers. However, most compilers implement only one or two of them. ...
The purpose of this benchmark is to propose several optimization techniques and to test their existence in current OpenMP compilers. ...
Since the goal of parallel programming is to achieve higher performance, the further acceptance of OpenMP will strongly depend on compiler optimization techniques especially in the field where OpenMP has ...
doi:10.1155/2003/287461
fatcat:lwygisbdszcmjbef2r35hmgnne