Compiler Optimization for Superscalar Systems: Global Instruction Scheduling without Copies

Philip H. Sweany, Steve Carr, Brett L. Huber
1998 Digital technical journal of Digital Equipment Corporation  
It is an optimization not usually found in compilers for non-ILP architectures.  ...  Instruction scheduling is classified as local if it considers code only within a basic block and global if it schedules code across multiple basic blocks.  ...  The performance of instruction-level parallel systems can be improved by compiler programs that order machine operations to increase system parallelism and reduce execution time.  ... 
dblp:journals/dtj/SweanyCH98 fatcat:ppzzymiicjbu7mc6jgo5cnbj4y
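
The local/global distinction in this abstract depends on how code is cut into basic blocks. As a minimal sketch (not taken from the paper), the textbook leader rule partitions a linear instruction list. A local scheduler would reorder only within each returned block, while a global scheduler moves operations across block boundaries. The `Instr` type, the `br`/`jmp` opcodes, and the label fields are hypothetical stand-ins for a real compiler IR.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    op: str                       # hypothetical opcode, e.g. "add", "br", "jmp"
    target: Optional[str] = None  # branch/jump target label, if any
    label: Optional[str] = None   # label attached to this instruction, if any

BRANCH_OPS = {"br", "jmp"}  # hypothetical set of control-transfer opcodes

def basic_blocks(code: list[Instr]) -> list[list[Instr]]:
    """Partition a linear instruction list into basic blocks (leader rule)."""
    if not code:
        return []
    label_pos = {ins.label: i for i, ins in enumerate(code) if ins.label}
    leaders = {0}  # rule 1: the first instruction starts a block
    for i, ins in enumerate(code):
        if ins.op in BRANCH_OPS:
            # rule 2: a branch target starts a block
            if ins.target in label_pos:
                leaders.add(label_pos[ins.target])
            # rule 3: the instruction after a branch starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]

code = [
    Instr("load"),
    Instr("cmp"),
    Instr("br", target="L1"),
    Instr("add"),
    Instr("jmp", target="L2"),
    Instr("sub", label="L1"),
    Instr("ret", label="L2"),
]
print([[i.op for i in b] for b in basic_blocks(code)])
# [['load', 'cmp', 'br'], ['add', 'jmp'], ['sub'], ['ret']]
```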

Bounds modelling and compiler optimizations for superscalar performance tuning

Pradip Bose, Sunil Kim, Francis P O'Connell, William A Ciarfella
1999 Journal of systems architecture  
In particular, we illustrate the use of this analysis in suggesting loop unrolling and scheduling heuristics.  ...  We consider the floating point microarchitecture support in RISC superscalar processors. We briefly review the fundamental performance trade-offs in the design of such microarchitectures.  ...  In this paper, we consider the problem of loop transformation and instruction scheduling for performance tuning of high-end superscalar, RISC machines.  ... 
doi:10.1016/s1383-7621(98)00053-8 fatcat:rvqysiv5gzb6zmbrvffoqcgafi
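
The snippet mentions loop unrolling heuristics without showing the transformation. A minimal sketch, assuming a simple dot-product loop: unrolling by 4 with independent partial sums removes the single-accumulator dependence chain, giving a superscalar scheduler independent operations to overlap. Python serves only as notation here; the factor 4 and the split accumulators are illustrative choices, not the heuristics from Bose et al.

```python
def dot_rolled(a: list[float], b: list[float]) -> float:
    """Reference loop: every iteration adds into the same accumulator."""
    s = 0
    for i in range(len(a)):
        s += a[i] * b[i]
    return s

def dot_unrolled4(a: list[float], b: list[float]) -> float:
    """Same computation unrolled by 4 with independent partial sums."""
    n = len(a)
    s0 = s1 = s2 = s3 = 0
    i = 0
    # Main body: four independent multiply-adds per iteration, so a
    # scheduler can overlap them instead of waiting on one accumulator.
    while i + 4 <= n:
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    # Remainder loop for trip counts not divisible by 4.
    while i < n:
        s0 += a[i] * b[i]
        i += 1
    # Reassociated final sum; with floating point this can change rounding.
    return (s0 + s1) + (s2 + s3)

a = [1, 2, 3, 4, 5, 6, 7]
b = [7, 6, 5, 4, 3, 2, 1]
print(dot_rolled(a, b), dot_unrolled4(a, b))  # 84 84
```

Because the unrolled version reassociates the sum, a compiler can apply it to floating-point code only when the language or flags permit reordering.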

Available instruction-level parallelism for superscalar and superpipelined machines

N. P. Jouppi, D. W. Wall
1989 Proceedings of the third international conference on Architectural support for programming languages and operating systems - ASPLOS-III  
A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism for a series of benchmarks.  ...  Results of these simulations in the presence of various compiler optimizations are presented. The average degree of superpipelining metric is introduced.  ...  In order, these are pipeline scheduling, intra-block optimizations, global optimizations, and global register allocation. In this comparison we used 16 registers for expression temporaries and 26 for global  ... 
doi:10.1145/70082.68207 dblp:conf/asplos/JouppiW89 fatcat:vqttpr2d75c4bpgsjij2w5r33a

Efficient superscalar performance through boosting

Michael D. Smith, Mark Horowitz, Monica S. Lam
1992 Proceedings of the fifth international conference on Architectural support for programming languages and operating systems - ASPLOS-V  
We have incorporated boosting into a trace-based, global scheduling algorithm that exploits ILP without adversely affecting the instruction count of a program.  ...  Previous studies have shown that speculative execution is required for high instruction per cycle (IPC) rates in non-numerical applications.  ...  Phil Lacroute helped out on our software systems, and because of Phil, we were able to have a chance at debugging our globally-scheduled programs.  ... 
doi:10.1145/143365.143534 dblp:conf/asplos/SmithHL92 fatcat:4xwoogrisfeb5gyi7eagh2j5hq

Extending list scheduling to consider execution frequency

M.J. Bourke, P.H. Sweany, S.J. Beaty
1996 Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences  
This is useful for global instruction scheduling methods that schedule groups of basic blocks, called meta-blocks, as though they were a single block.  ...  1], a meta-block global scheduling algorithm.  ...  Rocket's global optimization includes common subexpression elimination, copy propagation, constant folding, constant propagation, algebraic simplification, induction variable simplification, and reduction  ... 
doi:10.1109/hicss.1996.495463 dblp:conf/hicss/BourkeSB96 fatcat:rvqhi7knezerrb2baifwcqoopq
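
The abstract describes meta-block scheduling only at a high level. The sketch below is a generic cycle-by-cycle list scheduler over a dependence DAG; the frequency tie-break is a hypothetical illustration of "considering execution frequency", not the heuristic published by Bourke, Sweany, and Beaty. The names `Op`, `list_schedule`, and `freq`, the issue width, and all latencies are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    latency: int                                   # cycles until the result is available (assumed >= 1)
    deps: list[str] = field(default_factory=list)  # names of ops whose results this op reads
    freq: float = 1.0                              # hypothetical: execution frequency of the op's home block

def list_schedule(ops: list[Op], width: int = 2) -> dict[str, int]:
    """Cycle-by-cycle list scheduling of a dependence DAG on a width-issue machine."""
    by_name = {o.name: o for o in ops}
    succs: dict[str, list[str]] = {o.name: [] for o in ops}
    for o in ops:
        for d in o.deps:
            succs[d].append(o.name)

    height: dict[str, int] = {}

    def h(n: str) -> int:
        # Latency-weighted longest path from n to the end of the DAG.
        if n not in height:
            height[n] = by_name[n].latency + max((h(s) for s in succs[n]), default=0)
        return height[n]

    for o in ops:
        h(o.name)

    start: dict[str, int] = {}
    cycle = 0
    while len(start) < len(ops):
        # An op is ready once every producer has issued and its latency has elapsed.
        ready = [
            o for o in ops
            if o.name not in start
            and all(d in start and start[d] + by_name[d].latency <= cycle for d in o.deps)
        ]
        # Priority: critical-path height first, then (hypothetically) execution frequency.
        ready.sort(key=lambda o: (-height[o.name], -o.freq, o.name))
        for o in ready[:width]:
            start[o.name] = cycle
        cycle += 1
    return start

ops = [
    Op("ld1", 2),
    Op("mul_hot", 3, freq=0.9),
    Op("mul_cold", 3, freq=0.1),
    Op("add", 1, deps=["ld1"]),
]
print(list_schedule(ops, width=2))
# {'ld1': 0, 'mul_hot': 0, 'mul_cold': 1, 'add': 2}
```

With equal critical-path heights, the operation from the more frequently executed block issues first, which is the kind of trade-off a frequency-aware meta-block scheduler has to weigh.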

Spatial computation

Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, Seth Copen Goldstein
2004 ACM SIGOPS Operating Systems Review  
SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware).  ...  from monolithic superscalar processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33%  ...  Some optimizations were implemented by Pedro Artigas. We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments.  ... 
doi:10.1145/1037949.1024396 fatcat:gycsxj3ebfhazpstc2dbx6ebiq

Spatial computation

Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, Seth Copen Goldstein
2004 Proceedings of the 11th international conference on Architectural support for programming languages and operating systems - ASPLOS-XI  
SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware).  ...  from monolithic superscalar processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33%  ...  Some optimizations were implemented by Pedro Artigas. We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments.  ... 
doi:10.1145/1024393.1024396 dblp:conf/asplos/BudiuVCG04 fatcat:ncnfj5flsrakpax7vhaf5io3ja

An Advanced Compiler Designed for a VLIW DSP for Sensors-Based Systems

Xu Yang, Hu He
2012 Sensors  
In this paper, we present an advanced compiler designed for a VLIW DSP named Magnolia, which will be used in sensor-based systems. This compiler is based on the Open64 compiler.  ...  We have implemented several advanced optimization techniques in the compiler, and fulfilled the O3 level optimization. Benchmarks from the DSPstone test suite are used to verify the compiler.  ...  Blue bar showed the performance (measured by the number of execution cycles) generated by the compiler on optimization level O0, without any optimization and instruction scheduling.  ... 
doi:10.3390/s120404466 pmid:22666040 pmcid:PMC3355421 fatcat:isbbzi3osfcgfo3cyypsmmvhkm

An evaluation of the TRIPS computer system

Mark Gebhart, James Burrill, Stephen W. Keckler, Doug Burger, Kathryn S. McKinley, Bertrand A. Maher, Katherine E. Coons, Jeff Diamond, Paul Gratz, Mario Marino, Nitya Ranganathan, Behnam Robatmili (+1 others)
2009 SIGPLAN notices  
On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by a factor of 3.  ...  The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency  ... 
doi:10.1145/1508284.1508246 fatcat:7clzsmjz2fdnlh4wl35ky7htp4

An evaluation of the TRIPS computer system

Mark Gebhart, James Burrill, Stephen W. Keckler, Doug Burger, Kathryn S. McKinley, Bertrand A. Maher, Katherine E. Coons, Jeff Diamond, Paul Gratz, Mario Marino, Nitya Ranganathan, Behnam Robatmili (+1 others)
2009 Proceeding of the 14th international conference on Architectural support for programming languages and operating systems - ASPLOS '09  
On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by a factor of 3.  ...  The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency  ... 
doi:10.1145/1508244.1508246 dblp:conf/asplos/GebhartMCDGMRRSBKBM09 fatcat:uuusf5bhgrerrivynks47lbv5m

An evaluation of the TRIPS computer system

Mark Gebhart, James Burrill, Stephen W. Keckler, Doug Burger, Kathryn S. McKinley, Bertrand A. Maher, Katherine E. Coons, Jeff Diamond, Paul Gratz, Mario Marino, Nitya Ranganathan, Behnam Robatmili (+1 others)
2009 SIGARCH Computer Architecture News  
On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by a factor of 3.  ...  The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency  ... 
doi:10.1145/2528521.1508246 fatcat:bpp2rz4egnhpff35wvsqld6yqi

Global instruction scheduling for superscalar machines

David Bernstein, Michael Rodeh
1991 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation - PLDI '91  
To improve the utilization of machine resources in superscalar processors, the instructions have to be carefully scheduled by the compiler.  ...  A scheme for global (intra-loop) scheduling is proposed, which uses the control and data dependence information summarized in a Program Dependence Graph, to move instructions well beyond basic block boundaries  ...  We suggest combining the PDG with the parametric description of a family of superscalar machines, thereby providing a powerful framework for global instruction scheduling by optimizing compilers  ... 
doi:10.1145/113445.113466 dblp:conf/pldi/BernsteinR91 fatcat:obxxbwsovncyjcg7zx2bgaz6nu

The GEM Optimizing Compiler System

David S. Blickstein, Peter W. Craig, Caroline S. Davidson, R. Neil Faiman Jr., Kent D. Glossop, Richard B. Grove, Steven O. Hobbs, William B. Noyce
1992 Digital technical journal of Digital Equipment Corporation  
The GEM compiler system is the technology Digital is using to build state-of-the-art compiler products for a variety of languages and hardware /software platforms.  ...  The GEM system supports a range of languages and has been successfully retargeted and rehosted for the Alpha AXP and MIPS architectures and for several operating environments.  ...  Inlining has additional benefits on superscalar RISC architectures, like the Alpha AXP system, because the optimization allows the compiler to schedule the instructions of the two routines together.  ... 
dblp:journals/dtj/BlicksteinCDFGGHN92 fatcat:6fcyhhtiljf3fntje5jyifzpkm

Available instruction-level parallelism for superscalar and superpipelined machines

N. P. Jouppi, D. W. Wall
1989 SIGARCH Computer Architecture News  
A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism for a series of benchmarks.  ...  Results of these simulations in the presence of various compiler optimizations are presented. The average degree of superpipelining metric is introduced.  ...  In order, these are pipeline scheduling, intra-block optimizations, global optimizations, and global register allocation. In this comparison we used 16 registers for expression temporaries and 26 for global  ... 
doi:10.1145/68182.68207 fatcat:4pxfk6etaveuhm7ricse5p67ze

Dataflow: A Complement to Superscalar

M. Budiu, P.V. Artigas, S.C. Goldstein
2005 IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.  
There has been a resurgence of interest in dataflow architectures, because of their potential for exploiting parallelism with low overhead.  ...  We compare a program-specific dataflow machine with unlimited parallelism to a superscalar processor running the same program.  ...  The superscalar does well on this loop without compiler unrolling, because it performs dynamic unrolling: it unrolls the loop at run-time inside the instruction window, as guided by the branch predictor  ... 
doi:10.1109/ispass.2005.1430572 dblp:conf/ispass/BudiuAG05 fatcat:ykp2sh7ffbbgxou4kdt7worsgy