Register allocation with instruction scheduling for VLIW-architectures

D. S. Ivanov
2010 Programming and computer software  
Interactions between the phases of register allocation and instruction scheduling are often considered in publications devoted to optimizations for the final stage of compilation.  ...  However, their integration can substantially reduce the time of operation and enhance the performance of the resulting code.  ...  and the spill/load instructions.  ... 
doi:10.1134/s0361768810060058 fatcat:qldzypnqozf4dhrgnkkp3du7aq

Page 6045 of Mathematical Reviews Vol. , Issue 2002H [page]

2002 Mathematical Reviews  
binary trees with spills and pipelined loads.  ...  at a time, under the restrictions that the dependence graph is a full binary tree, all arithmetic and store operations have unit latency, and all load operations have a latency of 1 or all load operations  ... 

Compiling for stream processing

Abhishek Das, William J. Dally, Peter Mattson
2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06  
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage.  ...  Our compiler uses information about the program structure and estimates of kernel and memory operation execution times to overlap kernel execution with memory transfers, maximizing performance, and to  ...  Since the memory load mem could execute in parallel with the kernel op1, buffers for c and d have been extended with shadows.  ... 
doi:10.1145/1152154.1152164 dblp:conf/IEEEpact/DasDM06 fatcat:6zzxk5sffbdprhtoa3iw6vfnjq

Progressive Codesign of an Architecture and Compiler Using a Proxy Application

Arpith Jacob, Ravi Nair, Tong Chen, Zehra Sura, Changhoan Kim, Carlo Bertolli, Samuel Antao, Kevin O'Brien
2015 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)  
The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner.  ...  with the architecture.  ...  This diminished the possibility of scheduling binary arithmetic and memory instructions in parallel.  ... 
doi:10.1109/sbac-pad.2015.18 dblp:conf/sbac-pad/JacobNCSKBAO15 fatcat:2tsunsbztzbdhjuxn5zobup3wi

Synergistic Processing in Cell's Multicore Architecture

M. Gschwind, H.P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, T. Yamazaki
2006 IEEE Micro  
Acknowledgments We thank Jim Kahle, Ted Maeurer, Jaime Moreno, and Alexandre Eichenberger for their many comments and suggestions in the preparation of this work.  ...  We also thank Valentina Salapura for her help and numerous suggestions in the preparation of this article.  ...  Figure 4b shows that SIMD data-parallel operations cannot readily be used for operations on scalar elements with arbitrary alignment loaded into a vector register using the quadword load operations.  ... 
doi:10.1109/mm.2006.41 fatcat:tt5nh6bppzdnxh6rhwfdcq7gle

An Efficient Code Generation Algorithm for Non-orthogonal DSP Architecture

Yi-Hsuan Lee, Cheng Chen
2007 Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology  
In this paper, we present an effective code generation algorithm named Rotation Scheduling with Spill Codes Predicting (RSSP) to maximally exploit the benefits of non-orthogonal architectures.  ...  Furthermore, we also present some preliminary ideas to generalize RSSP, which can make it more practicable and suit various DSPs with similar architectural features.  ...  Then, in the code compaction phase, these spill codes can be scheduled in parallel with other operations; this can decrease the spill costs.  ... 
doi:10.1007/s11265-007-0053-x fatcat:gorrmf5ue5fpblaub2vdi2rb7i

An Advanced Compiler Designed for a VLIW DSP for Sensors-Based Systems

Xu Yang, Hu He
2012 Sensors  
The VLIW architecture can be exploited to greatly enhance instruction level parallelism, thus it can provide computation power and energy efficiency advantages, which satisfies the requirements of future  ...  We have implemented several advanced optimization techniques in the compiler, and fulfilled the O3 level optimization. Benchmarks from the DSPstone test suite are used to verify the compiler.  ...  Also, the Magnolia compiler supports several addressing modes for load and store operations, which is quite useful in the DSP domain.  ... 
doi:10.3390/s120404466 pmid:22666040 pmcid:PMC3355421 fatcat:isbbzi3osfcgfo3cyypsmmvhkm

XDSPCORE: a compiler-based configurable digital signal processor

A. Krall, I. Pryanishnikov, U. Hirnschrott, C. Panis
2004 IEEE Micro  
Traditionally, software developers have programmed DSPs in assembly language for efficiency. This implies time-consuming programming, extensive debugging, and little or no code portability.  ...  These constraints, along with a relatively narrow application domain, have led designers to create special architectural features, as found in the Harvard architecture, VLIW (very long instruction word  ...  Acknowledgments This work is supported by Infineon Technologies Austria, the European Commission under the project SoCMobinet (IST-2000-30094) and the Christian Doppler Gesellschaft.  ... 
doi:10.1109/mm.2004.40 fatcat:lclck3wx2zd4zkilqn2oanqpdq

Data-aware process networks

Christophe Alias, Alexandru Plesco
2021 Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction  
In particular, novel parallelization algorithms and intermediate representations are required.  ...  volume, parallelization degree).  ...  The tool allows for custom arithmetic operator generation based on target, required frequency and arithmetic precision.  ... 
doi:10.1145/3446804.3446847 fatcat:pyhil53nuzg2hk2dc7pbj7zh6q

Exploiting parallel microprocessor microarchitectures with a compiler code generator

W. W. Hwu, P. P. Chang
1988 SIGARCH Computer Architecture News  
With advances in VLSI technology, microprocessor designers can provide more microarchitectural parallelism to increase performance.  ...  The experiments reported in this paper address two important issues: the effects of these forms and the appropriate balance among them.  ...  Acknowledgements The authors would like to acknowledge Nancy Warter, Sharon Simonson, Sadun Anik, and the other members of the Computer System Group for their invaluable comments and suggestions.  ... 
doi:10.1145/633625.52406 fatcat:tlg7u7micvgohpktjmkkwpqk4y

Page 7318 of Mathematical Reviews Vol. , Issue 90M [page]

1990 Mathematical Reviews  
90m:68019 68N20 68Q25 Bernstein, David (Bernstein, David Josef) (1-IBM); Jaffe, Jeffrey M. (1-IBM); Rodeh, Michael (IL-IBM) Scheduling arithmetic and load operations in parallel with no spilling  ...  Comput. 18 (1989), no. 6, 1098-1127. Summary: “A machine model in which load operations can be performed in parallel with arithmetic operations by two separate functional units is considered.  ... 

Embedded software in real-time signal processing systems: design technologies

G. Goossens, J. Van Praet, D. Lanneer, W. Geurts, A. Kifli, C. Liem, P.G. Paulin
1997 Proceedings of the IEEE  
A companion paper in this issue [1] presents a survey of application and architecture trends for embedded systems in these growth markets.  ...  The increasing use of embedded software, often implemented on a core processor in a single-chip system, is a clear trend in the telecommunications, multimedia, and consumer electronics industries.  ...  Communication between memories and registers requires separate "load" and "store" operations, which may be scheduled in parallel with arithmetic operations if permitted by the instruction set.  ... 
doi:10.1109/5.558718 fatcat:jtn2aeo4ybcwfgc67rdsdqjhei

Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures

D. Lopez, J. Llosa, M. Valero, E. Ayguade
2001 IEEE Transactions on Computers  
To execute more operations per cycle, current processors are designed with growing degrees of resource replication (replication technique) for memory ports and functional units.  ...  Also, we confirm that multiply-add fused units will have a significant impact in raising the performance of future processor architectures with a reasonable increase in cost.  ...  ACKNOWLEDGMENTS This work has been supported by the Ministry of Culture and Education of Spain under contract TIC 98-0511 and by CEPBA (European Centre for Parallelism of Barcelona).  ... 
doi:10.1109/12.956090 fatcat:mnknjyvb3fexbir3rx3xrqzc7q

OS and compiler considerations in the design of the IA-64 architecture

Rumi Zahir, Jonathan Ross, Dale Morris, Drew Hess
2000 ACM SIGOPS Operating Systems Review  
Traditional RISC architectures use hardware approaches to obtain more instruction-level parallelism, with the compiler and the operating system (OS) having only indirect visibility into the mechanisms  ...  The IA-64 architecture [14] was specifically designed to enable systems which create and exploit high levels of instruction-level parallelism by explicitly encoding a program's parallelism in the instruction  ...  , Shashikant Rao, and Carol Thompson.  ... 
doi:10.1145/384264.379242 fatcat:iprphesjo5bfxpp2zmvkfyjkfq

OS and compiler considerations in the design of the IA-64 architecture

Rumi Zahir, Jonathan Ross, Dale Morris, Drew Hess
2000 Proceedings of the ninth international conference on Architectural support for programming languages and operating systems - ASPLOS-IX  
Traditional RISC architectures use hardware approaches to obtain more instruction-level parallelism, with the compiler and the operating system (OS) having only indirect visibility into the mechanisms  ...  The IA-64 architecture [14] was specifically designed to enable systems which create and exploit high levels of instruction-level parallelism by explicitly encoding a program's parallelism in the instruction  ...  , Shashikant Rao, and Carol Thompson.  ... 
doi:10.1145/378993.379242 fatcat:pli4llzbivbk3h4qy2h676a74m