Code generation schema for modulo scheduled loops

B. Ramakrishna Rau, Michael S. Schlansker, P. P. Tirumalai
1992 ACM SIGMICRO Newsletter  
Modulo scheduling is one approach for generating such schedules.  ...  This paper addresses an issue which has received little attention thus far, but which is non-trivial in its complexity: the task of generating correct, high-performance code once the modulo schedule has  ...  Code Generation Schemas for Modulo Scheduled Loops When generating code for modulo schedules, two fundamental problems must be overcome.  ... 
doi:10.1145/144965.145795 fatcat:45zdocdtkfh3nnm7gbbu5lte3y

Code Generation Schema For Modulo Scheduled Loops

B.R. Rau, M.S. Schlansker, P.P. Tirumalai
1992 Proceedings of the 25th Annual International Symposium on Microarchitecture (MICRO 25)  
Modulo scheduling is one approach for generating such schedules.  ...  This paper addresses an issue which has received little attention thus far, but which is non-trivial in its complexity: the task of generating correct, high-performance code once the modulo schedule has  ...  Code Generation Schemas for Modulo Scheduled Loops When generating code for modulo schedules, two fundamental problems must be overcome.  ... 
doi:10.1109/micro.1992.697012 dblp:conf/micro/RauST92 fatcat:zsobn2inlbcaffalrsgbsua6vi

Register allocation for software pipelined loops

B. R. Rau, M. Lee, P. P. Tirumalai, M. S. Schlansker
1992 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation - PLDI '92  
Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel.  ...  This paper studies the task of register allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines.  ...  fit allocation algorithm and the adjacency ordering heuristic, described in this paper, were conceived of by Ross Towle who, along with Warren Ristow and Jim Dehnert, implemented register allocation for  ... 
doi:10.1145/143095.143141 dblp:conf/pldi/RauLTS92 fatcat:nvcwplvcnbbgtd7tzi5gjsdfau

ShiftQ

Shail Aditya, Michael S. Schlansker
2001 Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems - CASES '01  
This paper describes the ShiftQ schema and a method to automatically synthesize them from modulo-scheduled loops.  ...  ShiftQs are hardware structures consisting of registers and switches which buffer and transport operands among function units within custom hardware loop accelerators.  ...  Loop initialization and loop finalization pose special requirements which do not conform to a steady state modulo schedule.  ... 
doi:10.1145/502217.502243 dblp:conf/cases/AdityaS01 fatcat:kd3cvfgiu5hi3fbttix6io4tty

ShiftQ

Shail Aditya, Michael S. Schlansker
2001 Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems - CASES '01  
This paper describes the ShiftQ schema and a method to automatically synthesize them from modulo-scheduled loops.  ...  ShiftQs are hardware structures consisting of registers and switches which buffer and transport operands among function units within custom hardware loop accelerators.  ...  Loop initialization and loop finalization pose special requirements which do not conform to a steady state modulo schedule.  ... 
doi:10.1145/502239.502243 fatcat:pvptlej2ineqraxhx5dydqhjxm

A register file and scheduling model for application specific processor synthesis

E. Ercanli, C. Papachristou
1996 Proceedings of the 33rd annual conference on Design automation conference - DAC '96  
with recurrences, a VLIW type of coprocessor is synthesized and realized, and an accompanying parallel code is generated.  ...  We introduce a novel register file model, Shifting Register File (SRF), based on cyclic regularity of register file accesses; and a simple method, Expansion Scheduling, for scheduling iterative computations  ...  Memory Organization Schema In loops of scientific code, most indirect references are made to array elements.  ... 
doi:10.1145/240518.240525 dblp:conf/dac/ErcanliP96 fatcat:ktlx4z72jzc6foazsp2lhessmm

Bridging the computation gap between programmable processors and hardwired accelerators

Kevin Fan, Manjunath Kudlur, Ganesh Dasika, Scott Mahlke
2009 2009 IEEE 15th International Symposium on High Performance Computer Architecture  
A customized instance of the loop accelerator architecture is generated for a particular loop and then the data and control paths are proactively generalized in an efficient manner to increase flexibility  ...  to execute multiple similar loops.  ...  We also thank the anonymous referees for their excellent comments.  ... 
doi:10.1109/hpca.2009.4798266 dblp:conf/hpca/FanKDM09 fatcat:j6fturcuangelcvqx6iwgbb2ve

Streamroller:

Manjunath Kudlur, Kevin Fan, Scott Mahlke
2006 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis - CODES+ISSS '06  
A compiler-based system automatically synthesizes loop accelerators for individual kernels at varying performance levels.  ...  The synthesis of the accelerator pipeline involves designing loop accelerators for individual kernels, instantiating buffers for arrays used in the application, and hooking up these building blocks to  ...  The innermost loops of the kernel specification function are modulo scheduled, and the architecture for the LAs is derived directly from the schedule.  ... 
doi:10.1145/1176254.1176321 dblp:conf/codes/KudlurFM06 fatcat:vmp5unqqevcurkdjdkvfxwqeui

Increasing hardware efficiency with multifunction loop accelerators

Kevin Fan, Manjunath Kudlur, Hyunchul Park, Scott Mahlke
2006 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis - CODES+ISSS '06  
A compiler-based system for automatically synthesizing multifunction loop accelerator architectures from C code is presented.  ...  These loop accelerators are traditionally designed in a single-function manner, wherein each loop nest is implemented as a dedicated hardware block.  ...  Figure 1: Figure 2: Loop Hardware schema for loop accelerator. Figure 3: (a) A portion of the modulo schedule for sobel, and (b) the corresponding datapath.  ... 
doi:10.1145/1176254.1176322 dblp:conf/codes/FanKPM06 fatcat:25hd7rlerbh4llttrw4eyutvlq

Modulo scheduling for highly customized datapaths to increase hardware reusability

Kevin Fan, Hyunchul Park, Manjunath Kudlur, Scott Mahlke
2008 Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization - CGO '08  
This paper proposes a constraint-driven modulo scheduler that maps software-pipelineable loops onto programmable loop accelerator hardware.  ...  to the original loop for which the hardware was designed.  ...  ACKNOWLEDGMENTS We thank Yuanyuan Tian for her help with quantifying graph similarity, as well as the anonymous referees who provided excellent feedback.  ... 
doi:10.1145/1356058.1356075 dblp:conf/cgo/FanPKM08 fatcat:3julctovlzfktfhjraym5djny4

Loop Optimization With Tradeoff Between Cycle Count And Code Size For Dsp Applications

Bogong Su, Jian Wang, Rafi Rabipour, Erh-Wen Hu, Joseph Manzano
2004 Zenodo  
[6] combines scheduling heuristics, postlude collapsing schemas and speculative modulo scheduling, and again realizes a code size reduction of 30% on average with larger benchmark programs.  ...  (op1,j) + d(e) <= ls(op2,j+d(e)); Definition 3.3. For a given loop, the performance of a valid loop schedule is measured by the initiation interval and the code size of the scheduled loop.  ...  From the definition of pd and Theorem 4.1, it is clear that we can change the code size of software-pipelined loop by choosing different value of initiation interval II.  ... 
doi:10.5281/zenodo.38525 fatcat:gtkd2rhg6zbnblesdtmfrem2ty

Hierarchical Cluster Assignment for Coarse-Grain Reconfigurable Coprocessors

Martino Sykora, Davide Pavoni, Joel Cambonie, Roberto Costa, Stefano Crespi Reghizzi
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
Both RCP and DSPFabric are designed for supporting Kernel Only Modulo Scheduled [21] loops.  ...  Since DSPFabric has been specifically designed as loop accelerator Co-processor, each cluster is equipped with hardware features for better executing modulo scheduled code [20], like support for instruction  ... 
doi:10.1109/ipdps.2007.370381 dblp:conf/ipps/SykoraPCCC07 fatcat:l5bhze6gqjdy3k42ibe5kymflq

AceMesh: a structured data driven programming language for high performance computing

Li Chen, Shenglin Tang, You Fu, Xiran Gao, Jie Guo, Shangzhi Jiang
2020 CCF Transactions on High Performance Computing  
Its language features include data-centric parallelizing template, aggregated task dependence for parallel loops.  ...  These features not only relieve the programmer from tedious refactoring details but also provide possibility for structured execution of complex task graphs, data locality exploitation upon data tile templates  ...  For the do directive in Fig. 1, its task generating codes are shown in Fig. 4.  ... 
doi:10.1007/s42514-020-00047-4 fatcat:5d6q663fuffr7fma3kqrmjffl4

A Dataflow Programming Language and its Compiler for Streaming Systems

Haitao Wei, Stéphane Zuckerman, Xiaoming Li, Guang R. Gao
2014 Procedia Computer Science  
We also propose a compiler framework for COStream on general-purpose multi-core architectures. It features an inter-thread software pipelining scheduler to exploit the parallelism among the cores.  ...  The dataflow programming paradigm shows an important way to improve programming productivity for streaming systems.  ...  Code Generation In the backend, the compiler generates the code in three parts for the COStream program: the computation node, the buffer allocation and the software pipelining codes.  ... 
doi:10.1016/j.procs.2014.05.116 fatcat:nqxtxn2mnzdj7j33pvmxgf7tui

Field-testing IMPACT EPIC research results in Itanium 2

John W. Sias, Sain-zee Ueng, Geoff A. Kent, Ian M. Steiner, Erik M. Nystrom, Wen-mei W. Hwu
2004 SIGARCH Computer Architecture News  
in situ evaluation of code generated using aggressive, EPIC-enabled techniques in a reality-constrained microarchitecture.  ...  Using the Intel Itanium 2 microprocessor, the SPECint2000 benchmarks and the IMPACT Compiler for IA-64, a research compiler competitive with the best commercial compilers on the platform, we provide an  ...  Acknowledgments The authors extend their thanks to Intel and Hewlett-Packard for their generous support of this work, including software and hardware contributions; to Stephane Eranian (HP Labs) and Allan  ... 
doi:10.1145/1028176.1006735 fatcat:4b2mc4bssbhnjmpkysmbu2z4w4